apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

[DISCUSSION] Sparse Tensor Support Design #4742

Closed: piiswrong closed this issue 7 years ago

piiswrong commented 7 years ago

Frontend Data Structures

NDArrayBase -> NDArray -> SparseNDArray

Backend

Data Structures

Add these to the original NDArray class to avoid a v-table and API changes. A sparse tensor is represented by two dense tensors, index and data. There are two possible sparse formats:

  1. Row-sparse tensor, where each row of data, data[i], corresponds to X[index[i]]. Tensor shape: (N, ...); index shape: (M,); data shape: (M, ...).

  2. COO sparse tensor, where X[tuple(index[i])] = data[i]. Tensor shape: (d_1, ..., d_K); index shape: (N, K); data shape: (N,).
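
For concreteness, here is a minimal sketch of the two layouts for a small matrix X = [[0,0,0],[1,2,3],[0,0,0],[4,0,0]] (the struct and field names are hypothetical, chosen only to mirror the shapes above):

#include <cstdint>
#include <vector>

// Row-sparse: only the non-zero rows of X are stored.
// X has shape (4, 3); rows 1 and 3 are non-zero, so M = 2.
struct RowSparse {
  std::vector<int64_t> index = {1, 3};               // shape (M,)
  std::vector<float>   data  = {1, 2, 3,  4, 0, 0};  // shape (M, 3), row-major
};

// COO: every non-zero element is stored with its full coordinate tuple.
// X has N = 4 non-zeros and K = 2 coordinates per element.
struct COO {
  std::vector<int64_t> index = {1, 0,  1, 1,  1, 2,  3, 0};  // shape (N, K), row-major
  std::vector<float>   data  = {1, 2, 3, 4};                 // shape (N,)
};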

enum SparseType {
  kDense,
  kRowSparse,
  kCOOSparse,
};

NDArray {
  // new fields
  SparseType sparse_type();
  TBlob data() { CHECK(sparse_type_ == kDense); ... }  // Errors when called on a sparse array
  TBlob sparse_data() { this->WaitToRead(); ... }  // Blocks because length_ can change
  TBlob sparse_index() { this->WaitToRead(); ... }
  void SparseResize() { data_->Reallocate(); ... }  // Changes the size of the actual data; reallocates when upsizing. The write lock must be held before calling.
  NDArray to_dense();

 private:
  SparseType sparse_type_;
  shared_ptr<Chunk> index_;
};
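
To illustrate the intended semantics, here is a sketch (in the same pseudocode style as above) of how an operator might produce a row-sparse output through this interface; RowSparseRelu is a hypothetical operator, not part of the design:

// The engine is assumed to have granted the write lock on `out` already.
void RowSparseRelu(const NDArray& in, NDArray* out) {
  TBlob idx = in.sparse_index();  // blocks until `in` is readable
  TBlob val = in.sparse_data();
  out->SparseResize();            // reallocates if the non-zero row count grew
  // ... copy idx into out's index blob, write max(0, x) into out's data blob ...
}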

API

  1. Sparse op registration: FComputeNDArray = std::function<void(Context ctx, ..., vector<NDArray> inputs, ...)>
  2. MXImperativeInvoke: if at least one input is sparse, try to use FComputeSparse. If it is not registered, call to_dense on all inputs first (see the sketch after this list).
  3. Executor: memory is never shared for sparse buffers. Operators can allocate/reallocate memory for their output buffers.
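
As a rough illustration of the dispatch rule in point 2, here is a self-contained sketch (the Op struct, FCompute signature, and ImperativeInvoke body are simplified stand-ins, not the actual MXNet types):

#include <functional>
#include <vector>

enum SparseType { kDense, kRowSparse, kCOOSparse };

struct NDArray {
  SparseType sparse_type = kDense;
  NDArray to_dense() const { return NDArray{kDense}; }  // densify
};

using FCompute = std::function<void(std::vector<NDArray>&, std::vector<NDArray>&)>;

struct Op {
  FCompute fcompute;         // dense kernel, always registered
  FCompute fcompute_sparse;  // sparse-aware kernel, may be empty
};

void ImperativeInvoke(const Op& op, std::vector<NDArray> inputs,
                      std::vector<NDArray>& outputs) {
  bool has_sparse = false;
  for (const auto& nd : inputs)
    if (nd.sparse_type != kDense) has_sparse = true;

  if (has_sparse && op.fcompute_sparse) {
    op.fcompute_sparse(inputs, outputs);  // sparse path
    return;
  }
  for (auto& nd : inputs)
    if (nd.sparse_type != kDense) nd = nd.to_dense();  // fallback: densify all
  op.fcompute(inputs, outputs);           // dense path
}
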
sxjscience commented 7 years ago

(Just for book-keeping) This should be the continuation of this issue: https://github.com/dmlc/mxnet/issues/1524

howard0su commented 7 years ago

How can you predict the memory size when writing a sparse array? Maybe you need an additional temp buffer.

Between row-sparse and COO, how is the interface exposed to the upper layer? Will it be exposed to the end user?

piiswrong commented 7 years ago

Operators can reallocate memory inside forward if needed.

Users can call .is_sparse to see whether an array is sparse, but the internal buffers are not exposed.

formath commented 7 years ago

Good job. The TensorFlow implementation is similar. Maybe this should be co-designed with sparse weight push and pull in kvstore. #1237

jli05 commented 7 years ago
  1. What are the pros and cons of defining two types, NDArray and SparseNDArray, versus defining one type with a flag? Most frameworks seem to define them separately.

  2. What is the precedence among sparse types when the result type is less clear?

  3. Could we add a function diag, similar to scipy.sparse.csr_matrix.diagonal and scipy.sparse.diags, for extracting the diagonal of a matrix and for making a diagonal matrix? They are often used. (It would be even better if we largely modeled the interfaces after scipy.sparse.)

mli commented 7 years ago

Two general comments on the sparse ndarray, though there are a large number of engineering details we need to solve.

  1. As mentioned by @jli05, a single ndarray supporting both sparse and dense makes the interface clean, but it may confuse users. I prefer to give users both a (dense) NDArray and a SparseNDArray, so that they have to think about when to use sparse and when to use dense.

  2. Between row-sparse and COO, I suggest having CSR. It is good for 2D sparse input data and for the weights in LDA. It is more compact than COO and may lead to better performance.
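
For reference, a minimal sketch of the CSR layout (a hypothetical struct mirroring scipy.sparse.csr_matrix) for X = [[0,2,0],[0,0,0],[1,0,3]]:

#include <cstdint>
#include <vector>

// CSR stores one indptr entry per row plus one (column, value) pair per
// non-zero, so it is more compact than COO's (row, column, value) triples.
struct CSR {
  std::vector<int64_t> indptr  = {0, 1, 1, 3};  // shape (num_rows + 1,)
  std::vector<int64_t> indices = {1, 0, 2};     // column of each non-zero
  std::vector<float>   data    = {2, 1, 3};     // non-zero values
};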

There is a straightforward way to extend this to kvstore: use the row ID as the key, and then partition the rows across server nodes. It is less flexible than using the index tuple as the key, because we need to communicate a whole row rather than individual elements. But communicating rows may be better for performance and is good enough for most algorithms. For the implementation, we only need to update kvstore without changing ps-lite.
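
A minimal sketch of that row-keyed push, assuming the row-sparse layout above and a simple modulo partition (all names here are hypothetical, not the kvstore API):

#include <cstdint>
#include <map>
#include <vector>

// Push one row-sparse gradient: each non-zero row travels as a single
// value, keyed by its row ID, to the server node that owns that ID.
void PushRowSparse(const std::vector<int64_t>& index,            // shape (M,)
                   const std::vector<std::vector<float>>& rows,  // shape (M, row_len)
                   int num_servers,
                   std::map<int, std::map<int64_t, std::vector<float>>>* servers) {
  for (size_t i = 0; i < index.size(); ++i) {
    int server = static_cast<int>(index[i] % num_servers);  // partition by row ID
    (*servers)[server][index[i]] = rows[i];                 // whole row as one value
  }
}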

formath commented 7 years ago

@mli For very sparse libsvm data and other NLP data, CSR is appropriate. But for the weights or latent vectors to be learned, IndexedSlice may be more appropriate? In that manner, the row ID is just the feature ID or word ID.

mli commented 7 years ago

@piiswrong refers to "IndexedSlice" as "row sparse".

mli commented 7 years ago

Will start to work on it after updating the documents: https://github.com/dmlc/mxnet/pull/5151

On Sun, Feb 26, 2017 at 1:34 AM, formath wrote:

Any progress?


debasish83 commented 7 years ago

We are in the process of choosing a neural net framework that is close to the JVM for ease of deployment (www.github.com/Verizon/trapezium), and mxnet's integration with the JVM is more comprehensive than tensorflow's (distributed_runtime is not exposed through JNI yet, and exposing it will be involved). While reading the tensorflow paper https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf it was encouraging to see that mxnet and tensorflow performance are on par, but the following statement confused me: "The MXNet key-value store interface [22] does not currently allow sparse gradient updates within a single value, which are crucial for the distributed training of large models (§4.2), and adding this feature would require modifications to the core system." @mli this particular issue is the closest to sparse gradient updates that I could find. Is it true that the core system needs to be modified?

piiswrong commented 7 years ago

@debasish83 Sparse updates are only relevant if you want to train with sparse matrices, e.g. for recommendation systems.

BTW, MXNet's distributed training performance is much better than the open-source version of tensorflow, by at least 2x. tensorflow 1.0 claims an improvement, but we haven't seen any public benchmarks.

debasish83 commented 7 years ago

Migrating recommendation systems to nonlinear models is one of our focuses as well. We built Spark-based flows for generalized matrix factorization for recommendation and topic modeling: http://debasish83.github.io/spark-meetup-july2015/slides.pdf https://spark-summit.org/wp-content/uploads/2014/07/Quadratic-Programming-Solver-for-Non-negative-Matrix-Factorization-with-Spark-Debasish-Das.pdf

piiswrong commented 7 years ago

We are working on sparse support. It will probably take a month or two.

debasish83 commented 7 years ago

Is there an issue that I can follow, and possibly help with, in adding the sparse support? For matrix factorization in particular, it is possible to generate a gradient per user_i and item_j and save it to the parameter server as a big vector of user_i x item_j... Is the concern that the network communication will be too high, and that is why we want to block it?

eric-haibin-lin commented 7 years ago

I'll write a proposal for sparse tensor this week.

debasish83 commented 7 years ago

Any update on sparse tensors? Is there an Apache mailing list / JIRA now where discussions can be held?

eric-haibin-lin commented 7 years ago

Most of the discussions are on GitHub right now; we are just starting to use the Apache mailing list. Two other threads for sparse are #5498 and #5707.

jli05 commented 7 years ago

How do we use the Apache mailing lists? Could you point to a web page that explains them in detail?

formath commented 7 years ago

Any progress?

eric-haibin-lin commented 7 years ago

@jli05 The Apache mailing list is here: http://mxnet.io/community/mxnet_channels.html @formath We're merging the CPU implementation into the sparse branch this week (#5800). We still need to refactor some nnvm code and do some benchmarking before merging it into master. Some initial benchmark results are available at https://github.com/eric-haibin-lin/mxnet/issues/60

ykim362 commented 7 years ago

If MKL uses this design, do you expect any changes in the front-end scripts, or will you try to keep the front-end scripts the same?

eric-haibin-lin commented 7 years ago

@ykim362 I'd expect most of the changes to be in the backend.

yajiedesign commented 7 years ago

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!