piiswrong closed this issue 7 years ago.
(Just for book-keeping) This should be the continuation of this issue: https://github.com/dmlc/mxnet/issues/1524
How can you predict the memory size when writing a sparse array? Maybe you need an additional temp buffer.
Between Row and COO, how is the interface exposed to the upper layer? Will it be exposed to the end user?
Operators can reallocate memory inside forward if needed.
Users can call .is_sparse to see if an array is sparse, but the internal buffers are not exposed.
Good job. The Tensorflow implementation is similar. Maybe this should be co-designed with sparse weight push and pull in kvstore. #1237
What are the pros and cons of defining two types, NDArray and SparseNDArray, versus defining one type with a flag? Most frameworks seem to define them separately.
What's the precedence of the sparse type when the result type is less clear?
Could we add a diag function, similar to scipy.sparse.csr_matrix.diagonal and scipy.sparse.diags, for extracting the diagonal of a matrix and making a diagonal matrix? They are often used. (It'd be even better if we largely modeled after the scipy.sparse interfaces.)
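To make the request concrete, here is a plain-Python sketch of the two operations being asked for, mimicking the scipy.sparse semantics on a COO-style `{(row, col): value}` dict (the function names and representation are illustrative, not scipy's implementation):

```python
# Illustrative sketch (plain Python, not scipy) of the requested operations.

def diagonal(coo, n):
    """Extract the main diagonal, like scipy.sparse.csr_matrix.diagonal."""
    return [coo.get((i, i), 0) for i in range(n)]

def diags(values):
    """Build a sparse diagonal matrix, like scipy.sparse.diags."""
    return {(i, i): v for i, v in enumerate(values)}

m = {(0, 0): 1, (1, 2): 5, (2, 2): 3}
print(diagonal(m, 3))   # [1, 0, 3]
print(diags([4, 7]))    # {(0, 0): 4, (1, 1): 7}
```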
Two general comments on the sparse ndarray, though there is a large number of engineering details we still need to solve.
As mentioned by @jli05, a single ndarray supporting both sparse and dense makes the interface clean, but it may confuse users. I prefer to give users both a (dense) NDArray and a SparseNDArray, so that they have to think about when to use sparse and when to use dense.
Between row sparse and COO, I suggest having CSR. It is good for 2D sparse input data and for weights in LDA. It is more compact than COO, and may lead to better performance.
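The compactness claim can be made concrete with a back-of-the-envelope count (a sketch, not a benchmark): COO stores a (row, col) pair per nonzero, while CSR stores one column index per nonzero plus a row-pointer array of length n_rows + 1.

```python
# Index-storage comparison for a 2-D sparse matrix (counts of stored
# integers; the value arrays are identical in both formats).

def coo_index_storage(nnz):
    return 2 * nnz                      # one (row, col) pair per value

def csr_index_storage(nnz, n_rows):
    return nnz + (n_rows + 1)           # column indices + indptr

nnz, n_rows = 1_000_000, 10_000
print(coo_index_storage(nnz))           # 2000000
print(csr_index_storage(nnz, n_rows))   # 1010001
```

As long as nnz is much larger than n_rows, CSR uses roughly half the index storage of COO.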
There is a straightforward way to extend this to kvstore: use the row ID as the key, and partition the rows across server nodes. It is less flexible than using the index tuple as the key, because we need to communicate a whole row rather than individual elements. But communicating rows may be better for performance and is good enough for most algorithms. For implementation, we only need to update the kvstore implementation, without changing ps-lite.
@mli For very sparse libsvm data and other NLP data, CSR is appropriate. But for the weights or latent vectors being learned, IndexedSlice may be more appropriate? In that case, the row ID is just the feature ID or word ID.
@piiswrong refers to "IndexedSlice" as "Row sparse".
Will start to work on it after updating the documents https://github.com/dmlc/mxnet/pull/5151
We are in the process of choosing a neural net framework that is close to the JVM for ease of deployment www.github.com/Verizon/trapezium and mxnet integration with the JVM is more comprehensive than tensorflow's (distributed_runtime is not exposed through JNI yet, and exposing it will be involved). While reading the tensorflow paper https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf it was encouraging to see that mxnet and tensorflow performance are on par, but the following statement confused me: "The MXNet key-value store interface [22] does not currently allow sparse gradient updates within a single value, which are crucial for the distributed training of large models (§4.2), and adding this feature would require modifications to the core system." @mli this particular issue is the closest to sparse gradient updates I could find. Is this statement true, that the core system needs to be modified?
@debasish83 Sparse updates are only relevant if you want to train with sparse matrices, e.g. for recommendation systems.
BTW, MXNet distributed training performance is much better than the open-source version of tensorflow, by at least 2x. Tensorflow 1.0 claims an improvement, but we haven't seen any public benchmarks.
Recommendation system migration to nonlinearity is one of our focuses as well. We built Spark-based flows for generalized matrix factorization for recommendation and topic modeling: http://debasish83.github.io/spark-meetup-july2015/slides.pdf https://spark-summit.org/wp-content/uploads/2014/07/Quadratic-Programming-Solver-for-Non-negative-Matrix-Factorization-with-Spark-Debasish-Das.pdf
We are working on sparse support. Probably will take a month or two.
Is there an issue I can follow, and possibly help with, for adding sparse support? For matrix factorization in particular, it is possible to generate a gradient per user_i and item_j and save it to the parameter server as a big vector of user_i x item_j...is the concern that the network communication would be too high, and that's why we want to block it?
I'll write a proposal for sparse tensor this week.
Any update on the sparse tensor? Is there an Apache mailing list / JIRA now where discussions can be held?
Most of the discussions are on GitHub right now. We are just starting to use the Apache mailing list. Two other threads for sparse are #5498 #5707
How do we use the Apache mailing lists? Could you point to a web page that explains it in detail?
Any progress?
@jli05 The Apache mailing list is here: http://mxnet.io/community/mxnet_channels.html @formath We're merging the CPU implementation into the sparse branch this week. We still need to refactor some nnvm code and do some benchmarking before merging it into master #5800. Some initial benchmark results are available at https://github.com/eric-haibin-lin/mxnet/issues/60
If MKL uses this design, do you expect any changes in the front-end scripts, or will you try to keep the front-end scripts the same?
@ykim362 I'd expect most of the changes to be in the backend.
This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
Frontend Data Structures
NDArrayBase -> NDArray -> SparseNDArray
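The proposed frontend hierarchy could look like the following Python sketch (class bodies are hypothetical illustrations, not the actual MXNet code; only the `is_sparse` flag mentioned earlier in the thread is shown):

```python
# Hypothetical sketch of the NDArrayBase -> NDArray -> SparseNDArray
# hierarchy, exposing only the user-visible is_sparse check.

class NDArrayBase:
    """Common base for all array types."""

class NDArray(NDArrayBase):
    @property
    def is_sparse(self):
        return False

class SparseNDArray(NDArray):
    @property
    def is_sparse(self):
        return True

print(NDArray().is_sparse)        # False
print(SparseNDArray().is_sparse)  # True
```

Since SparseNDArray subclasses NDArray, code written against dense NDArray can accept sparse arrays and branch on `is_sparse` where needed.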
Backend
Data Structures
Add to the original NDArray to avoid v-table and API changes. A sparse tensor is represented by two dense tensors, index and data. There are two possible sparse formats:
Row sparse tensor, where each row of data, data[i], corresponds to X[index[i]]:
- shape: (N, ...)
- index shape: (M,)
- data shape: (M, ...)

COO sparse tensor, where X[tuple(index[i])] = data[i]:
- shape: (d_0, ..., d_K)
- index shape: (N, K)
- data shape: (N,)
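A plain-Python sketch of how the dense tensor X is recovered from the (index, data) pair in each format (function names are illustrative; the COO case is restricted to 2-D for brevity):

```python
# Reconstructing dense X from the two proposed sparse representations.

def from_row_sparse(index, data, n_rows):
    # data[i] is the full row X[index[i]]; all other rows are zero.
    row_len = len(data[0])
    X = [[0] * row_len for _ in range(n_rows)]
    for i, r in enumerate(index):
        X[r] = list(data[i])
    return X

def from_coo(index, data, shape):
    # X[tuple(index[i])] = data[i]; shape is (rows, cols) here.
    X = [[0] * shape[1] for _ in range(shape[0])]
    for (r, c), v in zip(index, data):
        X[r][c] = v
    return X

print(from_row_sparse([0, 2], [[1, 2], [3, 4]], n_rows=3))
# [[1, 2], [0, 0], [3, 4]]
print(from_coo([(0, 1), (2, 0)], [5, 6], shape=(3, 2)))
# [[0, 5], [0, 0], [6, 0]]
```

Row sparse pays one index entry per stored row, while COO pays K index entries per stored element, which is why row sparse suits weights whose whole rows are touched together.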
API
FComputeNDArray = std::function<void(Context ctx, ..., vector<NDArray> inputs, ...)>