Open coldsheephot opened 6 years ago
And there is another problem: when I delete `kv.push(i, a)` in the `test_row_sparse_pull()` function, it takes *more* time than with the push. That is very strange!
row_sparse push and pull time: 8.14685201645
```python
def test_row_sparse_pull():
    out = a
    for i in range(100):
        # kv.push(i, a)
        kv.row_sparse_pull(i, out=out, priority=i, row_ids=all_row_ids)
        # out.wait_to_read()
        # mx.base._LIB.MXNDArrayWaitToWrite(a.handle)
    mx.nd.waitall()
```
`row_sparse_pull` will only be fast when the number of row ids is very small, e.g. `kv.row_sparse_pull(i, out=out, priority=i, row_ids=mx.nd.array([10]))`.
@eric-haibin-lin When I push the sparse gradient, I can get the weight after the merge and update, but I cannot tell which rows have values and which rows don't. How can I know the row_ids? On a single machine I can read the indices from the sparse gradient, but in distributed training different workers have different indices, so when I pull from the parameter server I cannot know the final row_ids either.
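To make the question concrete, here is a plain-Python model of the situation described (the function name `server_row_ids` is illustrative, not an MXNet API): the merged row ids on the server are the union of the indices pushed by all workers, so no single worker can reconstruct them from its local gradient alone.

```python
def server_row_ids(per_worker_ids):
    """Model of the merged row ids on the parameter server:
    the union of the row indices pushed by every worker."""
    merged = set()
    for ids in per_worker_ids:
        merged |= set(ids)
    return sorted(merged)

worker0 = [2, 5]   # rows with non-zero gradient on worker 0
worker1 = [5, 9]   # rows with non-zero gradient on worker 1
print(server_row_ids([worker0, worker1]))  # [2, 5, 9]
# Worker 0 alone only sees [2, 5]; it cannot know row 9 exists on the server.
```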
Thanks for submitting this issue @coldsheephot @sandeep-krishnamurthy could you add labels "Performance", "Sparse" to this?
For minibatch training you usually derive the row ids from the sparse data in the minibatch. For checkpointing, you need to pull all row ids. RowSparseNDArray doesn't have a prune method yet. Maybe worth adding one.
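A minimal sketch of "derive the row ids from the sparse data in the minibatch" (stdlib-only; `row_ids_for_pull` is a hypothetical helper, and in real MXNet code you would wrap the result, e.g. `mx.nd.array(...)`, and pass it as the `row_ids` argument): the rows of the weight array that need to be pulled are exactly the unique feature ids that appear in the batch.

```python
def row_ids_for_pull(csr_indices):
    """Unique feature ids appearing in a minibatch's non-zeros.
    These are the only rows of the row-sparse weight array that
    the current step actually touches, so only they need pulling."""
    return sorted(set(csr_indices))

# Column indices of the non-zeros in a CSR-encoded minibatch:
indices = [4, 19, 4, 0, 19, 7]
print(row_ids_for_pull(indices))  # [0, 4, 7, 19]
```

Pulling only these few rows instead of `all_row_ids` is what keeps `row_sparse_pull` cheap.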
In distributed training, can I get the real indices which have values? Can you add that method? @eric-haibin-lin @kalyc @sandeep-krishnamurthy
@coldsheephot are you working on the multi-device or the multi-machine case? I plan to extend it for multi-device mode.
@eric-haibin-lin yes, I want to use this feature for both the multi-device and the multi-machine case. Thanks.
How long will it take to solve these problems? I am very anxious.
@mxnet-label-bot add[Distributed]
result:
When I use the row_sparse API to implement a deep gradient compression algorithm, I find it is much slower than using dense gradients. Can you help me save more time?
code:
In this code I don't even use merge and update, and `tostype` is only called once, but it is still much slower than dense gradients.