casscw opened 6 years ago
@eric-haibin-lin
@ZiyueHuang
Lazy initialization of individual rows in the row_sparse weight parameter is an optimization we haven't done yet. You're right: ideally a row would be initialized only when its category is first seen, whereas currently all rows are filled in one shot during initialization. What sparse embedding does provide is the ability to retrieve only the parameters for the categories in the current mini-batch, instead of loading the full model. For example, you might keep the full model on CPU and, for each mini-batch, load only the rows for the seen categories onto the GPU and perform the forward and backward passes.
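A minimal sketch of that pattern, assuming a local kvstore; the sizes, the key name `embed_weight`, and the row ids below are illustrative, not from this issue:

```python
import mxnet as mx

# Illustrative sizes: a large vocabulary, small mini-batch
vocab_size, dim = 100000, 16

kv = mx.kv.create('local')
# Standard initialization currently fills every row of the weight
weight = mx.nd.random.uniform(shape=(vocab_size, dim)).tostype('row_sparse')
kv.init('embed_weight', weight)

# Categories seen in the current mini-batch
row_ids = mx.nd.array([3, 42, 1007], dtype='int64')

# Pull only those rows instead of the full table
out = mx.nd.sparse.zeros('row_sparse', (vocab_size, dim))
kv.row_sparse_pull('embed_weight', out=out, row_ids=row_ids)
print(out.indices.asnumpy())  # only the rows for the seen categories
```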
Lazy initialization is definitely worth investigating, though. What's your use case / application?
Thanks for your explanation. It's just wide & deep for a recommendation system.
In my application, a categorical feature's input dim may be ten million, but the feature actually takes only about two million distinct values. If lazy initialization were adopted, it would save a lot of memory.
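For scale (illustrative numbers, not from the issue): with a 64-dimensional float32 embedding, a ten-million-row table occupies about 2.56 GB, while the roughly two million rows actually seen would need only about 0.5 GB.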
Is there any development plan for this issue? Lazy initialization for large row_sparse ndarrays is crucial for use cases like sparse embedding over billions of features, where a mini-batch only sees a few thousand. @eric-haibin-lin
I'm not aware of anyone with spare time to work on this. If you'd like to contribute, I'm happy to discuss the design and implementation.
mark to come back soon
@mxnet-label-bot add[Distributed]
Description
In the wide & deep model, a categorical feature 'sex' has three values: 0, 1, 2. It is fed into a SparseEmbedding layer, as sketched below; the full code is in the Demo gist.
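An illustrative reconstruction of that setup (the actual code is in the Demo gist; `output_dim=4` and all variable names other than `single_2_embed_weight` are assumptions):

```python
import mxnet as mx

# 'sex' takes the three values 0, 1, 2; output_dim=4 is an assumed size
data = mx.symbol.Variable('sex')
weight = mx.symbol.Variable('single_2_embed_weight', stype='row_sparse')
embed = mx.symbol.contrib.SparseEmbedding(data=data, weight=weight,
                                          input_dim=3, output_dim=4)
```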
Environment info (Required)
Linux, mxnet-1.0.1, GPU, python-2.7
Detail
Checking the model's params shows that 'single_2_embed_weight' has values in every row, i.e. it is not actually sparse.
Why, when the categorical input takes only three values (or very few values), is its embedding weight dense (values in every row) rather than sparse (only a few rows holding values)?
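A quick way to observe this behavior (a sketch with illustrative shapes): a row_sparse array produced by ordinary dense initialization keeps every row.

```python
import mxnet as mx

# Standard initialization fills all rows, so the row_sparse weight
# ends up storing indices for every row
w = mx.nd.random.uniform(shape=(3, 4)).tostype('row_sparse')
print(w.stype)              # 'row_sparse'
print(w.indices.asnumpy())  # [0 1 2]: all three rows are present
```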
Demo
https://gist.github.com/casscw/2e7a436704ead8804261f8b13e84f1a1