apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

row_sparse numpy Parameter and row_sparse gradient in npx.embedding? #20391

Open fhieber opened 3 years ago

fhieber commented 3 years ago

Description

While migrating to the numpy namespaces in MXNet 2.0, I observed an error when trying to create a row_sparse parameter (see the example below). The example follows the pattern we currently use in MXNet 1.x (with NDArrays/symbols).

Does the new numpy interface not yet support row_sparse parameters/gradients?

Error Message

[12:07:12] ../src/storage/storage.cc:199: Using Pooled (Naive) StorageManager for CPU
Traceback (most recent call last):
  File "sparse.py", line 14, in <module>
    b.initialize()
  File "/Users/fhieber/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 574, in initialize
    v.initialize(None, ctx, init, force_reinit=force_reinit)
  File "/Users/fhieber/anaconda3/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 485, in initialize
    self._finish_deferred_init()
  File "/Users/fhieber/anaconda3/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 364, in _finish_deferred_init
    self._init_impl(data, ctx)
  File "/Users/fhieber/anaconda3/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 377, in _init_impl
    self._init_grad()
  File "/Users/fhieber/anaconda3/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 388, in _init_grad
    .format(self._grad_stype))
ValueError: mxnet.numpy.zeros does not support stype = row_sparse

To Reproduce

from mxnet import np, npx, gluon

class Block(gluon.Block):
  def __init__(self):
    super().__init__()
    self.weight = gluon.Parameter('weight', shape=(32, 32), grad_stype='row_sparse')
  def forward(self, x):
    return npx.embedding(x, weight=self.weight.data(), input_dim=32, output_dim=32, sparse_grad=True)

b = Block()
b.initialize()

x = np.ones((32, 32))
r = b(x)
print(r)

Environment

----------Python Info----------
Version      : 3.7.5
Compiler     : Clang 4.0.1 (tags/RELEASE_401/final)
Build        : ('default', 'Oct 25 2019 10:52:18')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 21.1.2
Directory    : /Users/fhieber/anaconda3/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 2.0.0
Directory    : /Users/fhieber/anaconda3/lib/python3.7/site-packages/mxnet
Commit Hash   : dc69b04070c55f33c1ac2dc83be42be9c1a8c56f
Library      : ['/Users/fhieber/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.dylib']

barry-jin commented 3 years ago

@fhieber Currently, sparse features are not supported in numpy mode in Gluon 2.0.

fhieber commented 3 years ago

I see, thanks. Are there plans to re-add this? Sparse gradient updates for embedding matrices provided noticeable improvements in training throughput in the past.
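
For readers unfamiliar with why this matters, here is a minimal sketch of the idea in plain Python. It is not MXNet code: the function names (`embedding_row_sparse_grad`, `sgd_update_sparse`) and the list-based weight matrix are hypothetical and only illustrate that an embedding lookup yields a gradient for the looked-up rows, so an update can skip every other row.

```python
# Illustrative only: plain Python lists stand in for the embedding matrix.
# None of these names are MXNet APIs; they are assumptions for this sketch.

def embedding_row_sparse_grad(indices, out_grad):
    """Accumulate output gradients per looked-up row.

    Returns a dict {row_index: gradient_row}. Rows that were never looked up
    are absent, which is the idea behind a row_sparse gradient.
    """
    grad = {}
    for idx, g in zip(indices, out_grad):
        row = grad.setdefault(idx, [0.0] * len(g))
        for j, v in enumerate(g):
            row[j] += v
    return grad


def sgd_update_sparse(weight, sparse_grad, lr):
    """Apply an SGD step in place, touching only rows present in the gradient."""
    for idx, g in sparse_grad.items():
        weight[idx] = [w - lr * v for w, v in zip(weight[idx], g)]
    return len(sparse_grad)  # number of rows actually updated


vocab_size, dim = 1000, 4
weight = [[1.0] * dim for _ in range(vocab_size)]
indices = [3, 7, 3]                        # token 3 is looked up twice
out_grad = [[0.5] * dim for _ in indices]  # gradient w.r.t. each output row

sparse_grad = embedding_row_sparse_grad(indices, out_grad)
rows_touched = sgd_update_sparse(weight, sparse_grad, lr=0.1)
```

In this sketch only 2 of the 1000 rows are read and written during the update, whereas a dense gradient would require a pass over the full matrix.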

barry-jin commented 3 years ago

MXNet 2.0 NumPy arrays will need to follow the Python array API standard, so we will probably not add sparse support for NumPy arrays. However, I'm working on a workaround that lets users fall back to the legacy interface and sparse gradients when a sparse gradient is required in parameters and some operators.