NVIDIA / sentiment-discovery

Unsupervised Language Modeling at scale for robust sentiment classification

Multi GPU with transfer.py #48

Open dwinkler1 opened 5 years ago

dwinkler1 commented 5 years ago

This is more of a feature request, I guess. Is it possible to use multiple GPUs for the transfer code? I tried to implement this myself with

model = nn.DataParallel(model)

but that did not work, because the transfer code accesses model.rnn directly, which DataParallel hides behind model.module:

Traceback (most recent call last):
  File "transfer2.py", line 185, in <module>
    trXt, trY = transform(model, train_data)
  File "transfer2.py", line 143, in transform
    model.rnn.reset_hidden(batch_size)
  File "/home/imsm/.conda/envs/jupyterlab/lib/python3.6/site-packages/torch/nn/modules/module.py", line 518, in __getattr__
    type(self).__name__, name))
AttributeError: 'DataParallel' object has no attribute 'rnn'

But if I instead keep a reference to the unwrapped module with

modelpar = nn.DataParallel(model)
model = modelpar.module

then I'm back to a single GPU, since the forward passes now bypass the DataParallel wrapper. Do I have to call model.module.rnn everywhere, or does this not work at all?

raulpuric commented 5 years ago

So the way DataParallel works is that its .forward method takes CPU data (note that the input tensors have to be on the CPU) and broadcasts it to all available GPUs, where each replica passes its slice of the data to the model's forward method. What you tried didn't work because you passed the data to the model's forward method directly, not to DataParallel's forward method. So you need to:

1. Use the full DataParallel module for the forward pass.
2. Any time you need to access a model attribute, access it on the original wrapped module, modelpar.module.
3. Keep the input data on the CPU so that DataParallel can automatically send it to the right GPU.
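For illustration, here is a minimal sketch of that pattern. The transform loop, batch handling, and variable names are illustrative, not the repo's exact transfer.py code; it assumes model is the pretrained language model, already loaded, exposing the rnn.reset_hidden method that transfer.py calls:

import torch
import torch.nn as nn

# assumes `model` is the pretrained language model, already loaded as in transfer.py
modelpar = nn.DataParallel(model)

def transform(modelpar, batches, batch_size):
    features = []
    with torch.no_grad():
        for batch in batches:
            # (2) model attributes live on the wrapped module
            modelpar.module.rnn.reset_hidden(batch_size)
            # (1) + (3) call the DataParallel wrapper itself with CPU tensors;
            # it scatters them across the GPUs and gathers the outputs back
            features.append(modelpar(batch))
    return features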

We had a working implementation of this in our original release of the codebase, if you'd like a reference: https://github.com/NVIDIA/sentiment-discovery/releases/tag/v0.1. It was too difficult to maintain while adding new features, so we deprecated it.