Closed: alexorona closed this issue 4 years ago
I have the same problem: I need to train a large model (t5-11b) but can't find a simple way to do model parallelism for this task. The Huggingface transformers library doesn't seem to support it in the current release. Eisen could be a great solution, but because transformers models take several inputs (tokens + mask in my case), it doesn't work right now.
Is it even feasible to implement automatic model parallelism for multi-input models? I'm looking at your code and have some doubts...
Yeah, I think you're right @exelents. Will close the issue. Might have another solution, but it won't be easy. @exelents, hit me up on email or LinkedIn to continue the discussion.
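For anyone finding this thread later: one possible fallback (not necessarily the solution alluded to above) is to split a model's submodules across devices by hand and move the activations explicitly, since the wrapper's own `forward` can then accept as many inputs as needed. Below is a rough sketch only; `embeddings`, `encoder` and `head` are placeholder modules with assumed signatures, not a real transformers or Eisen API, and two CUDA devices are assumed.

```python
import torch.nn as nn

class ManuallySplitModel(nn.Module):
    """Hand-rolled model parallelism: first half on cuda:0, second half on cuda:1."""

    def __init__(self, embeddings: nn.Module, encoder: nn.Module, head: nn.Module):
        super().__init__()
        self.embeddings = embeddings.to("cuda:0")
        self.encoder = encoder.to("cuda:0")
        self.head = head.to("cuda:1")

    def forward(self, input_ids, attention_mask):
        # Multiple inputs are handled explicitly here instead of relying on a
        # single-input wrapper.
        hidden = self.embeddings(input_ids.to("cuda:0"))
        hidden = self.encoder(hidden, attention_mask.to("cuda:0"))
        # Move the activations to the second device before the head runs there.
        return self.head(hidden.to("cuda:1"))
```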
I'm attempting to find a good model parallel library for large NLP transformer models. Transformers by HuggingFace is the main library that makes pretrained models accessible, and these models can have 1.5 billion parameters or more. Using `ModelParallel` from Eisen on models like T5, DialoGPT, BERT and GPT2 (here's a link as an example) would be simply amazing. The improvements from using large transformer models are convincing. See also here.

The current PyTorch implementation from the transformers library passes a set of tensors to the model to combine (input ids, an attention mask and sometimes token type ids). As a result, using `ModelParallel` on a model from the transformers library will, as expected, raise `NotImplementedError: Support for modules with more than one input is not yet implemented`. Similarly, a single forward pass through the model yields either a dictionary or a tuple, so there are multiple outputs as well as multiple inputs.

Is there any way to use Eisen with this, given that support for multiple inputs and multiple outputs seems to have been removed?
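For concreteness, here is a minimal sketch of the multi-input, multi-output interface I'm describing (assuming a recent `transformers` release with the callable tokenizer API; the BERT checkpoint is just an example):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# The tokenizer returns several tensors per example: input_ids,
# attention_mask and token_type_ids -- not a single input tensor.
encoded = tokenizer("Model parallelism for large transformers", return_tensors="pt")

with torch.no_grad():
    outputs = model(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        token_type_ids=encoded["token_type_ids"],
    )

# The forward pass also returns more than one value (hidden states,
# pooled output, ...), so the interface is multi-valued on both sides.
last_hidden_state = outputs[0]
pooled_output = outputs[1]
```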