eisen-ai / eisen-core

Core functionality of Eisen
MIT License

Question on transformers models #37

Closed alexorona closed 4 years ago

alexorona commented 4 years ago

I'm attempting to find a good model-parallel library for large NLP transformer models. Transformers by HuggingFace is the main library that makes pretrained models accessible, and these models can be 1.5 billion parameters or more. Being able to use ModelParallel from Eisen on models like T5, DialoGPT, BERT and GPT-2 (here's a link as an example) would be simply amazing. The improvements from using large transformer models are convincing. See also here.

The current PyTorch implementation in the transformers library passes a set of tensors to the model (input IDs, an attention mask and sometimes token type IDs). As a result, using ModelParallel on a model from the transformers library raises, as expected, NotImplementedError: Support for modules with more than one input is not yet implemented. Similarly, a single forward pass through the model yields either a dictionary or a tuple, so there are multiple outputs as well as multiple inputs.
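For concreteness, here is a minimal sketch of the kind of forward pass involved, using bert-base-uncased as an arbitrary example checkpoint (the call style and the exact output type, plain tuple vs. model-output object, depend on the transformers version installed):

```python
# Sketch of a transformers forward pass: several tensor inputs, several outputs.
# bert-base-uncased is just an example; the other models mentioned above behave
# the same way.
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

encoded = tokenizer("Model parallelism for large transformers", return_tensors="pt")

# The model needs several tensors, not a single input tensor ...
outputs = model(
    input_ids=encoded["input_ids"],
    attention_mask=encoded["attention_mask"],
    token_type_ids=encoded["token_type_ids"],
)

# ... and returns more than one tensor (last hidden state, pooled output, ...),
# so both sides of the forward pass break a one-input/one-output assumption.
last_hidden_state = outputs[0]
```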

Is there any way to use Eisen with this, given that support for multiple inputs and multiple outputs seems to have been removed?

exelents commented 4 years ago

I have the same problem: I need to train a large model (t5-11b) but can't find a simple way to do model parallelism for this task. The HuggingFace transformers library doesn't seem to support it in the current release. Eisen could be a great solution, but because transformers models take several inputs (tokens + mask in my case), it doesn't work right now.
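Purely for illustration, a naive wrapper like the one below (hypothetical class name, t5-small standing in for t5-11b, loss argument name varies by transformers version) hides the multiple inputs behind a single forward() argument. But this only changes the Python signature; it doesn't answer how an automatic model-parallel wrapper would place and route the individual tensors across devices, which is the real difficulty:

```python
import torch.nn as nn
from transformers import T5ForConditionalGeneration


class SingleArgT5(nn.Module):
    """Hypothetical wrapper: one forward() argument, but still many tensors inside."""

    def __init__(self, checkpoint="t5-small"):  # t5-small as a stand-in for t5-11b
        super().__init__()
        self.t5 = T5ForConditionalGeneration.from_pretrained(checkpoint)

    def forward(self, batch):
        # `batch` is a single Python object, yet it carries several tensors that
        # an automatic model-parallel wrapper would still have to move between
        # devices individually -- so this only sidesteps the signature check.
        return self.t5(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
        )
```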

exelents commented 4 years ago

Is it even feasible to implement automatic model parallelism for multi-input models? I'm looking at your code and I have some doubts...

alexorona commented 4 years ago

Yeah, I think you're right @exelents. I'll close the issue. I might have another solution, but it won't be easy. @exelents, hit me up by email or on LinkedIn to continue the discussion.