Open · abhi-mosaic opened this issue 1 year ago
Hi! I am interested in using FasterTransformer with models that apply an ALiBi bias to the attention map, like the BLOOM family.

I see that the CUDA kernels support a `linear_bias_slopes` parameter, and that in the `GptContextDecoder` model class the `forward()` signature accepts `linear_bias_slopes`:

https://github.com/NVIDIA/FasterTransformer/blob/4402759e48f2340220638675f464b6ba1f79ac3c/examples/pytorch/gpt/utils/gpt_decoder.py#L501-L508
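For context on what that parameter holds: the slopes are the standard per-head ALiBi constants from Press et al., which BLOOM also uses; they are not stored in the checkpoint but derived from the head count. A minimal sketch of the usual computation (the helper name `build_alibi_slopes` is mine, not an FT API):

```python
import math
import torch

def build_alibi_slopes(num_heads: int) -> torch.Tensor:
    """Per-head ALiBi slopes, matching the reference implementation.

    For a power-of-two head count n, the slopes are the geometric
    sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8). Other head counts fall
    back to the closest power of two and interleave extra slopes.
    """
    def pow2_slopes(n: int) -> list:
        start = 2.0 ** (-8.0 / n)
        return [start ** (i + 1) for i in range(n)]

    if math.log2(num_heads).is_integer():
        slopes = pow2_slopes(num_heads)
    else:
        closest = 2 ** math.floor(math.log2(num_heads))
        slopes = pow2_slopes(closest)
        # Fill the remainder with every other slope of the 2*closest sequence.
        slopes += pow2_slopes(2 * closest)[0::2][: num_heads - closest]
    return torch.tensor(slopes, dtype=torch.float32)
```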
The only place I see this `GptContextDecoder` model class used is in the `Gpt` wrapper below:

https://github.com/NVIDIA/FasterTransformer/blob/4402759e48f2340220638675f464b6ba1f79ac3c/examples/pytorch/gpt/utils/gpt_decoder.py#L797-L803

But when this `Gpt` class calls forward on its `context_decoder`, it does not pass in the `linear_bias_slopes` argument:

https://github.com/NVIDIA/FasterTransformer/blob/4402759e48f2340220638675f464b6ba1f79ac3c/examples/pytorch/gpt/utils/gpt_decoder.py#L1197-L1203
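To make the consequence concrete: when the slopes do reach the kernels, each head's attention logits get a linear distance penalty before the softmax; when the argument is omitted it defaults to `None` and the scores are left unbiased. A rough PyTorch equivalent of the biased path (an illustration of the math only, not FT's actual kernel code):

```python
import torch

def apply_alibi_bias(scores: torch.Tensor, slopes: torch.Tensor) -> torch.Tensor:
    """Add the ALiBi penalty to pre-softmax attention scores.

    scores: [batch, num_heads, q_len, k_len] raw logits.
    slopes: [num_heads] per-head slopes (see the sketch above).
    Head h receives bias -slopes[h] * (query_pos - key_pos), so more
    distant past keys are penalized linearly; the diagonal gets 0.
    """
    q_len, k_len = scores.shape[-2], scores.shape[-1]
    q_pos = torch.arange(q_len).view(-1, 1)
    k_pos = torch.arange(k_len).view(1, -1)
    distance = (q_pos - k_pos).clamp(min=0)       # causal: future keys are masked elsewhere
    bias = -slopes.view(1, -1, 1, 1) * distance   # broadcasts to [1, H, q_len, k_len]
    return scores + bias.to(scores.dtype)
```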
This makes me believe that in example scripts that use the `Gpt` class, such as multi_gpu_gpt_example.py, loading BLOOM weights will not actually use ALiBi. Is that correct? Or is there some other codepath that handles ALiBi?

If this is indeed a bug, then I think any other FT scripts demonstrating BLOOM would suffer the same problem, because I don't see `linear_bias_slopes` being passed anywhere else.

Thank you for your help!

I think the behavior down the other codepath, which uses `Bloom(parallel_gpt.ParallelGPT)`, is OK because it relies on `use_attention_linear_bias`. So I guess this is the recommended route?

For BLOOM you should use bloom.py, not gpt.py. We don't add the option in gpt.py because no GPT model needs the bias for now.
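If it helps anyone landing here: on the recommended route you never build the slopes yourself; per the comments above, the `Bloom(parallel_gpt.ParallelGPT)` path relies on `use_attention_linear_bias` to feed them to the kernels. As a quick sanity check of the values involved (reusing the `build_alibi_slopes` sketch from earlier in the thread; 16 heads matches bloom-560m):

```python
# Reuses build_alibi_slopes from the sketch above; standalone otherwise.
slopes = build_alibi_slopes(16)   # bloom-560m has 16 attention heads
print(slopes[:4])                 # tensor([0.7071, 0.5000, 0.3536, 0.2500])
print(slopes[-1])                 # tensor(0.0039) == 2**-8
```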