NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT
Apache License 2.0

Some examples do not appear to use ALIBI bias for models like BLOOM #558

Open abhi-mosaic opened 1 year ago

abhi-mosaic commented 1 year ago

Hi! I am interested in using FasterTransformer with models that apply an ALIBI bias to the attention map, like the BLOOM family.
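For context, ALiBi (Attention with Linear Biases) adds a per-head linear penalty to the attention scores based on key-query distance. A minimal sketch of the standard slope and bias computation (illustrative only, not FasterTransformer's actual kernel code):

```python
import math

def alibi_slopes(n_heads):
    # Standard ALiBi slopes, assuming n_heads is a power of two:
    # head h gets slope 2^(-8 * (h + 1) / n_heads).
    return [2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)]

def alibi_bias(slope, seq_len):
    # Causal bias matrix for one head: position i attends to j <= i
    # with bias slope * (j - i); future positions are masked out.
    return [[slope * (j - i) if j <= i else float("-inf")
             for j in range(seq_len)]
            for i in range(seq_len)]

slopes = alibi_slopes(8)   # head 0 -> 0.5, head 7 -> 2**-8
bias = alibi_bias(slopes[0], 4)
```

The bias is simply added to the raw attention logits before the softmax, which is what the `linear_bias_slopes` parameter feeds in the CUDA kernels.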

I see that the CUDA kernels support a parameter linear_bias_slopes, and that in the GptContextDecoder model class, the forward() signature accepts linear_bias_slopes: https://github.com/NVIDIA/FasterTransformer/blob/4402759e48f2340220638675f464b6ba1f79ac3c/examples/pytorch/gpt/utils/gpt_decoder.py#L501-L508

The only place I see this GptContextDecoder model class used is in the Gpt wrapper below: https://github.com/NVIDIA/FasterTransformer/blob/4402759e48f2340220638675f464b6ba1f79ac3c/examples/pytorch/gpt/utils/gpt_decoder.py#L797-L803

But when this Gpt class calls forward on its context_decoder, it does not pass in the linear_bias_slopes argument.

https://github.com/NVIDIA/FasterTransformer/blob/4402759e48f2340220638675f464b6ba1f79ac3c/examples/pytorch/gpt/utils/gpt_decoder.py#L1197-L1203

This makes me believe that in example scripts such as multi_gpu_gpt_example.py, which use the Gpt class, loading BLOOM weights into the Gpt class won't apply ALIBI. Is that correct? Or is there some other codepath that handles ALIBI?
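To make the suspected gap concrete, here is a toy model of the situation (hypothetical class and method names that only mirror the shape of the linked gpt_decoder.py code, not the real classes): the decoder's forward() accepts `linear_bias_slopes` with a None default, so when the wrapper omits it, no bias is ever applied.

```python
class ContextDecoder:
    # Stand-in for GptContextDecoder: the argument exists but defaults to None.
    def forward(self, hidden_states, linear_bias_slopes=None):
        # With linear_bias_slopes=None, the kernels skip the ALiBi bias.
        return "alibi" if linear_bias_slopes is not None else "no-alibi"

class Gpt:
    # Stand-in for the Gpt wrapper: it never forwards linear_bias_slopes,
    # so even BLOOM weights would run down the "no-alibi" path.
    def __init__(self):
        self.context_decoder = ContextDecoder()

    def generate(self, hidden_states):
        return self.context_decoder.forward(hidden_states)

result = Gpt().generate(None)  # -> "no-alibi"
```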

If this is indeed a bug, then I think any other FT scripts demonstrating BLOOM would suffer the same problem, because I don't see linear_bias_slopes being passed anywhere else.

Thank you for your help!

abhi-mosaic commented 1 year ago

I think the behavior down the other codepath, which uses Bloom(parallel_gpt.ParallelGPT), is OK because it relies on use_attention_linear_bias. So I guess this is the recommended route?

https://github.com/NVIDIA/FasterTransformer/blob/4402759e48f2340220638675f464b6ba1f79ac3c/examples/pytorch/gpt/utils/bloom.py#L259
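As a sketch of the difference between the two codepaths (again with toy code, assuming the standard ALiBi slope formula): in the bloom.py route, a boolean flag on the model, rather than a caller-supplied tensor, decides whether slopes get built and applied at all.

```python
def maybe_alibi_slopes(num_heads, use_attention_linear_bias):
    # Toy model of the Bloom(parallel_gpt.ParallelGPT) path: the flag,
    # set once at model construction, controls whether ALiBi slopes
    # are created, so no per-call argument can be forgotten.
    if not use_attention_linear_bias:
        return None
    # Standard slopes, assuming num_heads is a power of two.
    return [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]

off = maybe_alibi_slopes(4, use_attention_linear_bias=False)  # -> None
on = maybe_alibi_slopes(4, use_attention_linear_bias=True)    # -> [0.25, ...]
```

Pushing the decision into a constructor flag is a safer design than an optional per-call argument, since every caller automatically gets the right behavior.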

byshiue commented 1 year ago

For BLOOM, you should use bloom.py, not gpt.py. We don't add the option in gpt.py because no GPT model needs the bias right now.