NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT
Apache License 2.0

Implementations of GPT/GPT-J/GPT-Neox #553

Open maltoak opened 1 year ago

maltoak commented 1 year ago

Hi,

GPT, GPT-J, and GPT-NeoX have similar neural-network architectures. In my view, their implementations in src/fastertransformer/models (multi_gpu_gpt, gptj, gptneox) are also very similar. I am wondering why they are implemented separately instead of as one general implementation. I guess there may be some historical engineering reasons, or perhaps each implementation was optimized separately for the best performance. Could you please let me know your thoughts on this?
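To illustrate what I mean, the architectural differences seem to come down to a few knobs: positional encoding (learned vs. rotary), parallel vs. sequential attention/MLP residual, and bias usage. Below is a rough C++ sketch of how a single configuration could capture them; the struct, field names, and concrete values are my own assumptions for illustration, not the actual FasterTransformer API.

```cpp
// Sketch only: hypothetical struct and field names, not the real FasterTransformer API.
// The concrete values are assumptions based on the public model descriptions.
#include <cstdio>

struct GptVariantConfig {
    const char* name;
    bool use_rotary_embedding;  // GPT-J / GPT-NeoX use RoPE; GPT-2/3 style uses learned position embeddings
    int  rotary_embedding_dim;  // 0 when RoPE is not used
    bool parallel_residual;     // GPT-J / GPT-NeoX feed the same layer input to attention and MLP
    bool use_attention_bias;    // GPT-J omits biases in the attention projections
};

int main() {
    const GptVariantConfig variants[] = {
        {"multi_gpu_gpt (GPT-2/3 style)", false, 0,  false, true},
        {"gptj",                          true,  64, true,  false},
        {"gptneox (20B-style)",           true,  24, true,  true},
    };
    for (const auto& v : variants) {
        std::printf("%-32s rope=%d rope_dim=%d parallel_residual=%d attn_bias=%d\n",
                    v.name, v.use_rotary_embedding, v.rotary_embedding_dim,
                    v.parallel_residual, v.use_attention_bias);
    }
    return 0;
}
```

A single decoder implementation parameterized by such a config seems feasible in principle, which is why I am asking about the separate code paths.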

Thanks!

byshiue commented 1 year ago

Because some modules could not be reused at the time.

maltoak commented 1 year ago

I see, thanks! Recently, numerous new GPT models have been open-sourced. I am curious whether there are any plans to combine the GPT-related code into a single, general implementation that can support different types of GPT models. Alternatively, if I wanted to leverage FasterTransformer for a different GPT model, which implementation would you suggest I base it on: multi_gpu_gpt, gptj, or gptneox?

Thanks.

byshiue commented 1 year ago

We will consider merging the code.

If you want to leverage an existing implementation, you should understand the differences between your model and the existing models, and choose the closest one.
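For example, the comparison can be as simple as counting mismatched architectural features. This is only a toy sketch; the feature flags are assumptions from the public model descriptions, not FasterTransformer's own selection logic.

```cpp
// Toy sketch: pick the closest existing FT model by counting feature mismatches.
// Feature values are assumptions based on public model descriptions, not FT code.
#include <cstdio>

struct Features { bool rope; bool parallel_residual; bool attention_bias; };

int distance(const Features& a, const Features& b) {
    return (a.rope != b.rope) + (a.parallel_residual != b.parallel_residual)
         + (a.attention_bias != b.attention_bias);
}

int main() {
    const struct { const char* name; Features f; } existing[] = {
        {"multi_gpu_gpt", {false, false, true}},
        {"gptj",          {true,  true,  false}},
        {"gptneox",       {true,  true,  true}},
    };
    // Example: a hypothetical RoPE model with sequential residual and attention biases.
    const Features my_model{true, false, true};
    const char* best = nullptr;
    int best_d = 1 << 30;
    for (const auto& e : existing) {
        const int d = distance(e.f, my_model);
        if (d < best_d) { best_d = d; best = e.name; }
    }
    std::printf("closest existing implementation: %s (distance %d)\n", best, best_d);
    return 0;
}
```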

maltoak commented 1 year ago

Thanks. I am just worried about potential compatibility problems. Maybe I can develop a new model based on the current state of the FT implementations, and then keep up with any future updates as they are released.