Open maltoak opened 1 year ago
Because some modules could not be reused at the time.
I see, thanks! Recently numerous new GPT models have been open-sourced. I am curious to know if there are any plans to combine the GPT-related code into a single, general implementation that can support different types of GPT models. Alternatively, if I wanted to leverage FasterTransformer to work with a different GPT model, which implementation would you suggest I base it on: multi_gpu_gpt, gptj, or gptneox?
Thanks.
We will consider merging the code.
If you want to leverage an existing implementation, you should understand the differences between your model and the existing models, and choose the closest one.
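If it helps to compare, the main per-layer differences among the three families can be summarized roughly and matched programmatically. This is a sketch based on the public model architectures (learned absolute positions and sequential blocks for GPT-2/GPT-3 style models, rotary embeddings and parallel attention/MLP for GPT-J and GPT-NeoX), not on FasterTransformer internals; the feature names are my own labels, not FT parameters.

```python
# Rough per-layer feature summary of the three GPT families in FT.
# Based on the public model architectures, not on FasterTransformer internals.
FAMILIES = {
    "multi_gpu_gpt": {  # GPT-2/GPT-3 style
        "position_embedding": "learned_absolute",
        "parallel_attn_mlp": False,
        "layernorms_per_block": 2,
    },
    "gptj": {
        "position_embedding": "rotary",
        "parallel_attn_mlp": True,
        "layernorms_per_block": 1,  # one LN shared by attention and MLP
    },
    "gptneox": {
        "position_embedding": "rotary",
        "parallel_attn_mlp": True,  # parallel residual is the default
        "layernorms_per_block": 2,
    },
}

def closest_family(model_features):
    """Return the FT implementation whose features overlap most with the model's."""
    def score(family):
        return sum(model_features.get(k) == v for k, v in FAMILIES[family].items())
    return max(FAMILIES, key=score)

# Example: a hypothetical model with rotary embeddings, parallel blocks,
# and two layernorms per block.
my_model = {
    "position_embedding": "rotary",
    "parallel_attn_mlp": True,
    "layernorms_per_block": 2,
}
print(closest_family(my_model))  # -> gptneox (matches all three features)
```

Of course a real port also has to account for details this sketch ignores (rotary dimension, bias terms, vocabulary/embedding layout, weight-file format), so treat the match only as a starting point.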
Thanks. I am just worried about potential compatibility problems. Maybe I can develop a new model based on the current FT implementations, and then keep up with any future updates that are released.
Hi,
GPT/GPT-J/GPT-NeoX have similar NN architectures. In my view, their implementations in src/fastertransformer/models (multi_gpu_gpt, gptj, gptneox) are also very similar. I am wondering why they are implemented separately instead of in one general implementation. I guess there may be some historical engineering reasons, or that each implementation was optimized separately for the best performance. Could you please let me know your thoughts on this? Thanks!
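To illustrate what I mean by "similar": the three block layouts could in principle be expressed as one parameterized decoder block. This is a toy NumPy sketch of the control flow only (my own illustration, not FT code; attention and MLP are stand-in linear maps), assuming the usual layouts of the public models.

```python
import numpy as np

def gpt_block(x, weights, *, parallel_residual, num_layernorms):
    """Illustrative decoder block covering GPT/GPT-J/GPT-NeoX layouts.

    A toy sketch, not FT's CUDA kernels: attention and MLP are stand-in
    matrix multiplies; the point is the branching, not the math.
    """
    def layernorm(h):
        mu = h.mean(-1, keepdims=True)
        var = h.var(-1, keepdims=True)
        return (h - mu) / np.sqrt(var + 1e-5)

    def attn(h):  # stand-in for masked self-attention
        return h @ weights["attn"]

    def mlp(h):   # stand-in for the feed-forward network
        return h @ weights["mlp"]

    if parallel_residual:
        # GPT-J / GPT-NeoX style: attention and MLP outputs are summed
        # into a single residual connection.
        a_in = layernorm(x)
        m_in = a_in if num_layernorms == 1 else layernorm(x)  # GPT-J shares one LN
        return x + attn(a_in) + mlp(m_in)
    # GPT-2/GPT-3 style: two sequential residual branches.
    h = x + attn(layernorm(x))
    return h + mlp(layernorm(h))
```

With only a couple of flags like these (plus the positional-embedding choice handled before the blocks), one implementation could in principle cover all three families, which is why I am curious about the separation.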