EleutherAI / gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
https://www.eleuther.ai/
Apache License 2.0

Support for custom model architecture #1117

Closed · itsnamgyu closed this issue 6 months ago

itsnamgyu commented 8 months ago

I'm building a custom architecture involving multiple existing architectures as sub-components (Pythia, RoBERTa, T5, etc).

Does this library support custom architectures? If not, could someone give me some pointers on how to approach it? (e.g., use a different library, re-build the architecture using provided model components)

I'm planning to run pre-training from scratch up to 7B params. I'm mainly interested in using this library for its FlashAttention support and ease of multi-node training.

Quentin-Anthony commented 8 months ago

Hey there! Yes, I think this is doable, but it would take some effort to add the new architectures, given that we only have GPT architectures supported here right now.

In terms of approach, since we're a Megatron-based framework and many people have added these architectures to other Megatron-based frameworks, I'd recommend porting those implementations into our https://github.com/EleutherAI/gpt-neox/tree/main/megatron/model directory.

There was a gpt-neox T5 effort at https://github.com/EleutherAI/gpt-neox/tree/t5-shared-params that you could start from, for example. T5 is now also in upstream Megatron (https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/model/t5_model.py).
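
For anyone picking that up, a new architecture here concretely means a new file alongside gpt2_model.py that the training loop can build from the config. As a very rough, plain-PyTorch illustration only (placeholder names, no Megatron tensor parallelism, and not an actual gpt-neox API), such a file might start out along these lines before the parallel layers from megatron/model are swapped in:

```python
# Hypothetical skeleton for a new file under megatron/model/: plain PyTorch
# stand-ins only. A real port would reuse gpt-neox's parallel transformer
# layers and be driven by the NeoXArgs config rather than these constructor args.
import torch
import torch.nn as nn


class CustomEncoderDecoderModel(nn.Module):
    """Placeholder T5-style encoder-decoder; not a gpt-neox API."""

    def __init__(self, vocab_size: int, hidden_size: int, num_layers: int, num_heads: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        enc_layer = nn.TransformerEncoderLayer(hidden_size, num_heads, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(hidden_size, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor) -> torch.Tensor:
        memory = self.encoder(self.embed(src_ids))
        hidden = self.decoder(self.embed(tgt_ids), memory)
        return self.lm_head(hidden)
```

The existing gpt2_model.py is the best reference for how the pipeline-parallel version is actually assembled.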

I would be happy to discuss this with you along the way and help on the effort if you go for it!

itsnamgyu commented 8 months ago

Thanks, I'll check them out!

nairbv commented 7 months ago

@itsnamgyu This might be helpful... here's an example where I use lm eval in another unrelated repo with custom models: https://github.com/foundation-model-stack/foundation-model-stack/pull/154
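
In the same spirit, one way to expose a custom model to the evaluation harness is to subclass its LM interface. A rough sketch is below, assuming the lm-evaluation-harness v0.4-style API; the registered name and the stub method bodies are placeholders, and this is not necessarily how the linked PR does it:

```python
# Sketch: wrap a custom model for lm-evaluation-harness by subclassing LM.
# Assumes the v0.4-style API; "my-custom-lm" and the stub bodies are placeholders.
from lm_eval.api.model import LM
from lm_eval.api.registry import register_model


@register_model("my-custom-lm")
class MyCustomLM(LM):
    def __init__(self, model):
        super().__init__()
        self.model = model  # the custom architecture being evaluated

    def loglikelihood(self, requests):
        # Score each (context, continuation) pair with self.model and return
        # (log-prob, is_greedy) tuples; omitted here.
        raise NotImplementedError

    def loglikelihood_rolling(self, requests):
        # Full-sequence log-likelihoods for perplexity-style tasks; omitted here.
        raise NotImplementedError

    def generate_until(self, requests):
        # Free-form generation up to the stop sequences in each request; omitted here.
        raise NotImplementedError
```

An instance of the class can then be passed as the model argument to the harness's simple_evaluate entry point (in recent versions).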

itsnamgyu commented 7 months ago

@nairbv Thanks a lot!

JDRanpariya commented 6 months ago

Hey, this sounds interesting. I'm planning to recreate a model that's written in PyTorch with this library. Given its custom architecture, what do I need to consider and plan for so that I can take advantage of GPT-NeoX for distributed training? Any pointers or guidance would help.

I also looked at the T5 model implementation on the t5-shared-params branch. I would like to know whether it's enough to create a model file similar to gpt2_model.py in the models directory, or whether I need to make changes to Megatron as well. It would be helpful if you could give me an idea of what changes are required to incorporate a custom model architecture.

itsnamgyu commented 6 months ago

@JDRanpariya I've actually decided to use the HuggingFace implementation of GPTNeoX with DeepSpeed and FlashAttention-2 for now. I'm not working with T5 or RoBERTa at the moment.
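
For anyone curious, that setup looks roughly like the following; a minimal sketch assuming a recent transformers release with FlashAttention-2 support, where the checkpoint name, dataset, and ds_config.json path are placeholders:

```python
# Sketch of the HF-based setup mentioned above: a GPT-NeoX-architecture model
# from transformers with FlashAttention-2, trained with Trainer + DeepSpeed.
# Checkpoint name, dataset, and ds_config.json are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model_name = "EleutherAI/pythia-1.4b"  # any GPTNeoX-class checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    bf16=True,
    deepspeed="ds_config.json",  # ZeRO settings live here; launch via `deepspeed` or `torchrun`
)

# With a tokenized dataset of input_ids/labels:
# trainer = Trainer(model=model, args=args, train_dataset=tokenized_dataset)
# trainer.train()
```

Training from scratch rather than fine-tuning would swap from_pretrained for from_config with a GPTNeoXConfig sized to the target model.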

jd-inferq commented 6 months ago

Okay, thanks!


StellaAthena commented 6 months ago

OP has decided to pursue a different approach rather than modifying this library.

jd-inferq commented 6 months ago

Yep, got it! I guess people who want to do this will do it anyhow, but I think this issue is a good starting point. Is it possible to move it to Discussions? It might help people who want to do something similar in the future.