NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

How about supporting alternatives to fine-tuning? #114

Closed: hwijeen closed this issue 4 days ago

hwijeen commented 3 years ago

Hi, thank you for the great library.

Recently, many algorithms have been proposed to replace full fine-tuning, since fine-tuning incurs too much overhead, especially with huge models like GPT-3. Examples include P-tuning and LoRA. I personally implemented both on top of Megatron-LM and was able to achieve SOTA accuracy on a number of Korean benchmark datasets (with model sizes ranging from 300M to 82B parameters).

How about supporting algorithms like these? I think of it as an extension of the current --finetune option, and it would be a big plus for the practicality of huge models.
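
For readers unfamiliar with these methods, a minimal PyTorch sketch of the two ideas is below. It is purely illustrative, not hwijeen's implementation and not Megatron-LM code; the class names, initialization, and hyperparameters are assumptions, and real P-tuning additionally reparameterizes the virtual tokens through a small encoder network.

```python
import math
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """LoRA: freeze the pretrained projection, train only a low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # pretrained weights stay frozen
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init -> no change at start
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


class SoftPrompt(nn.Module):
    """Simplified P-tuning-style soft prompt: train a few virtual token embeddings
    and prepend them to the (frozen) input embeddings."""

    def __init__(self, n_virtual_tokens: int, hidden_size: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_virtual_tokens, hidden_size) * 0.02)

    def forward(self, input_embeds):          # input_embeds: (batch, seq, hidden)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)


# Only the adapter/prompt parameters go into the optimizer, so gradients,
# optimizer state, and per-task checkpoints stay tiny even for huge models.
lora = LoRALinear(nn.Linear(1024, 1024), r=8)
trainable = [p for p in lora.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

In both cases the pretrained model stays frozen, which is what makes these methods attractive for models in the 300M to 82B range mentioned above.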

alex-ht commented 1 year ago

Can you share your implementation? Thanks!

BrightXiaoHan commented 1 year ago

@hwijeen +1

wsh2836741 commented 1 year ago

@hwijeen +1

github-actions[bot] commented 1 year ago

Marking as stale. No activity in 60 days. Remove stale label or comment or this will be closed in 7 days.

hwijeen commented 1 year ago

Hi, sorry for the delayed response. I won't be able to make a PR, as I no longer have access to the code (it was from my previous job). I'd be happy to work on this together as a side project if people are still interested.

hwijeen commented 11 months ago

@alex-ht @BrightXiaoHan @wsh2836741 Could you share what your use cases are? Are you trying to PEFT-tune GPT models for conditional generation tasks?

And may I ask what made you reopen this issue @jon-barker? Are you working on this?

alex-ht commented 10 months ago

Hi @hwijeen,

Are you trying to PEFT-tune GPT models for conditional generation tasks?

Yes.

I found this tutorial, and it seems that NeMo can inject LoRA adapters into a Megatron-LM model: https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/lora.ipynb

github-actions[bot] commented 8 months ago

Marking as stale. No activity in 60 days.

AlpinDale commented 4 months ago

It would be great if Megatron-LM could support PEFT methods, e.g. QLoRA. We're sorely lacking a PEFT trainer with Tensor Parallelism.
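
For anyone unfamiliar with the term, here is a rough PyTorch sketch of the QLoRA idea, not an actual implementation: the frozen base weight is stored quantized and dequantized on the fly, while only the LoRA factors receive gradients. The per-channel int8 quantization below is a stand-in for illustration; real QLoRA uses 4-bit NF4 (via bitsandbytes), and a Megatron-LM version would additionally have to respect tensor-parallel sharding, which this sketch ignores.

```python
import torch
import torch.nn as nn


class QuantizedLoRALinear(nn.Module):
    """Illustrative QLoRA-style layer: frozen quantized base weight + trainable LoRA.
    Uses naive symmetric per-channel int8 here; real QLoRA uses 4-bit NF4."""

    def __init__(self, weight: torch.Tensor, r: int = 16, alpha: float = 32.0):
        super().__init__()
        out_features, in_features = weight.shape
        # Quantize the frozen base weight once, per output channel.
        scale = weight.abs().amax(dim=1, keepdim=True) / 127.0
        self.register_buffer("w_q", torch.round(weight / scale).to(torch.int8))
        self.register_buffer("w_scale", scale)
        # Only these two small matrices are trainable.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        w = self.w_q.float() * self.w_scale               # dequantize on the fly
        return x @ w.T + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


# Base weights take ~4x less memory than fp32 (~8x with true 4-bit), and only
# the LoRA factors carry gradients and optimizer state.
layer = QuantizedLoRALinear(torch.randn(4096, 4096), r=16)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 131072
```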

github-actions[bot] commented 2 months ago

Marking as stale. No activity in 60 days.