NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

MSDP: does the pretrained GPT model support a maximum sequence length of only 1024? #207

Closed XqFeng-Josie closed 1 month ago

XqFeng-Josie commented 2 years ago

I see that the MSDP scripts prompt_knwl_gen.sh and prompt_resp_gen.sh both use the 375gpt model, which means seq_length can be set to 2048. I am using the 345gpt model instead, where seq_length can only be set to 1024, so the response prompts get truncated.
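
The truncation described above can be illustrated with a minimal sketch. This is not Megatron-LM's code: `fit_prompt`, `max_positions`, and `tokens_to_generate` are hypothetical names chosen for illustration, and the 1500-token prompt and 64-token generation budget are made-up numbers. It assumes the prompt plus the generated tokens must fit within the model's position limit and that the oldest tokens are dropped first.

```python
def fit_prompt(token_ids, max_positions, tokens_to_generate):
    """Trim a tokenized prompt so prompt + generated tokens fit the model.

    Hypothetical helper: keeps the rightmost tokens (the most recent
    context) and reports how many of the oldest tokens were dropped.
    """
    budget = max_positions - tokens_to_generate
    if budget <= 0:
        raise ValueError("tokens_to_generate leaves no room for the prompt")
    if len(token_ids) <= budget:
        return list(token_ids), 0
    dropped = len(token_ids) - budget
    return list(token_ids[-budget:]), dropped


# Stand-in for a 1500-token prompt (illustrative length only).
prompt = list(range(1500))

# With a 1024-position model, part of the prompt has to be cut.
kept_1024, dropped_1024 = fit_prompt(prompt, max_positions=1024, tokens_to_generate=64)

# With a 2048-position model, the same prompt fits unchanged.
kept_2048, dropped_2048 = fit_prompt(prompt, max_positions=2048, tokens_to_generate=64)

print(len(kept_1024), dropped_1024)
print(len(kept_2048), dropped_2048)
```

Under these assumptions, the 1024-position model keeps only the last 960 prompt tokens, while the 2048-position model keeps all 1500.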

github-actions[bot] commented 1 year ago

Marking as stale. No activity in 60 days. Remove stale label or comment or this will be closed in 7 days.