SmerkyG / gptcore

Fast modular code to create and train cutting edge LLMs
Apache License 2.0

Add a small "pre-train size calculation" section to the main README #5

Open bennmann opened 7 months ago

bennmann commented 7 months ago

It would be good to have a section in the top-level README on initializing a new pre-train model at various common sizes (up to 70B) for each architecture.

e.g.: For RWKV, to initialize a pre-train model of size 70B, set flags: `example.py --n_embd XXXXX` etc. For a transformer, to initialize a pre-train model of size 70B, set flags: `example2.py --some_flag` etc.

Do this for each architecture at 3B, 7B, 34B, and 70B (or just a few small and large examples each).
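As a starting point for such a section, the mapping from flags like `--n_embd` and number of layers to total parameter count can be estimated with the standard 12·d² per-layer rule of thumb for transformer blocks. A minimal sketch (the function name and the `tied_embeddings` option are illustrative assumptions, not gptcore's actual API):

```python
def estimate_params(n_layer: int, n_embd: int, vocab_size: int,
                    tied_embeddings: bool = True) -> int:
    """Rough parameter count for a vanilla transformer.

    Per block: attention QKV + output projections (4 * d^2) plus a
    4x-expansion MLP (2 * 4d * d = 8 * d^2), i.e. ~12 * d^2 total.
    Biases and norm weights are ignored; they are negligible at scale.
    """
    embed = vocab_size * n_embd * (1 if tied_embeddings else 2)
    per_layer = 12 * n_embd ** 2
    return embed + n_layer * per_layer

# A 32-layer, 4096-wide, 32k-vocab config lands in the ~6.5B range,
# close to the familiar "7B" class of models.
print(estimate_params(n_layer=32, n_embd=4096, vocab_size=32_000))
```

A README table could then list, per architecture, the flag settings whose estimated counts hit the 3B / 7B / 34B / 70B targets. RWKV blocks have a different per-layer breakdown, so this particular constant would not carry over unchanged.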

SmerkyG commented 7 months ago

There is no such initialization needed currently. Let me know if there's something I'm misunderstanding about your request tho!
