Open bennmann opened 7 months ago
There is no such initialization needed currently. Let me know if there's something I'm misunderstanding about your request tho!
On Wed, Feb 7, 2024, 9:44 AM bennmann wrote:
It would be good to have a section in the top-level README on initializing a new pre-train model at various common sizes (up to 70B) for each architecture.
e.g., for RWKV, to initialize a pre-train model of size 70B, set flags: example.py --n_embd XXXXX etc.
For a transformer, to initialize a pre-train model of size 70B, set flags: example2.py --some_flag etc.
Do this for each arch at 3B, 7B, 34B, and 70B (or just a few small and large examples each).
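For what it's worth, such a README section would mostly come down to picking n_layer and n_embd for a target parameter count. A minimal sketch of that arithmetic for a decoder-only transformer (the `estimate_params` helper and the specific configs are my assumptions for illustration, not gptcore's actual CLI or defaults):

```python
# Rough parameter-count estimate for a decoder-only transformer.
# Hypothetical helper, not part of gptcore: it ignores architecture
# details like gated MLPs, GQA, biases, and layer norms, so real
# models land somewhat above or below the nominal size.

def estimate_params(n_layer: int, n_embd: int, vocab_size: int = 50304) -> int:
    """Approximate trainable parameters: embeddings + ~12*n_embd^2 per block."""
    embedding = vocab_size * n_embd
    per_block = 12 * n_embd * n_embd  # attention (~4*n_embd^2) + MLP (~8*n_embd^2)
    return embedding + n_layer * per_block

# Configs in the ballpark of commonly published sizes (assumed, not gptcore's):
configs = {
    "7B": (32, 4096),
    "70B": (80, 8192),
}
for name, (n_layer, n_embd) in configs.items():
    print(name, f"~{estimate_params(n_layer, n_embd) / 1e9:.1f}B params")
```

A per-architecture table of exactly these pairs (plus the matching flags for RWKV, which scales differently per block) is essentially what the requested README section would contain.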