SmerkyG / gptcore

Fast modular code to create and train cutting edge LLMs
Apache License 2.0

Add a small "pre-train size calculation" section to the main README #5

Open bennmann opened 7 months ago

bennmann commented 7 months ago

It would be good to have a section in the top-level README on initializing a new pre-train model at various common sizes (up to 70B) for each architecture.

e.g.: For RWKV, to initialize a pre-train model of size 70B, set flags: `example.py --n_embd XXXXX` etc. For a transformer, to initialize a pre-train model of size 70B, set flags: `example2.py --some_flag` etc.

Do this for each architecture at 3B, 7B, 34B, and 70B (or just a few small and large examples each).
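As a starting point for such a section, the mapping from flags like `--n_embd` and number of layers to total parameter count can be estimated with the standard 12·d² per-layer rule of thumb for transformer blocks. A minimal sketch (the function name and the `tied_embeddings` option are illustrative assumptions, not gptcore's actual API):

```python
def estimate_params(n_layer: int, n_embd: int, vocab_size: int,
                    tied_embeddings: bool = True) -> int:
    """Rough parameter count for a vanilla transformer.

    Per block: attention QKV + output projections (4 * d^2) plus a
    4x-expansion MLP (2 * 4d * d = 8 * d^2), i.e. ~12 * d^2 total.
    Biases and norm weights are ignored; they are negligible at scale.
    """
    embed = vocab_size * n_embd * (1 if tied_embeddings else 2)
    per_layer = 12 * n_embd ** 2
    return embed + n_layer * per_layer

# A 32-layer, 4096-wide, 32k-vocab config lands in the ~6.5B range,
# close to the familiar "7B" class of models.
print(estimate_params(n_layer=32, n_embd=4096, vocab_size=32_000))
```

A README table could then list, per architecture, the flag settings whose estimated counts hit the 3B / 7B / 34B / 70B targets. RWKV blocks have a different per-layer breakdown, so this particular constant would not carry over unchanged.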

SmerkyG commented 7 months ago

There is no such initialization needed currently. Let me know if there's something I'm misunderstanding about your request tho!
