Any lessons from Imbue for training-in-the-large?

ahrefs / ocannl

OCANNL: OCaml Compiles Algorithms for Neural Networks Learning

Any lessons from Imbue for training-in-the-large? #270

Open lukstafi opened 4 months ago

lukstafi commented 4 months ago

https://imbue.com/research/70b-infrastructure/

"In the span of a few months, with a small team of researchers and engineers, we trained a 70B parameter model from scratch on our own infrastructure that outperformed zero-shot GPT-4o on reasoning-related tasks.

Today, we’re sharing an end-to-end guide for setting up the required infrastructure: from bringing up the initial cluster and installing the OS, to automatically recovering from errors encountered during training."

lukstafi commented 4 months ago

Also, from llm.c: https://github.com/karpathy/llm.c/discussions/677