I am researching scaling laws across models and architectures among other things and was wondering if you could share the logs\training losses\val eval of the models you have ran for the scaling law experiments in DeepSeek LLM. If you have other similar losses or results it would also be interesting. It might not be super well curated, anything can be helpful.
Thanks
I am researching scaling laws across models and architectures among other things and was wondering if you could share the logs\training losses\val eval of the models you have ran for the scaling law experiments in DeepSeek LLM. If you have other similar losses or results it would also be interesting. It might not be super well curated, anything can be helpful. Thanks