bigcode-project / Megatron-LM

Ongoing research training transformer models at scale

Literature review on scaling laws #21

Open RaymondLi0 opened 1 year ago

RaymondLi0 commented 1 year ago

Get an idea of the different flavours of scaling-law work out there, i.e. any work that tries to estimate the optimal model and dataset size with respect to a certain metric (PPL or another). Some references:

RaymondLi0 commented 1 year ago

In particular, it would be good to know whether any previous work examined downstream performance, rather than PPL, in a scaling-law setting.

keyboardAnt commented 1 year ago

Additional papers:

RaymondLi0 commented 1 year ago

Other related work: