Open RaymondLi0 opened 1 year ago
Get an idea of the different flavours of scaling-law work that are out there, i.e. any work that tries to estimate the optimal model size and dataset size with respect to a certain metric (PPL or other). Some references:
In particular, it would be good to know whether any previous work has examined downstream performance, rather than PPL, in a scaling-law setting.
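To make "estimating the optimal scale of model and dataset size" concrete, here is a minimal sketch of one common flavour: a Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta, minimised under the common compute approximation C ≈ 6·N·D. The constants below are placeholder assumptions chosen for illustration, not values fitted to any data, and the function names (`loss`, `compute_optimal`) are hypothetical. Targeting a downstream metric instead of PPL would mean fitting a different function of N and D in place of `loss`.

```python
import math

# ASSUMPTION: illustrative placeholder constants for the parametric loss
#   L(N, D) = E + A / N**alpha + B / D**beta
# These are NOT fitted to any data; swap in values fitted to your own runs.
E, A, B = 1.7, 400.0, 400.0
ALPHA, BETA = 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted pre-training loss (a PPL proxy) for n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def compute_optimal(flops: float) -> tuple[float, float]:
    """Minimise loss(N, D) subject to the approximation C = 6 * N * D.

    Setting dL/dN = 0 along the constraint gives the closed form
      N_opt = G * (C / 6) ** (beta / (alpha + beta)),  D_opt = (C / 6) / N_opt
    with G = (alpha * A / (beta * B)) ** (1 / (alpha + beta)).
    """
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    n_opt = g * (flops / 6.0) ** (BETA / (ALPHA + BETA))
    d_opt = (flops / 6.0) / n_opt  # tokens implied by the compute budget
    return n_opt, d_opt


if __name__ == "__main__":
    # Print the compute-optimal allocation for a few budgets.
    for c in (1e20, 1e22, 1e24):
        n, d = compute_optimal(c)
        print(f"C={c:.0e}  N_opt={n:.3e}  D_opt={d:.3e}  loss={loss(n, d):.4f}")
```

With these assumed exponents, N_opt grows roughly as C^0.45 and D_opt as C^0.55, so the split between parameters and tokens is driven entirely by the fitted alpha and beta; different papers disagree mainly on how those are estimated.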
Additional papers:
Other related work: