HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU-Training
BSD 2-Clause "Simplified" License

Looks linear #82

Closed. ClashLuke closed this 1 year ago

ClashLuke commented 1 year ago

Looks-Linear is not necessarily better than the baseline: [plot]

Fixing the conv bias (a higher init to achieve the same std as before) doesn't help either: [plot]

And everything underperforms the main branch: [plot]

Closing PR.

After these tests, convscale can be safely removed. Additionally, #73 removed the ability to transfer weights from small to large models using the current weight-transfer methods, so those methods can be safely removed as well.
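For readers unfamiliar with the idea tested here, the following is a minimal sketch of a looks-linear initialization as it is commonly described: a concatenated ReLU paired with mirrored weights, so the layer computes a plain linear map at init. It is not taken from this PR's diff. The names `crelu` and `looks_linear_init`, the dense (non-convolutional) layer, and the `1 / sqrt(in_features)` scaling are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp


def crelu(x):
    # Concatenated ReLU: positive and negative parts go to separate channels.
    return jnp.concatenate([jax.nn.relu(x), jax.nn.relu(-x)], axis=-1)


def looks_linear_init(key, in_features, out_features):
    # Hypothetical helper (not from the Olmax codebase): draw W, then stack
    # [W; -W] so that crelu(x) @ [W; -W] == relu(x) @ W - relu(-x) @ W == x @ W.
    # The 1 / sqrt(in_features) scale is an assumed choice for this sketch.
    w = jax.random.normal(key, (in_features, out_features)) / jnp.sqrt(in_features)
    return jnp.concatenate([w, -w], axis=0)


key = jax.random.PRNGKey(0)
w_key, x_key = jax.random.split(key)

in_features, out_features = 8, 4
weights = looks_linear_init(w_key, in_features, out_features)
x = jax.random.normal(x_key, (2, in_features))

# Layer with the nonlinearity and mirrored weights.
layer_out = crelu(x) @ weights
# Plain linear map using only the unmirrored half of the weights.
linear_out = x @ weights[:in_features]
```

At initialization `layer_out` and `linear_out` agree up to floating-point error. The two halves of `weights` are free to diverge during training, which is when the layer becomes nonlinear.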