Looks-Linear is not necessarily better than the baseline, and fixing the conv bias (a higher init to achieve the same std as before) doesn't help either. All of these variants underperform the main branch.
Closing PR.

After these tests, convscale can be safely removed. Additionally, #73 took away the ability to transfer weights from small to large models using the current weight transfer methods, so those methods can be safely removed as well.