Train one model with a proper initialization and one with zero initialization until convergence. Tune the hyperparameters for each model separately so that each reaches its best possible performance. Log each model's performance on the validation set and report the results.
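The protocol above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive setup: the toy task (`y = sin(2x)`), the one-hidden-layer tanh network, the learning-rate grid, and a fixed epoch budget standing in for "until convergence" are all made up for the example, and uniform random weights stand in for whatever proper initialization scheme you actually use. The helper names (`make_data`, `init_params`, `train`) are hypothetical.

```python
import math
import random

def make_data(n, rng):
    # Toy 1-D regression task (assumption for illustration): y = sin(2x).
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, math.sin(2.0 * x)) for x in xs]

def init_params(hidden, scheme, rng):
    # "zero": every weight starts at 0. "proper": small random weights,
    # used here as a stand-in for a real scheme such as Xavier or He.
    if scheme == "zero":
        draw = lambda: 0.0
    else:
        draw = lambda: rng.uniform(-1.0, 1.0)
    return {
        "w1": [draw() for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [draw() for _ in range(hidden)],
        "b2": 0.0,
    }

def forward(p, x):
    h = [math.tanh(w * x + b) for w, b in zip(p["w1"], p["b1"])]
    y = sum(w * hi for w, hi in zip(p["w2"], h)) + p["b2"]
    return h, y

def mse(p, data):
    return sum((forward(p, x)[1] - y) ** 2 for x, y in data) / len(data)

def train(scheme, lr, train_set, val_set, hidden=4, epochs=200, seed=0):
    # Plain per-sample SGD on squared error; a fixed epoch budget is a
    # simplification of "train until convergence".
    rng = random.Random(seed)
    p = init_params(hidden, scheme, rng)
    for _ in range(epochs):
        for x, y in train_set:
            h, pred = forward(p, x)
            err = pred - y  # gradient of 0.5 * err**2 w.r.t. the prediction
            for j in range(hidden):
                grad_pre = err * p["w2"][j] * (1.0 - h[j] ** 2)
                p["w2"][j] -= lr * err * h[j]
                p["w1"][j] -= lr * grad_pre * x
                p["b1"][j] -= lr * grad_pre
            p["b2"] -= lr * err
    return mse(p, val_set), lr, p

data_rng = random.Random(42)
train_set = make_data(64, data_rng)
val_set = make_data(32, data_rng)

# Tune the learning rate separately for each initialization and keep the
# run with the lowest validation loss.
results = {}
for scheme in ("proper", "zero"):
    runs = [train(scheme, lr, train_set, val_set) for lr in (0.01, 0.03, 0.1)]
    results[scheme] = min(runs, key=lambda r: r[0])
    val_loss, best_lr, _ = results[scheme]
    print(f"{scheme:>6}: best lr={best_lr}, validation MSE={val_loss:.4f}")
```

In this particular toy network, zero initialization leaves every hidden activation at `tanh(0) = 0`, so the gradients for `w1`, `b1`, and `w2` are all zero and only the output bias `b2` ever changes. The zero-init model can therefore only learn a constant prediction, regardless of the learning rate. Other architectures may fail differently, so treat this as one concrete instance rather than a general result.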