Open Elijah-Yi opened 7 months ago
With a large batch size (256), the loss does not converge