Closed kywch closed 6 months ago
RecurrentWrapper class can accept num_layer = 0, so the model scaling experiment can run from 0 to N layers.
RecurrentWrapper class can accept num_layer = 0, so the model scaling experiment can run from 0 to N layers.