Currently, our codebase has no automated tests: every PR needs manual evaluation to determine whether it broke something or is valid and ready to be merged. To avoid this effort, we could start a dedicated TPU job that tries to overfit on a single batch. That would give us a cheap sanity check that the model can run a forward pass, learn, and compute correct gradients.
This issue tracks the progress of implementing such an automated testing infrastructure.