Currently, our codebase has no automated tests: every PR needs manual evaluation to determine whether it broke something or is valid and ready to be merged. To avoid this effort, we could start a dedicated TPU job that tries to overfit on a single batch. That would give us a cheap sanity check that the model can run a forward pass, learn, and compute correct gradients.
This issue tracks the progress of implementing such an automated testing infrastructure.