Great question!
One answer is to make golden tests, but expect to regenerate the goldens when you upgrade MuJoCo or switch architectures. This way your test will tell you if Something Very Bad happened, and maybe you'll find yourself updating the goldens a few times after seeing that everything is actually fine.
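For example, something along these lines (just a sketch: the model XML, the golden file path `tests/golden_qpos.npy`, and the tolerance values are illustrative placeholders, not something specific to your package):

```python
import numpy as np
import mujoco

# Illustrative model; in practice this would be your library's MJCF.
XML = """
<mujoco>
  <worldbody>
    <body pos="0 0 1">
      <freejoint/>
      <geom type="sphere" size="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

def simulate(xml: str, n_steps: int = 500) -> np.ndarray:
    """Step the model a fixed number of times and return the final qpos."""
    model = mujoco.MjModel.from_xml_string(xml)
    data = mujoco.MjData(model)
    for _ in range(n_steps):
        mujoco.mj_step(model, data)
    return data.qpos.copy()

def test_against_golden():
    qpos = simulate(XML)
    # Golden values are regenerated deliberately when the MuJoCo version changes.
    golden = np.load("tests/golden_qpos.npy")
    # Loose tolerances: catch "Something Very Bad" without failing on the small
    # numerical drift allowed across versions/architectures.
    np.testing.assert_allclose(qpos, golden, rtol=1e-3, atol=1e-4)
```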
Another answer is to pin to a version and only upgrade when you really want a new feature, but then check carefully that everything is okay.
The definition of "everything is okay" really depends on your use case. If you are writing a suite of benchmarks, just pin the version and never upgrade. If you are doing something else, write whatever corresponds to "is everything okay?" for your case...
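If you go the pinning route, a small guard in the test suite can make the pin explicit. A sketch, assuming `mujoco.mj_versionString()` returns the engine version string (the pinned version below is just a placeholder):

```python
import mujoco
import pytest

# Version the goldens were generated with (placeholder value).
GOLDEN_MUJOCO_VERSION = "3.1.6"

def require_golden_mujoco_version():
    """Skip golden comparisons when the installed MuJoCo doesn't match the pin."""
    installed = mujoco.mj_versionString()
    if installed != GOLDEN_MUJOCO_VERSION:
        pytest.skip(
            f"goldens were generated with MuJoCo {GOLDEN_MUJOCO_VERSION}, "
            f"found {installed}; regenerate them after a deliberate upgrade"
        )
```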
Does this make sense?
Thanks Yuval! These suggestions are very helpful. I will also specify the runner type when running these tests with GitHub Actions. Given that, it's probably best to avoid running the tests locally on machines with different architectures and to rely on CI/CD instead.
Hi,
I'm writing a package for animal biomechanics simulation using MuJoCo. I'm looking for some help with testing and automation.
Since exact reproducibility of the engine "is only guaranteed within a single version, on the same architecture" (according to the Reproducibility section of the docs), what are the best practices for automated testing in downstream libraries that use MuJoCo? Clearly, asserting that results are exactly as expected is a bad idea...
Thank you in advance for your suggestions.