google-deepmind / mujoco

Multi-Joint dynamics with Contact. A general purpose physics simulator.
https://mujoco.org
Apache License 2.0

Best practices for automated testing in downstream libraries? #1661

Closed · sibocw closed this 4 months ago

sibocw commented 4 months ago

Hi,

I'm writing a package for animal biomechanics simulation using MuJoCo. I'm looking for some help with testing and automation.

Since exact reproducibility of the engine "is only guaranteed within a single version, on the same architecture" (according to the Reproducibility section of the docs), what are the best practices for automated testing in downstream libraries that use MuJoCo? Clearly asserting that results are exactly as expected is a bad idea...

Thank you in advance for your suggestions.

yuvaltassa commented 4 months ago

Great question!

One answer is to make golden tests, but:

  1. Run for not too many timesteps (10-100) and assert that the difference to the golden data is within some epsilon that is large for the numerics but small for the physics (1e-6-ish).
  2. Make it very easy for you to regenerate your golden data.

This way your test will tell you if Something Very Bad happened, and maybe you'll find yourself updating the goldens a few times after seeing that everything is actually fine.
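A rough sketch of what that can look like with the Python bindings and pytest (the tiny model, the golden-file path, and the `MUJOCO_REGENERATE_GOLDENS` variable are just placeholders to illustrate the idea, not anything MuJoCo provides):

```python
# Sketch of a golden test; paths, names, and the env var are placeholders.
import os
from pathlib import Path

import mujoco
import numpy as np

GOLDEN_PATH = Path(__file__).parent / "golden_trajectory.npz"
NUM_STEPS = 50  # keep rollouts short, e.g. 10-100 steps

# A tiny stand-in model; in practice you would load your own biomechanics model.
MODEL_XML = """
<mujoco>
  <worldbody>
    <body>
      <freejoint/>
      <geom type="sphere" size="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""


def run_short_rollout():
    """Step the model for a few timesteps and return the final state."""
    model = mujoco.MjModel.from_xml_string(MODEL_XML)
    data = mujoco.MjData(model)
    for _ in range(NUM_STEPS):
        mujoco.mj_step(model, data)
    return data.qpos.copy(), data.qvel.copy()


def test_rollout_matches_golden():
    qpos, qvel = run_short_rollout()

    # Easy regeneration: set the env var, rerun, inspect the diff, commit the file.
    if os.environ.get("MUJOCO_REGENERATE_GOLDENS"):
        np.savez(GOLDEN_PATH, qpos=qpos, qvel=qvel)

    golden = np.load(GOLDEN_PATH)
    # Tolerance: large relative to floating-point noise, small relative to the physics.
    np.testing.assert_allclose(qpos, golden["qpos"], atol=1e-6)
    np.testing.assert_allclose(qvel, golden["qvel"], atol=1e-6)
```

Regenerating the goldens is then just `MUJOCO_REGENERATE_GOLDENS=1 pytest` followed by a look at the resulting diff before committing.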

Another answer is to pin to a version and only upgrade when you really want a new feature, but then check carefully that everything is okay.

The definition of "everything is okay" really depends on your use case. If you are writing a suite of benchmarks, then just pin the version and never upgrade. If you are doing something else, then write whatever corresponds to "is everything okay?" for you...
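If you go the pinning route, a cheap guard test keeps the goldens and the engine version from drifting apart silently (the pinned version string below is just an example):

```python
# Guard that the installed MuJoCo matches the version the goldens were made with.
# The "3.1.6" pin is a placeholder; use whatever version your goldens correspond to.
import mujoco

PINNED_MUJOCO_VERSION = "3.1.6"


def test_mujoco_version_is_pinned():
    assert mujoco.__version__ == PINNED_MUJOCO_VERSION, (
        f"Goldens were generated with MuJoCo {PINNED_MUJOCO_VERSION}, "
        f"but {mujoco.__version__} is installed; regenerate the goldens "
        "or fix the pin before trusting the results."
    )
```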

Does this make sense?

sibocw commented 4 months ago

Thanks Yuval! These suggestions are very helpful. I will also pin the runner type when running these tests with GitHub Actions. As a result, it's probably best to avoid running these tests locally on machines with different architectures and to rely on CI/CD instead.
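Concretely, I'm thinking of something along these lines to keep the golden comparisons on the CI architecture only (the expected machine type and the reliance on the `CI` environment variable that GitHub Actions sets are assumptions about my setup):

```python
# Only run golden-data comparisons on the architecture the goldens were
# generated on (e.g. the x86_64 GitHub-hosted runners); skip everywhere else.
# The "x86_64" value and the use of the CI env var are assumptions about my setup.
import os
import platform

import pytest

GOLDEN_ARCH = "x86_64"  # architecture the golden files were generated on

requires_golden_arch = pytest.mark.skipif(
    platform.machine() != GOLDEN_ARCH or os.environ.get("CI") != "true",
    reason=f"Golden comparisons only run on {GOLDEN_ARCH} CI runners; "
    "local runs on other machines may differ slightly.",
)


@requires_golden_arch
def test_golden_rollout_matches():
    ...  # the golden comparison from above would go here
```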