End-to-end Rendering Golden Tests

What problem does this solve or what need does it fill?

Rendering breakages are rarely local: they usually result from the combined effect of several changes to the rendering pipeline. This makes it hard to change the renderer in broad strokes without visibly breaking something.

What solution would you like?

Set up "golden" scenes for the renderer.
Use a binary to render them to a file in a controlled environment, preferably on multiple platforms so that correctness is validated on every target platform.
Save each rendered image as a golden reference image.
Create automated tests that repeat the same process and diff the new images against the references. If a new image differs from its reference by more than a set tolerance, fail the test.
Add the automated tests to CI.
Tests that are broken intentionally (e.g. by new features) have their reference images updated in the same PR that broke them.
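
As a rough illustration of the diff step, here is a minimal sketch in plain Rust. It assumes both images have already been decoded into tightly packed 8-bit buffers (e.g. RGBA8) and uses mean absolute per-channel difference as the metric. The names `GoldenResult` and `compare_to_golden`, the metric, and the tolerance values are all hypothetical choices made for this example; none of them are existing Bevy APIs, and a real implementation might prefer a perceptual metric instead.

```rust
/// Outcome of comparing a freshly rendered frame against its golden reference.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq)]
pub enum GoldenResult {
    /// The images are within the allowed tolerance.
    Match,
    /// The buffers have different lengths, so the images cannot be compared.
    SizeMismatch,
    /// The images differ by more than the allowed tolerance.
    TooDifferent { mean_abs_diff: f64 },
}

/// Compares two tightly packed 8-bit buffers of equal length.
///
/// `tolerance` is the largest acceptable mean absolute difference per
/// channel, on the 0..=255 scale. Assumed metric, not a Bevy convention.
pub fn compare_to_golden(rendered: &[u8], golden: &[u8], tolerance: f64) -> GoldenResult {
    if rendered.len() != golden.len() {
        return GoldenResult::SizeMismatch;
    }
    if rendered.is_empty() {
        // Two empty images are trivially identical.
        return GoldenResult::Match;
    }

    // Sum the absolute per-channel differences in a wide integer type.
    let total: u64 = rendered
        .iter()
        .zip(golden)
        .map(|(&a, &b)| u64::from(a.abs_diff(b)))
        .sum();

    let mean_abs_diff = total as f64 / rendered.len() as f64;
    if mean_abs_diff <= tolerance {
        GoldenResult::Match
    } else {
        GoldenResult::TooDifferent { mean_abs_diff }
    }
}
```

A test would call `compare_to_golden(&new_pixels, &reference_pixels, tolerance)` and fail on anything other than `GoldenResult::Match`. A non-zero tolerance is likely needed because output can vary slightly across GPUs and drivers, though the right value would have to be found empirically.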

What alternative(s) have you considered?

Rely on user bug reports to QA the renderer output pre/post-release.

Additional context

Large images that are regularly updated can make a git repository heavy. It may be better not to track them in the main repo, and instead keep them in a separate Git LFS-based repo that we validate against before a train release to ensure nothing is critically broken.

The above methodology was borrowed from existing golden-image tests commonly used in synthetic data generation for machine learning systems.

bevyengine / bevy