baccuslab / pyret

Python tools for analysis of neurophysiology data
https://pyret.readthedocs.io/en/master/
MIT License

Best way to fix failing tests #91

Closed · bnaecker closed this 7 years ago

bnaecker commented 7 years ago

I'm looking at the failing Travis builds and wondering what the best way to fix them is. The problem is that the baseline images used in the comparison for pyret.visualizations have size 800x600, but the images generated in test_visualizations.py are 640x480. From the Travis build history, it looks like the break happened when moving from matplotlib 1.5 to 2.0: build #93 used 1.5 and passed; build #95 used 2.0 and exhibits the failures.

Anyway, we need to regenerate the baseline test images. It looks like I just did this by hand on my machine when I added the original tests in commit 453b9b. Listing the files added in that commit with git diff-tree -r 453b9b shows all the tests/test-images/baseline*.png files.

We should probably have a separate script or method to auto-generate them, both for stability and to avoid this problem in the future. This could live in the pyret/tests/utils.py module, and we should probably add more parameters to the accompanying matplotlibrc so that the images are always the same size. Setting the figure.figsize and figure.dpi parameters should be sufficient to guarantee that.
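For example, something like the following in that matplotlibrc would pin the geometry (just a sketch; the values assume we want to keep the baselines at 800x600, i.e. 8 x 6 inches at 100 dpi):

```
# Pin the figure geometry so saved test images always have the same pixel size.
# 8 x 6 inches at 100 dpi -> 800 x 600 pixels, matching the current baselines.
figure.figsize : 8, 6
figure.dpi     : 100
savefig.dpi    : 100
```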

Ideally, pyret/tests/utils.py would have the routines that actually plot all the images being tested, along with a save_test_images() method that saves the baseline versions. Then the test_visualizations.py test module would import utils, use the same routines to create and plot the data, and save its own test images for comparison with the baselines. This is much easier to maintain, because we only write the actual plotting code in utils: it gets used once in a blue moon to save the baselines (like we need now), and then again whenever the tests run, by importing it into the test script.
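A rough sketch of what I have in mind (everything except the save_test_images() name is a placeholder, not a proposal for the real API):

```python
# pyret/tests/utils.py -- sketch only; function and file names are illustrative.
import os

import matplotlib.pyplot as plt
import numpy as np

IMAGE_DIR = os.path.join(os.path.dirname(__file__), 'test-images')


def make_test_data(seed=0):
    """Deterministic data shared by the baseline generator and the tests."""
    return np.random.RandomState(seed).randn(100)


def plot_temporal(data):
    """The single plotting routine used for both baseline and test images."""
    fig, ax = plt.subplots()
    ax.plot(data)
    return fig


def save_test_images(prefix='baseline'):
    """Regenerate the saved images; run manually, only when baselines must change."""
    fig = plot_temporal(make_test_data())
    fig.savefig(os.path.join(IMAGE_DIR, '{}-temporal.png'.format(prefix)))
    plt.close(fig)
```

test_visualizations.py would then import these same routines, save its images under a different prefix, and compare the two files, so the plotting code only ever lives in one place.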

But this might be a bit strange. After all, we'd be using the same code to generate the baseline and test images, which feels like begging the question.

Thoughts?

nirum commented 7 years ago

Thanks for looking into this. Yes, it seems silly to have the same code generate the baseline and test images. However, as long as we only generate the baseline images occasionally (and manually check them to make sure they look right), the test will still serve its purpose: informing us when something changes how the figures look. If we generate the baseline images every time we run the test, then that is pointless.

bnaecker commented 7 years ago

I meant the former case, not the latter. The baseline images would be generated only once in a while, whenever something odd like this resizing issue comes up. When the normal tests are run, only the test version of the image is created, and it’s compared against the saved baseline.

Another option is to throw out the idea of comparing images altogether. As problems like this one point out, we don't really care what the image looks like, only that the underlying data being plotted is correct. This matters most for things like visualizations.temporal, which may actually do some computation before plotting, as when it's passed a 3D linear filter. So we could do something like assert that the plot object's data is what we expect.
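For instance, something along these lines, using plain matplotlib accessors (the ax.plot call stands in for the pyret plotting function, since I'm not writing out its real signature here):

```python
import matplotlib.pyplot as plt
import numpy as np


def test_temporal_plot_data():
    # Sketch of asserting on the plotted data rather than the rendered pixels.
    # The ax.plot call is a stand-in for e.g. visualizations.temporal(...).
    time = np.linspace(0, 1, 100)
    expected = np.exp(-time) * np.sin(2 * np.pi * 5 * time)  # stand-in filter

    fig, ax = plt.subplots()
    ax.plot(time, expected)

    x, y = ax.lines[0].get_data()
    np.testing.assert_allclose(x, time)
    np.testing.assert_allclose(y, expected)
    plt.close(fig)
```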


bnaecker commented 7 years ago

Closed via commit d9dc495.

bnaecker commented 7 years ago

OK, this is just silly. The tests all pass on my local machine, but fail on Travis. I'm assuming this is because of some small difference in the rendering backend those VMs use to save the images to disk.

It's not fruitful (or sane) for me to change parameters in the matplotlibrc in the hopes that I'll eventually settle on something that makes the tests pass. Even if that does work, there's always the possibility that something else will change in the future and we'll have to do this again.

My suggestion, based on the comment by Michael Waskom here, is to instead test the data that should be in the plots. We really don't care what the image looks like, right? Everyone will have their own backends, rc files, etc. What we want is for the data in the plots to be correct, especially for plotting functions like visualizations.psth, which actually do some computation on their inputs.

I vote for rewriting the test strategy to drop the image comparisons and instead check the values of the data plotted in the figures.
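Concretely, for a function like visualizations.psth the test could compute the expected values independently and compare them to what ends up on the axes. A generic sketch (the ax.plot call stands in for the actual pyret call, whose signature I'm not assuming here):

```python
import matplotlib.pyplot as plt
import numpy as np


def test_psth_plot_data():
    # Sketch: recompute the expected binned rate independently, then compare it
    # to the values that were actually plotted.
    spikes = np.sort(np.random.RandomState(0).uniform(0.0, 10.0, size=500))
    edges = np.arange(0.0, 10.1, 0.1)
    counts, _ = np.histogram(spikes, bins=edges)
    expected_rate = counts / 0.1  # spikes per second in each 100 ms bin

    fig, ax = plt.subplots()
    ax.plot(edges[:-1], expected_rate)  # stand-in for visualizations.psth(...)

    np.testing.assert_allclose(ax.lines[0].get_ydata(), expected_rate)
    plt.close(fig)
```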

nirum commented 7 years ago

Sounds good to me! Thanks for taking care of this.
