computationalmodelling / nbval

A py.test plugin to validate Jupyter notebooks

Pattern matching for images #10

Open rpep opened 7 years ago

rpep commented 7 years ago

Is there a way to pattern-match the actual HTML tags for images? HoloViews returns a binary blob, and I've not been able to get the pattern to match with this:

    [holoviews]
    regex: <img*>
    replace: HOLOVIEWS
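
(For what it's worth, nbval's sanitizer entries are Python regular expressions, not shell globs: `<img*>` matches the literal characters `<im` followed by zero or more `g`s, so it can never span the whole tag. Something like `regex: <img[^>]*>` should, assuming the tag contains no `>` inside attribute values. A quick check, with a made-up, truncated blob standing in for real HoloViews output:)

    import re

    # Hypothetical, truncated sample; real HoloViews output is much longer.
    html = '<img src="data:image/png;base64,iVBORw0KGgo...">'

    # '<img*>' only matches '<im' plus zero or more 'g's; '[^>]*' instead
    # consumes everything up to the closing '>' of the tag.
    pattern = re.compile(r'<img[^>]*>')
    print(pattern.sub('HOLOVIEWS', html))  # prints: HOLOVIEWS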

michaelaye commented 7 years ago

Have a look at @minrk's https://github.com/jupyter/nbdime, a tool to diff notebooks. I think there's something in there that deals with diffing binary content; maybe that could help your case?

minrk commented 7 years ago

nbdime identifies binary content and replaces it with something like <snip base64 md5=abc...>. There isn't a mechanism to replace this with something else, though.
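
(To illustrate the idea rather than nbdime's actual implementation: the same normalization could be approximated in a few lines of Python, collapsing long base64 runs into a placeholder keyed by their MD5 digest, so two outputs compare equal exactly when the underlying bytes do:)

    import hashlib
    import re

    # Rough imitation of nbdime's placeholder, not its real API: any long
    # base64-looking run is replaced by a short, stable marker.
    B64_RUN = re.compile(r'[A-Za-z0-9+/=\n]{64,}')

    def snip_base64(text):
        def _snip(match):
            digest = hashlib.md5(match.group(0).encode('ascii')).hexdigest()
            return '<snip base64 md5=%s>' % digest
        return B64_RUN.sub(_snip, text)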

fangohr commented 7 years ago

@minrk an option might be, instead of diffing images or image objects, to display both versions side by side (as nbdime can do) in case of conflict. This would give the user a convenient way of comparing them, and it addresses a common use case for nbval:

  1. Code is written.
  2. Notebook is used to create documentation/tutorials for that code.
  3. Notebook runs through nbval and 'passes' all the tests (i.e. the output cells stored in the notebook are consistent with what the code computes from the input cells).
  4. Code is changed (happens all the time).
  5. Notebook runs through nbval and fails some of the tests.

At this point, it will often be the case that a minor change in the code base causes the failure; this could be a change of colour map or plot symbols, or even an updated matplotlib version. Once the human can see both images, and see that the changes are irrelevant to the purpose of the test, they can accept that this 'fail' is not a real failure and simply save the updated notebook output.

PS Actually, having written the above, I realise that this may be a bit out of context: I think at the moment nbval ignores images completely, and it is not even clear we should change this. However, I like the idea of using nbdime to display changes in output cells to allow users to decide whether it is okay to 'update the notebooks' and save the new version in the scenario outlined above. So I'll keep that thinking here until we find a better place.

minrk commented 7 years ago

I agree that displaying images side by side for human review is probably the best way to go. Computationally comparing images is only really doable if your goal is pixel-perfect comparison, which is rarely what you want. Subtle changes in available fonts, margins, etc. will make image comparisons fail, but a human can easily enough say "those are fine".

fangohr commented 7 years ago

Exactly. What is not fully clear is how to integrate this into the current workflow for nbval: at the moment we ignore images entirely and only scan the text representation, and, if configured accordingly, ignore memory addresses in that textual representation, as these change with every run.
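
(For concreteness, that memory-address sanitizing uses the same regex/replace mechanism shown above; a sketch of such an entry, assuming CPython's usual hex repr formatting:)

    [memory addresses]
    regex: 0x[0-9a-fA-F]+
    replace: 0xADDRESS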

We could offer (maybe behind an extra switch, --nbval-imdiff or --nbval-nbdime?) to produce an HTML document that shows (i) the changes for every output cell where differences are detected, and (ii) for all images the nbdime side-by-side comparison, for the user to decide whether they are the same. We could include a bitwise ('pixel-perfect') comparison of the figures, and only display the side-by-side comparison if there are any bitwise differences.
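
(A minimal sketch of that bitwise gate, assuming the outputs are standard notebook display_data dicts with a base64-encoded 'image/png' entry; images_differ is a hypothetical helper, not part of nbval:)

    import base64

    def images_differ(reference_output, actual_output):
        # Decode both payloads and compare byte-for-byte; only outputs
        # that differ at this level would be handed to the nbdime
        # side-by-side view for human review.
        ref = base64.b64decode(reference_output['data']['image/png'])
        act = base64.b64decode(actual_output['data']['image/png'])
        return ref != act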

mscuthbert commented 7 years ago

The Vexflow project (JavaScript notation rendering) has implemented image comparison for testing while allowing a small amount of fuzziness. It's a totally different language, etc., but they were able to find an off-the-shelf similarity algorithm that runs pretty fast (and in case of failure it also gives the side-by-side comparison, which often reveals regressions but sometimes just means that the allowable amount of difference needs to be increased). So it's doable; whether it's worth the programmer effort is another thing.
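
(To make the idea concrete in Python terms, a minimal sketch of such a fuzzy comparison using Pillow, not whatever library Vexflow uses; the tolerance value is made up and would need tuning per project:)

    from PIL import Image, ImageChops

    def fuzzy_match(path_a, path_b, tolerance=0.01):
        # Return True if at most `tolerance` (fraction) of pixels differ.
        a = Image.open(path_a).convert('RGB')
        b = Image.open(path_b).convert('RGB')
        if a.size != b.size:
            return False
        # Per-pixel absolute difference, collapsed to grayscale so any
        # channel change counts as a changed pixel.
        diff = ImageChops.difference(a, b).convert('L')
        changed = sum(1 for px in diff.getdata() if px > 0)
        return changed / (a.size[0] * a.size[1]) <= tolerance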