AFM-SPM / TopoStats

An AFM image analysis program to batch process data and obtain statistics from images
https://afm-spm.github.io/TopoStats/
GNU Lesser General Public License v3.0

MPL interpolation causes lines in MPL Images #457

Closed MaxGamill-Sheffield closed 1 year ago

MaxGamill-Sheffield commented 1 year ago

Describe the bug: Horizontal and vertical lines appear in images produced with matplotlib.imshow when they are saved.

MPL image: 20221214_BW_mcc339+1_hTOP1_1000x_0 0_00051_processed (1)

Screenshot 2023-01-04 at 13 59 03

vs. using plt.imsave to save the image directly: 20221214_BW_mcc339+1_hTOP1_1000x_0 0_00051_processed

To Reproduce: Run a high-resolution image through TopoStats with the colorbar, the axes, or both enabled, so that a Matplotlib plot is produced. In these images the data is interpolated and so shows horizontal and vertical lines.

Expected behavior: The image should be clear and un-interpolated.

Output: See the screenshots above.

Additional context: This could be down to the "nearest"-neighbour interpolation or the DPI setting.

ns-rse commented 1 year ago

The specific lines that use imshow are in topostats/plottingfuncs.py in the save_figure() method (L194 and L205).

There is also a single use in tests/test_dnatracing.py, but this will be removed/changed as part of the DNA Tracing refactoring (#183). The method should be refactored to create the images and then save them.

There are several options for this:

DPI is mentioned as a possible confounder, in which case it may be sensible to introduce it as a configurable option, both when initialising the class and via the configuration file, with validation to ensure it is above a minimum threshold (what that threshold should be is yet to be determined).
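A minimal sketch of what such validation might look like. The helper name `validate_dpi()` and the `MIN_DPI` value of 100 are hypothetical placeholders (the threshold is still undecided), and neither exists in TopoStats. The string "figure" is let through because Matplotlib's `savefig()` accepts it to mean "use the figure's own DPI".

```python
from typing import Union

# Placeholder value only: the thread has not settled on a minimum DPI.
MIN_DPI = 100


def validate_dpi(dpi: Union[int, float, str], minimum: float = MIN_DPI) -> Union[float, str]:
    """Return a DPI value that is safe to pass to savefig(), or raise ValueError.

    The string "figure" is passed through unchanged because Matplotlib's
    savefig() accepts it to mean "use the figure's own DPI".
    """
    if dpi == "figure":
        return dpi
    # bool is a subclass of int, so reject it explicitly
    if isinstance(dpi, bool) or not isinstance(dpi, (int, float)):
        raise ValueError(f"dpi must be a number or 'figure', got {dpi!r}")
    if dpi < minimum:
        raise ValueError(f"dpi must be at least {minimum}, got {dpi}")
    return float(dpi)


# Usage: a value read from a (hypothetical) configuration dictionary
config = {"dpi": 300}
print(validate_dpi(config["dpi"]))
```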

MaxGamill-Sheffield commented 1 year ago

Compared with the imsave-produced image, it seems that the value null / None is the best at resolving this issue. However, the tests don't seem to reflect this.

Imsave:

Screenshot 2023-02-14 at 11 54 28

null / None:

Screenshot 2023-02-14 at 11 54 53 Screenshot 2023-02-14 at 12 05 59

"none":

Screenshot 2023-02-14 at 11 54 43 Screenshot 2023-02-14 at 12 06 41

"nearest":

Screenshot 2023-02-14 at 11 54 36 Screenshot 2023-02-14 at 12 15 10

The docs say no interpolation is done when "none" is used, and that when None is used the rcParams default is used (possibly antialiasing). However, we still see the lines even when "none" is used; might this be a DPI issue?
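One way to check which method Matplotlib actually records for each value is `AxesImage.get_interpolation()`. The sketch below assumes a headless Agg backend and a random 20x20 array as stand-in data; it shows that `None` resolves to whatever `rcParams["image.interpolation"]` holds, whereas the string "none" is kept as-is.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(seed=0).random((20, 20))

resolved = {}
for value in (None, "none", "nearest"):
    fig, ax = plt.subplots(figsize=(2, 2))
    im = ax.imshow(data, interpolation=value)
    # get_interpolation() reports the method the image will actually use
    resolved[value] = im.get_interpolation()
    plt.close(fig)

print(resolved)
print("rcParams default:", plt.rcParams["image.interpolation"])
```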

"none" with dpi=1000 instead of "figure" also seems to resolve the issue. @ns-rse do we think increasing this value (although it increases test and image-production times) is more suitable than remaking the rcParams file?

Screenshot 2023-02-14 at 12 39 24 Screenshot 2023-02-14 at 12 21 37

ns-rse commented 1 year ago

Starting some notes as I work through this.

As noted above, plt.imsave() is a Matplotlib function, albeit a different one from plt.imshow() and plt.savefig(). So specifying a colorbar and/or axes doesn't "induce" Matplotlib images; they are all Matplotlib functions for producing/saving images.

But that doesn't change the current situation that we are observing interpolation that is unexpected.

Defaults

The default rcParams appear to have the value image.interpolation: antialiased (typically in such configuration files the defaults are provided but commented out; users can then uncomment and modify them, or copy the line, uncomment it and modify it).
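A short sketch for inspecting and temporarily overriding this default from Python rather than editing the matplotlibrc file (where the equivalent line would be `image.interpolation: nearest`). It assumes a headless Agg backend; `rc_context()` restores the previous value when the block exits.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# The value imshow() falls back to when interpolation=None is passed
default = plt.rcParams["image.interpolation"]
# The factory default, regardless of any user matplotlibrc
factory_default = matplotlib.rcParamsDefault["image.interpolation"]
# Path of the matplotlibrc file that was actually loaded
print(matplotlib.matplotlib_fname())
print("current:", default, "factory:", factory_default)

# Override the default for one block of plotting code only
with plt.rc_context({"image.interpolation": "nearest"}):
    inside = plt.rcParams["image.interpolation"]

after = plt.rcParams["image.interpolation"]
print(inside, after)
```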

plt.imshow()

The docstring states the following, which points to why interpolation is an issue and to it being related to dpi:

The number of pixels used to render an image is set by the Axes size and the dpi of the figure. This can lead to aliasing artifacts when the image is resampled because the displayed image size will usually not match the size of X (see Image antialiasing). The resampling can be controlled via the interpolation parameter and/or rcParams["image.interpolation"] (default: 'antialiased').
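A simplified model (not Matplotlib's actual resampling code) may help show why lines appear: under nearest-neighbour resampling each display pixel takes the value of the data pixel its centre falls in, so when the display size is not a whole multiple of the array size some data pixels are drawn one display pixel wider than others. The display widths below are assumptions for illustration: roughly 616 pixels for an 8x8 inch figure at 100 dpi with Matplotlib's default subplot margins, and 6160 for the same figure at dpi=1000.

```python
import numpy as np


def nearest_pixel_widths(n_data: int, n_display: int) -> np.ndarray:
    """Count how many display pixels each data pixel covers under nearest-neighbour resampling.

    Simplified model: each display pixel takes the value of the data pixel
    its centre falls in. Real renderers differ in detail.
    """
    centres = (np.arange(n_display) + 0.5) * n_data / n_display
    source = np.floor(centres).astype(int)
    return np.bincount(source, minlength=n_data)


# Assumed display widths: ~616 px (8 in at 100 dpi) and ~6160 px (8 in at 1000 dpi)
for n_display in (616, 6160):
    for n_data in (100, 200, 300):
        widths = nearest_pixel_widths(n_data, n_display)
        print(n_display, n_data, sorted(set(widths.tolist())))
```

Where the widths are unequal, the wider rows and columns form a regular pattern of stripes. Raising the DPI or figsize makes the one-pixel difference small relative to the column width, which may explain why dpi=1000 reduced the lines.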

plt.imshow() has two options pertaining to interpolation. The first is interpolation, which takes one of 'none', 'antialiased', 'nearest', 'bilinear', 'bicubic', 'spline16', 'spline36', 'hanning', 'hamming', 'hermite', 'kaiser', 'quadric', 'catrom', 'gaussian', 'bessel', 'mitchell', 'sinc', 'lanczos', 'blackman'. The second is interpolation_stage, which controls when the interpolation is performed; it has two valid options: data, which applies interpolation to the data provided, and rgba, which applies it after the colormap has been applied.
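A sketch for comparing the two parameters side by side: it renders the same 20x20 random array (stand-in data) for a few interpolation methods at both stages and keeps the raw RGBA pixels from the Agg canvas, so the outputs can be diffed numerically instead of by eye. The choice of methods and the 2x2 inch / 100 dpi size are arbitrary assumptions.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(seed=0).random((20, 20))

renders = {}
for interpolation in ("none", "nearest", "bilinear"):
    for stage in ("data", "rgba"):
        fig, ax = plt.subplots(figsize=(2, 2), dpi=100)
        ax.imshow(data, interpolation=interpolation, interpolation_stage=stage)
        ax.set_axis_off()
        fig.canvas.draw()
        # Copy the rendered RGBA pixels so they outlive the figure
        renders[(interpolation, stage)] = np.asarray(fig.canvas.buffer_rgba()).copy()
        plt.close(fig)

# Largest per-channel difference between the two stages for each method
for interpolation in ("none", "nearest", "bilinear"):
    a = renders[(interpolation, "data")].astype(int)
    b = renders[(interpolation, "rgba")].astype(int)
    print(interpolation, np.abs(a - b).max())
```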

Whilst investigating #435, I noticed that enabling DEBUG can at times cascade down to calls to Matplotlib routines, and I saw a number of messages reporting PIL, which is one of the image types that plt.imshow() can show (the other being array-like objects). Most of the TopoStats workflow deals with np.ndarray as input/output, so perhaps at some stage in the plotting these arrays are converted to PIL (I don't even know what that is right now).

plt.imsave()

plt.imsave() does not have any parameters pertaining to interpolation. Does it therefore use the defaults, or not do anything whatsoever?
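A quick experiment bearing on this question: save an array with `plt.imsave()` to an in-memory PNG and read it back. The 20x30 random array and the viridis colormap are arbitrary choices. The saved image should have exactly one pixel per array element, and Matplotlib documents imsave's `dpi` argument as metadata only, so even dpi=1000 does not change the pixel count.

```python
import io

import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(seed=0).random((20, 30))

buffer = io.BytesIO()
# dpi here is stored as file metadata; it does not resample the pixels
plt.imsave(buffer, data, cmap="viridis", format="png", dpi=1000)
buffer.seek(0)
saved = plt.imread(buffer)
print(saved.shape)  # (rows, columns, RGBA channels)
```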

plt.savefig()

plt.savefig() has no parameter for controlling interpolation. It does, however, accept a dpi argument (default "figure", i.e. use the figure's own DPI).
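A sketch showing the effect of savefig's `dpi` argument on output size, assuming a 2x2 inch figure created at 50 dpi and saved to an in-memory PNG. With dpi="figure" the saved image uses the figure's own DPI; passing a number overrides it, so the pixel dimensions scale with it.

```python
import io

import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(seed=0).random((20, 20))
fig, ax = plt.subplots(1, 1, figsize=(2, 2), dpi=50)
ax.imshow(data, interpolation="nearest")

sizes = {}
for dpi in ("figure", 100, 200):
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)
    buffer.seek(0)
    # imread returns a (height, width, channels) array for PNG input
    sizes[dpi] = plt.imread(buffer).shape[:2]
plt.close(fig)
print(sizes)
```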

dpi

DPI (Dots Per Inch) also determines the resolution of the image. Currently it is not specified, as the fig and ax objects are instantiated with plt.subplots(1, 1, figsize=(8, 8)) (see L191 of the save_figure() method).

It can be controlled at this stage if required, though, since plt.subplots() passes its **kwargs on to plt.figure(), which accepts not just figsize but also dpi.
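A sketch of passing both values at creation time, using the same figsize=(8, 8) as save_figure() and an assumed dpi=150. After drawing, the canvas reports the whole figure's pixel size, and the axes extent shows how many pixels are actually available to the image once margins and the equal aspect ratio are applied.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(seed=0).random((20, 20))

# plt.subplots() forwards figsize and dpi to plt.figure()
fig, ax = plt.subplots(1, 1, figsize=(8, 8), dpi=150)
ax.imshow(data, interpolation="nearest")
fig.canvas.draw()  # apply the equal-aspect layout before measuring

figure_px = fig.canvas.get_width_height()  # whole figure, in pixels
axes_px = ax.get_window_extent().width     # width available to the image
print(fig.dpi, figure_px, round(axes_px))
plt.close(fig)
```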

Strategy

Derive small 20x20 arrays, either from sample molecules or at random, and test each of plt.imshow() / plt.imsave() / plt.savefig() with varying values of dpi and figsize to see what is happening. Ideally these should be tests within the TopoStats test suite.
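A sketch of the parameter sweep described above, using a hypothetical helper `render_grid()` (not part of TopoStats) that writes one PNG per combination of array size, figsize, dpi and interpolation so the outputs can be compared side by side. The parameter values are illustrative only; inside the test suite the same loop could become a parametrised test using pytest-mpl's `mpl_image_compare`, as the existing tests already do.

```python
import itertools
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def render_grid(out_dir, sizes=(20,), figsizes=(2, 4), dpis=(100, 200),
                interpolations=("none", "nearest")):
    """Save one PNG per parameter combination for side-by-side comparison.

    Hypothetical helper: the parameter values are illustrative only.
    """
    rng = np.random.default_rng(seed=1000)
    # One random array per size so every combination sees the same data
    arrays = {size: rng.random((size, size)) for size in sizes}
    paths = []
    for size, figsize, dpi, interpolation in itertools.product(
        sizes, figsizes, dpis, interpolations
    ):
        fig, ax = plt.subplots(1, 1, figsize=(figsize, figsize), dpi=dpi)
        ax.imshow(arrays[size], interpolation=interpolation)
        path = Path(out_dir) / f"{size}x{size}_fig{figsize}_dpi{dpi}_{interpolation}.png"
        fig.savefig(path)
        plt.close(fig)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    written = render_grid(tmp)
    n_written = len(written)
    all_exist = all(path.exists() for path in written)
print(n_written, all_exist)
```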

ns-rse commented 1 year ago

interpolation

A comparison of different image sizes (100 / 200 / 300) using plt.imshow() with all possible interpolation methods can be viewed here.

The lines, which are the basis of this issue, only appear when interpolation is none or nearest, and it appears to be only when the array is 200x200. I'm not convinced the lines are visible in 100x100 plots, and they appear less prominent (subjectively) at 300x300.

interpolation_stage

As noted above, there are two methods for selecting at what stage interpolation is undertaken, controlled by interpolation_stage=[data|rgba].

The page here shows a matrix comparing the two methods. There doesn't appear to be much difference between the two.

The lines are still most prominent in images based on 200x200 arrays; they are not visible in 100x100 and are less prominent in 300x300.

figsize

Does varying the figsize impact this issue? All of the above were generated with figsize=(8,8), but does varying the size of the image affect the artifacts when …

The page here shows all interpolation methods for 100 / 200 / 300 size images, drawn at figsize=4x4 / figsize=8x8 / figsize=16x16.

Looking at none and nearest, in the larger 16x16 figures the banding appears to be less pronounced.

Thoughts

Some random thoughts...

MaxGamill-Sheffield commented 1 year ago

Hey @ns-rse, of course! The code I used is below:

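from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pytest

from topostats.plottingfuncs import Images

RNG = np.random.default_rng(seed=1000)
array = RNG.random((10, 10))
mask = RNG.uniform(low=0, high=1, size=array.shape) > 0.5

# plotting_config is a pytest fixture supplying the default plotting options
@pytest.mark.mpl_image_compare(baseline_dir="resources/img/")
def test_mask_cmap(plotting_config: dict, tmp_path: Path) -> plt.Figure:
    """Test the plotting of a mask with a different colourmap (blu)."""
    plotting_config["mask_cmap"] = "blu"
    fig, _ = Images(
        data=array,
        output_dir=tmp_path,
        filename="colour.png",
        masked_array=mask,
        dpi=1000,
        **plotting_config,
    ).plot_and_save()
    # pytest-mpl compares the returned figure against the baseline image
    return fig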

The images are in fact zoomed screenshots, but the larger images should be above.

ns-rse commented 1 year ago

@ns-rse do we think increasing this value (although increasing test and image production times) is more suitable than remaking the rcParams file?

I'm thinking that aiming to produce publication quality images from every single scan with a single configuration file is perhaps optimistic.

Tweaking configuration parameters and re-running a script (which is in essence what run_topostats is) is not a very efficient method of honing/refining a plot.

A better approach is to use Jupyter Notebooks to load the NumPy array and derive the image that is required for publication that way.

#519 introduces a sample Notebook that shows how to load and use such arrays. It doesn't yet include examples of how to change the DPI or the interpolation method, but it does show how to select a subset of an image and plot that. To my eyes, plotting in this manner, even with the default interpolation, doesn't result in artifacts, but then I've only tested with minicircles.spm so far.
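A notebook-style sketch of that workflow: load a saved NumPy array, select a region and plot it with an explicit DPI and interpolation. The file name `example_image.npy`, the random 512x512 stand-in array, the crop bounds and dpi=300 are all hypothetical; a real notebook would load an array that TopoStats has written out.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for an array written to disk earlier (hypothetical file name)
    array_path = Path(tmp) / "example_image.npy"
    np.save(array_path, np.random.default_rng(seed=0).random((512, 512)))

    image = np.load(array_path)
    subset = image[100:200, 150:250]  # select a region of interest

    fig, ax = plt.subplots(1, 1, figsize=(4, 4), dpi=300)
    im = ax.imshow(subset, interpolation=None)
    fig.colorbar(im, ax=ax)
    fig.savefig(Path(tmp) / "subset.png")
    plt.close(fig)
    saved_ok = (Path(tmp) / "subset.png").exists()

print(subset.shape, saved_ok)
```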

The Notebook can and should be expanded upon and I think it would be useful to convert the content (and that of other Notebooks) into Markdown for inclusion in a "Tutorials" section of the documentation.

alicepyne commented 1 year ago

Discussed today in the TopoStats meeting. We discussed that the "usual behaviour" should be to save with rulers, which currently creates the interpolation issue. One suggestion is to map pixels to inches: work from a generic 512-pixel image size and make that image size configurable, i.e. set the figsize/dpi as an option based on those 512 pixels, but let users configure it. We need to balance the "normal behaviour", which could use a smaller DPI, and a configurable option to add imsave images, which would be high quality, against the size and speed of plotting. We could assist users by adding details of the Matplotlib sizing, e.g. in comments in the config file, to help them configure this.
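The pixel-to-inch mapping suggested here is a single division: a figure of side `n_pixels / dpi` inches saves at `n_pixels` across. A minimal sketch with a hypothetical helper `figsize_for_pixels()`, using the 512-pixel size from the discussion and a few example DPI values. Note that this sizes the whole figure; axes, ticks and any colorbar take up part of it, so the image region itself will be smaller.

```python
def figsize_for_pixels(n_pixels: int, dpi: float) -> tuple:
    """Return a square figsize, in inches, that saves n_pixels across at the given dpi.

    Hypothetical helper. This sizes the whole figure; axes, ticks and a
    colorbar take up part of it, so the image itself will be fewer than
    n_pixels wide.
    """
    side = n_pixels / dpi
    return (side, side)


for dpi in (100, 256, 512):
    print(dpi, figsize_for_pixels(512, dpi))
```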

SylviaWhittle commented 1 year ago

Not sure if I'm repeating information we already know, but here is what I have found this evening:

Copying the method used in save_figure from plottingfuncs.py into a notebook,

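import matplotlib.pyplot as plt
import numpy as np

# Placeholders for the values save_figure() receives (illustrative only)
img = np.random.default_rng(seed=0).random((300, 300))
cmap, vmin, vmax = "afmhot", img.min(), img.max()

fig, ax = plt.subplots(1, 1, figsize=(8, 8))
im = ax.imshow(
    img,
    extent=(0, 300, 0, 300),
    interpolation='nearest',
    cmap=cmap,
    vmin=vmin,
    vmax=vmax,
)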

This does produce the artefact: image

Switching the interpolation parameter to 'none' yields the same: image

However, using Python's None removes the artefact for me in the notebook: image

This even works at smaller scales (figsize=(3, 3)): image

I set the interpolation value directly in save_figure() in plottingfuncs.py to None (not 'none' or 'None'), and it appears to work:

image

Close up:

image

I might be forgetting a detail here, but it seems that the solution is to use None directly in save_figure()?

ns-rse commented 1 year ago

Sorry, only just clocked this investigation. I'll check through #464 carefully and make sure the config change comes through as None. Thanks for checking this.
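When checking that the config value arrives as None, it may help to see how YAML spells each candidate. A sketch using PyYAML's `safe_load()` on a hypothetical config snippet: only `null` (or `~`) becomes Python's None, while `none`, `None` and `"none"` all arrive as strings, which `imshow()` treats differently from None.

```python
import yaml

# Hypothetical config snippet showing how YAML spells each candidate value
config_text = """
as_null: null
as_tilde: ~
as_lowercase: none
as_titlecase: None
as_quoted: "none"
"""

loaded = yaml.safe_load(config_text)
for key, value in loaded.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```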