Closed MaxGamill-Sheffield closed 1 year ago
The specific lines that use imshow are in topostats/plottingfuncs.py in the save_figure() method (L194 and L205).
There is also a single use in tests/test_dnatracing.py
but this will be removed/changed as part of the DNA Tracing refactoring (#183).
The method should be refactored to create the images and then save them.
There are several options for this
DPI is mentioned as a possible confounder, in which case it may be sensible to introduce this as a configurable option when initialising the class and via configuration file, but with validation to ensure it is above a minimum threshold (what that threshold should be is yet to be determined).
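A minimal sketch of the validation idea, assuming a hypothetical floor of 100 dpi (the real threshold is still undecided) and a hypothetical PlotSettings class standing in for the plotting class. Neither name exists in TopoStats.

```python
from dataclasses import dataclass

# Hypothetical floor; the thread leaves the real threshold undecided.
MIN_DPI = 100.0


def validate_dpi(dpi: float, minimum: float = MIN_DPI) -> float:
    """Return dpi as a float, raising ValueError if it is below the minimum."""
    dpi = float(dpi)
    if dpi < minimum:
        raise ValueError(f"dpi={dpi} is below the minimum of {minimum}")
    return dpi


@dataclass
class PlotSettings:
    """Illustrative stand-in for a plotting class; not the TopoStats API."""

    dpi: float = 100.0

    def __post_init__(self) -> None:
        # Validate once at initialisation so bad config values fail early.
        self.dpi = validate_dpi(self.dpi)
```

The same validate_dpi() call could be reused when the value is read from the configuration file, so both entry points share one rule.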
Compared to the imsave-produced image, it seems that the value null/None is the best for resolving this issue. However, the tests don't seem to reflect this.
Imsave:
null / None:
"none":
"nearest":
The docs say no interpolation is done when "none" is used, whereas with None the rcParams default is used (possibly antialiasing). However, we still see the lines even when "none" is used. Might this be a dpi issue?
"none" with dpi=1000 instead of "figure" also seems to resolve the issue. @ns-rse do we think increasing this value (although increasing test and image production times) is more suitable than remaking the rcParams file?
Starting some notes as I work through this.
As noted above, plt.imsave() is a Matplotlib function, albeit a different one from plt.imshow() and plt.savefig(). So specifying colorbar and/or axes doesn't "induce" Matplotlib images, because all of these are Matplotlib functions for producing/saving images.
But that doesn't change the current situation that we are observing interpolation that is unexpected.
The default rcParams appear to have the value image.interpolation: antialiased (typically in such configuration files the defaults are provided but commented out; users can then uncomment and modify them, or copy the line, uncomment it and modify the copy).
plt.imshow()
The docstrings state the following, which points to why interpolation is an issue and that it is related to dpi:
The number of pixels used to render an image is set by the Axes size and the dpi of the figure. This can lead to aliasing artifacts when the image is resampled because the displayed image size will usually not match the size of X (see Image antialiasing). The resampling can be controlled via the interpolation parameter and/or rcParams["image.interpolation"] (default: 'antialiased').
plt.imshow() has two options pertaining to interpolation. The first is interpolation, which takes a value from 'none', 'antialiased', 'nearest', 'bilinear', 'bicubic', 'spline16', 'spline36', 'hanning', 'hamming', 'hermite', 'kaiser', 'quadric', 'catrom', 'gaussian', 'bessel', 'mitchell', 'sinc', 'lanczos', 'blackman'. The second is interpolation_stage, which sets when the interpolation is performed. It has two valid options: data, where interpolation is applied to the data provided, and rgba, where it is applied after the colormap has been applied.
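As a rough illustration of the two options, the sketch below draws the same random 20x20 array with interpolation="nearest" at each interpolation_stage. The array, seed and figure size are arbitrary choices for the example, and interpolation_stage needs a Matplotlib version that supports it (3.5 or later).

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.random((20, 20))

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
images = []
for ax, stage in zip(axes, ("data", "rgba")):
    # Same interpolation method; only the stage at which it is applied differs.
    im = ax.imshow(data, interpolation="nearest", interpolation_stage=stage)
    ax.set_title(f"interpolation_stage={stage!r}")
    images.append(im)
```

Saving fig with fig.savefig() then gives a side-by-side comparison of the two stages.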
I have noticed whilst investigating #435 that enabling DEBUG can at times cascade down through to calls to Matplotlib routines, and I saw a number of messages reporting PIL, which is one of the image types that plt.imshow() can show, the other being array-like objects. Typically most of the workflow in TopoStats deals with np.ndarray as input/output, and so perhaps at some stage in the plotting these arrays are converted to PIL (don't even know what that is right now).
plt.imsave()
plt.imsave() does not have any parameters pertaining to interpolation. Does it therefore use the defaults, or not do anything whatsoever?
plt.savefig()
plt.savefig() has no parameters for controlling interpolation, although it does accept a dpi argument (default "figure", meaning the figure's own dpi is used).
DPI (Dots Per Inch) also determines the resolution of the image. Currently it is not specified, as the fig and axes objects are instantiated with plt.subplots(1, 1, figsize=(8, 8)) (see L191 of the save_figure() method). It can be controlled at this stage if required, since plt.subplots() passes additional keyword arguments on to plt.figure(), which accepts not just figsize but also dpi.
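A small sketch of how figsize and dpi combine, assuming the Agg backend and arbitrary example values (8x8 inches, 200 dpi). It also shows that the dpi passed to savefig() overrides the figure's own dpi for the saved file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# dpi is forwarded by plt.subplots() to plt.figure().
fig, ax = plt.subplots(1, 1, figsize=(8, 8), dpi=200)

# Canvas size in pixels is figure size in inches multiplied by dpi.
width_px, height_px = fig.get_size_inches() * fig.dpi

# savefig() can override the dpi for the written file only.
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)
buf.seek(0)
saved = plt.imread(buf, format="png")
plt.close(fig)
```

Here the in-memory figure is 1600x1600 pixels, while the PNG written with dpi=100 is 800x800.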
Derive small 20x20 arrays, either from sample molecules or random, and test each of plt.imshow() / plt.imsave() / plt.savefig() along with varying values of dpi and figsize to see what is happening. Ideally these should be tests within the TopoStats test suite.
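One possible shape for such an experiment is sketched below. The render() helper, the 2x2 inch figure size and the dpi values are illustrative assumptions, not existing TopoStats code. It renders a random 20x20 array for each combination and keeps the saved pixels in memory for inspection.

```python
import io
import itertools

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1000)
small = rng.random((20, 20))


def render(array, interpolation, dpi, figsize=(2, 2)):
    """Draw array with imshow and return the saved PNG as an RGBA array."""
    fig, ax = plt.subplots(1, 1, figsize=figsize, dpi=dpi)
    ax.imshow(array, interpolation=interpolation)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # default dpi="figure" keeps the figure's dpi
    plt.close(fig)
    buf.seek(0)
    return plt.imread(buf, format="png")


# One rendered image per (interpolation, dpi) combination.
results = {
    (interp, dpi): render(small, interp, dpi)
    for interp, dpi in itertools.product((None, "none", "nearest"), (50, 100))
}
```

Each entry in results can then be compared against an array written with plt.imsave(), or turned into a pytest-mpl baseline comparison.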
interpolation
A comparison of different size images (100 / 200 / 300) using plt.imshow() and all possible interpolation methods can be viewed here.
The lines, which are the basis of this issue, only appear when interpolation is none or nearest, and it appears to be only when the array is 200x200. I'm not convinced the lines are visible in 100x100 plots, and they appear less prominent (subjectively) at 300x300.
interpolation_stage
As noted above, there are two options for selecting the stage at which interpolation is undertaken, controlled by interpolation_stage=[data|rgba].
The page here shows a matrix comparing the two methods. There doesn't appear to be much difference between the two. The lines are still most prominent in images based on arrays of 200x200, not visible in 100x100 and less prominent in 300x300.
figsize
Does varying the figsize impact this issue? All of the above are generated with figsize=(8,8), but does varying the size of the image affect the artifacts?
The page here shows all interpolation methods for 100 / 200 / 300 size images, drawn at figsize=4x4 / figsize=8x8 / figsize=16x16.
Looking at none and nearest, in the larger 16x16 figures the banding appears to disappear or be less pronounced.
Some random thoughts...
dpi: it will likely interact with figsize.
figsize: does it impact things?
Hey @ns-rse, of course! The code I used is below:
from pathlib import Path

import numpy as np
import pytest

# Import path assumed from the module named earlier in this thread.
from topostats.plottingfuncs import Images

RNG = np.random.default_rng(seed=1000)
array = RNG.random((10, 10))
mask = RNG.uniform(low=0, high=1, size=array.shape) > 0.5


@pytest.mark.mpl_image_compare(baseline_dir="resources/img/")
def test_mask_cmap(plotting_config: dict, tmp_path: Path) -> None:
    """Test the plotting of a mask with a different colourmap (blu)."""
    plotting_config["mask_cmap"] = "blu"
    fig, _ = Images(
        data=array,
        output_dir=tmp_path,
        filename="colour.png",
        masked_array=mask,
        dpi=1000,
        **plotting_config,
    ).plot_and_save()
    return fig
The images are in fact zoomed screenshots but the larger images should be above
@ns-rse do we think increasing this value (although increasing test and image production times) is more suitable than remaking the rcParams file?
I'm thinking that aiming to produce publication quality images from every single scan with a single configuration file is perhaps optimistic.
Tweaking configuration parameters and re-running a script (which is in essence what run_topostats is) is not a very efficient method of honing/refining a plot.
A better approach is to use Jupyter Notebooks to load the NumPy array and derive the image that is required for publication that way.
minicircles.spm so far. The Notebook can and should be expanded upon, and I think it would be useful to convert the content (and that of other Notebooks) into Markdown for inclusion in a "Tutorials" section of the documentation.
Discussed today in the TopoStats meeting. The "usual behaviour" should be to save with rulers, which is what currently creates the interpolation issue. One suggestion is to map pixels to inches: work from a generic 512 pixel image size and point this to a configurable image size, i.e. set the fig size (and hence dpi) as an option based on those 512 pixels, but make it configurable by users. We need to balance "normal behaviour", which could be a smaller DPI, against a configurable option to add imsave images, which would be high quality, weighing image size against speed of plotting. We could assist users by adding details of the Matplotlib size options, e.g. in comments in the config file, for configuring this.
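The pixels-to-inches idea is simple arithmetic: figure size in inches is the pixel count divided by dpi. A minimal sketch follows, with figsize_for_pixels() as a hypothetical helper name and 512 pixels taken from the discussion above.

```python
from __future__ import annotations


def figsize_for_pixels(pixels: int, dpi: float) -> tuple[float, float]:
    """Return a square figsize in inches that is `pixels` wide at `dpi`."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    side = pixels / dpi
    return (side, side)


# A 512 pixel image at 100 dpi needs a 5.12 inch square figure.
figsize = figsize_for_pixels(512, dpi=100)
```

This sizes the whole figure rather than the axes, so once rulers and a colorbar are drawn the image area itself covers fewer pixels than the array has; matching them exactly would need the axes position to be taken into account.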
Not sure if I'm repeating information we already know, but here is what I have found this evening:
Copying the method used in save_figure from plottingfuncs.py into a notebook,
fig, ax = plt.subplots(1, 1, figsize=(8, 8))
im = ax.imshow(
    img,
    extent=(0, 300, 0, 300),
    interpolation='nearest',
    cmap=cmap,
    vmin=vmin,
    vmax=vmax,
)
Does produce the artefact:
Switching the interpolation parameter to 'none' yields the same:
However, using Python's None removes the artefact for me in the notebook:
This even works at smaller scales (figsize=(3, 3)):
I set the interpolation value directly in save_figure() in plottingfuncs.py to None (not 'none' or 'None'), and it appears to work:
Close up:
I might be forgetting a detail here, but it seems that the solution is to use None directly in save_figure()?
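The difference between Python's None and the string 'none' can be checked directly. The sketch below, using an arbitrary random array, shows that passing None makes imshow fall back to rcParams["image.interpolation"], whereas 'none' is kept as an explicit setting.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

img = np.random.default_rng(seed=0).random((20, 20))
fig, (ax_default, ax_none) = plt.subplots(1, 2)

# Python None: Matplotlib substitutes the rcParams default.
im_default = ax_default.imshow(img, interpolation=None)
# The string "none": an explicit request for no interpolation.
im_none = ax_none.imshow(img, interpolation="none")

print("None   ->", im_default.get_interpolation())
print("'none' ->", im_none.get_interpolation())
```

This matters for the configuration file too: a YAML null loads as Python None, while a quoted "none" or "None" loads as a string.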
Sorry, only just clocked this investigation. I'll check through #464 carefully and make sure the config change comes through as None. Thanks for checking this.
Describe the bug
Horizontal and vertical lines are seen in the Matplotlib imshow images when they are saved.
MPL image:
vs using plt.imsave to save the image directly.
To Reproduce
Run a high resolution image through TopoStats with either, or both, of the colorbar and axes options selected to induce a Matplotlib plot. On these images the data is interpolated and so shows horizontal and vertical lines.
Expected behavior
Image should be clear and un-interpolated.
Output
Screenshots above.
Additional context
This could be down to the "nearest"-neighbour interpolation or the DPI setting.