con / opfvta-reexecution

Container-based Replication of https://doi.org/10.1038/s41398-022-01812-5
Apache License 2.0
1 stars 1 forks source link

One of the subfigures is identical, which is suspicious. #42

Closed TheChymera closed 11 months ago

TheChymera commented 12 months ago

decohost_2023-10-03T12:10:10

Fig 4e on the 14th page of the manuscript appears static, though it is not.

No idea why this happens, since the bitmap isn't stored in the OPFVTA repo. It's unlikely the distributions are identical, not least of all because they are sourced from the same volumetric data as the corresponding left-hand figure (data/features_l2/_tstat.nii.gz).

I have been digging into this for some time now, hoping to figure it out. Haven't given up quite yet, but if anybody would like to chime in...

yarikoptic commented 12 months ago

I wonder more why there are such huge differences in histograms of (d) -- how non-deterministic it is? I thought we looked into fixing a seed somewhere. Since I am not familiar with algorithm -- are both d) and e) are coming from the same pipeline or they are different analytics?

TheChymera commented 12 months ago

@yarikoptic no, the seed fixing was part of the templates package. Registration is pretty non-deterministic, which is why I'm more concerned about lack of differences in such things as histograms than presence of differences — particularly here where the comparison is part of a higher-level analysis comparing the signal to a different atlas-based signal. In any case, fixing a seed would not do much with respect to comparison with the original manuscript, since that did not have a fixed seed. It would just make the diffs between recent reexecutions smaller, which is a comparison we don't do, since we focus on reexecution. I don't think we should extend the article to that. We're already covering a breadth of topics and it will just be too scatter-shot.

The two figures are basically the same script pointing at the same data file, with different parameters:

[deco]~/docsrc/opfvta ❱ diff scripts/distributions_features_filtered_*.py
11,13c11,14
< df['Structure'] = df['Structure'].replace({
<         'Accessory olfactory bulb: glomerular, external plexiform and mitral cell layer': 'Accessory olfactory bulb: glomerular, external plexiform, mitral layers',
<         })
---
> #print(df.loc[df['Structure'] == 'Accessory olfactory bulb: glomerular, external plexiform and mitral cell layer'])
> #print(df.loc[df['Structure'] == 'Accessory olfactory bulb: glomerular, external plexiform and mitral cell layer'])
> #df = df.replace('Accessory olfactory bulb: glomerular, external plexiform and mitral cell layer','Accessory olfactory bulb: glomerular, external plexiform, mitral layers')
>
29d29
<         ascending=True,
32c32
<         xlim=[-5,2.7],
---
>         xlim=[-2.5,6.2],
34c34
<         cmap='summer_r',
---
>         cmap='spring_r',
37d36
<         text_side='right',

It is not only possible but expected that the negative-most histograms have more noise, because they have more signal. It is just very odd that the positive-most ones should be identical when the former are not. There are a few more tests I can run, just double-checking the process again...

TheChymera commented 11 months ago

Ok, so the issue is that grayscale diff-pdf correctly highlights differences, while color-enabled diff-pdf does not. The statistical summaries of differing pixels are correct in both cases.

decohost_2023-10-06T09:53:21

Here is a MWE → https://resources.chymera.eu/debug/debug_diff-pdf.tar.xz

Since the original software has no issue tracker I am pinging the author here @vslavik

TheChymera commented 11 months ago

@vslavik we closed the issue since the mitigation is acceptable for the time being, but please let us know if you're interested in addressing this. We might be able to help, as your package is very useful for us.

asmacdo commented 11 months ago

Ok so that sounds very annoying to debug, nice catch again

vslavik commented 11 months ago

Since the original software has no issue tracker I am pinging the author here @vslavik

The original software clearly says why it doesn't have it — so that you contribute instead of asking for things. Side-stepping that by personally nagging feels a bit disrespectful of that...

but please let us know if you're interested in addressing this

If you wish to improve diff-pdf, that's greatly appreciated: please make the changes and submit a pull request, I'll gladly merge it or help you out with finishing it.

yarikoptic commented 11 months ago

Nice digging @TheChymera !

asmacdo commented 11 months ago

@vslavik sorry we missed the issue tracker note, we are all just bug magnets around here with issue tracker reflexes ;)

Fortunately it wasn't a diff-pdf bug in the end, and your tool has been very useful to this project