Scan-o-Matic / scanomatic-standalone


Validate results are "same" as old projects #145

Closed. e-larsson closed this issue 2 years ago

e-larsson commented 2 years ago

πŸ“‹ Info

The results produced by the system do not seem to be consistent with the old projects. Here is a summary of a previous investigation for the PayamMartin19Oct16Scanner2 project; it covers the compilation, analysis and feature extraction steps but NOT scanning:

2 - Compile Project

With local fixture

| Produces | Differences |
| --- | --- |
| *.project.compilation | Errors: 2022-01-28 08:21:55 -- ERROR **Compile Effector** Could not output analysis to file /somprojects/PayamMartin19Oct16Scanner2/PayamMartin19Oct16Scanner2_0000_30.2968.tiff |

3 - Analysis

With dynamic positioning

| Produces | Differences |
| --- | --- |
| grid_plate___*.npy | Same shape but slightly different values by single digits |
| image_*_data.npy | |
| time_data.npy | |

4 - Feature extraction

Run with overwriting QC

| Produces | Differences |
| --- | --- |
| curves_raw.npy and curves_smooth.npy | |
| phenotypes.Absolute.plate_*.csv | |
| phenotypes_filter.npy | |
| phenotypes_raw.npy | |
| phenotype_vectors_raw.npy | |
| phenotype_times.npy | |
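For reference, a minimal sketch of the kind of file-by-file comparison implied above, assuming the old and new results are available side by side as project directories with the same layout and that the arrays are plain numeric (the paths and file names below are assumptions):

```python
from pathlib import Path

import numpy as np

# Hypothetical locations of the old (reference) and newly produced results.
OLD = Path("/somprojects/PayamMartin19Oct16Scanner2_reference/analysis")
NEW = Path("/somprojects/PayamMartin19Oct16Scanner2/analysis")

# Compare a few of the .npy outputs listed above: same shape, and how large
# the value differences are.
for name in ("time_data.npy", "curves_raw.npy", "curves_smooth.npy"):
    old = np.load(OLD / name)
    new = np.load(NEW / name)
    if old.shape != new.shape:
        print(f"{name}: shape differs, {old.shape} vs {new.shape}")
        continue
    diff = np.abs(old.astype(float) - new.astype(float))
    print(f"{name}: max abs diff {np.nanmax(diff):.4g}, mean abs diff {np.nanmean(diff):.4g}")
```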

🏁 DoD

e-larsson commented 2 years ago

Some previous discussion can be found in https://github.com/Scan-o-Matic/scanomatic-standalone/pull/144

local-minimum commented 2 years ago

Consider checking the grayscale values in the fixture vs. the project.compilation file in the original data, and then compare the values in the fixture vs. the values in a newly compiled project.compilation.

BengtRydberg commented 2 years ago

Below is a figure showing a comparison between new and old grayscale values for the PayamMartin19Oct16Scanner2 project. The differences are found to be within about ±0.13. Is this an issue or not?

[figure: scanomatic_grayscale_values]

local-minimum commented 2 years ago

It looks quite good to me. If you want to check it a bit further: each array of values is used to fit a polynomial (I don't remember exactly, but it might be a 3rd-degree polynomial), and it's the differences in the fitted values over the range [0, 255] that really matter.
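A minimal sketch of such a check, assuming the old and new detected grayscale segment values and the known target values of the strip have been extracted into arrays (all file names below are assumptions, and the 3rd-degree polynomial is a guess per the comment above):

```python
import numpy as np

old_detected = np.load("grayscale_values_old.npy")   # hypothetical files with the
new_detected = np.load("grayscale_values_new.npy")   # detected segment values
targets = np.load("grayscale_target_values.npy")     # known reference values of the strip

# Fit one polynomial per data set, mapping detected values to target values.
old_poly = np.polyfit(old_detected, targets, deg=3)
new_poly = np.polyfit(new_detected, targets, deg=3)

# What matters is how much the fitted mappings differ over the pixel range [0, 255].
pixels = np.arange(256)
diff = np.polyval(old_poly, pixels) - np.polyval(new_poly, pixels)
print("max abs difference over [0, 255]:", np.abs(diff).max())
```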

skymandr commented 2 years ago

I don't know if this is an issue, but I feel fairly confident in saying that this can't explain the difference we are seeing.

local-minimum commented 2 years ago

Me too. Let's say it's something else that causes the differences, then... How do heatmaps of the differences between the raw growth-curve values look?

BengtRydberg commented 2 years ago

> Me too. Let's say it's something else that causes the differences, then... How do heatmaps of the differences between the raw growth-curve values look?

What file(s) contain this information?

skymandr commented 2 years ago

> Me too. Let's say it's something else that causes the differences, then... How do heatmaps of the differences between the raw growth-curve values look?

@e-larsson looked a bit at this. The shapes of the curves are basically the same, but the values are different by a factor of approximately two (but the difference was, as I recall, time dependent). The difference is similar to what you get when comparing e.g. analysis with analysis_original and analysis_no_median. @local-minimum: Do you know what the difference is between these different analyses?

local-minimum commented 2 years ago

no_median probably refers to skipping the median filter in curve smoothing, but I doubt that's a primary reason for big changes. If it is what I expect, it would only cause differences in the smooth curves and phenotypes, not in the raw curves.
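For context, a median-filter step in curve smoothing would look roughly like this (a sketch only, not the actual Scan-o-Matic implementation; the file name and indexing are assumptions):

```python
import numpy as np
from scipy.signal import medfilt

curves = np.load("curves_raw.npy")  # hypothetical layout: plate x row x column x time point
one_curve = curves[0, 0, 0]         # a single colony's raw growth curve

# A "no_median" variant would skip this step and smooth the raw curve directly.
median_filtered = medfilt(one_curve, kernel_size=5)
```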

skymandr commented 2 years ago

Our current line of inquiry is to make a new compilation and analysis on a project in the 2.X-branch, and compare that with results from the 3.X branch, to make sure we are comparing the same things.

local-minimum commented 2 years ago

Makes sense. If someone has time to look at another project we have, it might be worth just trying one named something completely different and seeing if the numbers agree better in that one.

BengtRydberg commented 2 years ago

Below is a comparison for image 213 in the analysis of PayamMartin19Oct16Scanner2. I have compared three images (Payam's original analysis, python2.7, and python3.9), and the percentage difference is reported as 200 * (x - y) / (x + y).
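A small sketch of this measure, for reference (the example file names are assumptions):

```python
import numpy as np


def percentage_difference(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Symmetric percentage difference: 200 * (x - y) / (x + y)."""
    return 200 * (x - y) / (x + y)


# Hypothetical usage on two versions of the same analysis image:
# diff = percentage_difference(np.load("image_213_py27.npy"), np.load("image_213_py39.npy"))
```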

From the figures we see that the data from both python2.7 and python3.9 deviate quite significantly from Payam's original analysis, but it seems that the pixel values in the images from python2.7 and python3.9 agree closely. There are a few pixels that deviate by ~25 % for the upper-right plate. However, from the bottom figure, showing a histogram of the differences, we see that most pixels agree within 0.5 % even for this plate.

So I guess this is good news for us...

[figures: scanomatic_1, scanomatic_2, scanomatic_3, scanomatic_4]

local-minimum commented 2 years ago

I agree that it looks like good news for us!

It is weird that the comparison with Payam's data shows a spatial bias, which makes me wonder if that data had been corrected for the bias in some manner.

The differences between 2.7 and 3.9 look benign. Since it is a non-trivial and non-deterministic process, some variation is to be expected. This is also part of the reason why experiments are only ever evaluated relative to some reference experiment.

BengtRydberg commented 2 years ago

Should we repeat Erik's comparison above or are we satisfied?

local-minimum commented 2 years ago

It would be nice to also see that the phenotypes and normalized phenotypes (check growth rate) are comparable between 2.7 and 3.9 if it's not too much work.

BengtRydberg commented 2 years ago

I had a bit of a problem understanding how to read some of the files. Below I have compared Phenotypes.ExperimentGrowthYield from the phenotypes.Absolute.plate_X.csv files for python2.7 and python3.9. The top two figures show absolute values and the bottom two the relative difference. The differences seem to be quite close to 0 to me.

[figures: scanomatic_5, scanomatic_6, scanomatic_7, scanomatic_8]
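A minimal sketch of this kind of comparison, assuming the Phenotypes.ExperimentGrowthYield values for one plate can be loaded into equally shaped arrays (the file names, delimiter, and CSV layout are assumptions):

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical: one plate of the phenotype per Python version, as a 2D grid.
py27 = np.loadtxt("py27/phenotypes.Absolute.plate_0.csv", delimiter=",")
py39 = np.loadtxt("py39/phenotypes.Absolute.plate_0.csv", delimiter=",")

rel_diff = 200 * (py39 - py27) / (py39 + py27)  # same measure as used for the images

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, data, title in zip(axes, (py27, py39, rel_diff),
                           ("python2.7", "python3.9", "relative difference [%]")):
    im = ax.imshow(data)
    ax.set_title(title)
    fig.colorbar(im, ax=ax)
plt.show()
```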

BengtRydberg commented 2 years ago

Same type of figures but for Phenotypes.GenerationTime

[figures: scanomatic_11, scanomatic_12, scanomatic_14, scanomatic_13]

local-minimum commented 2 years ago

It's a bit disturbing that there's a positional bias in the comparison between Python 2.7 and 3.9. I don't quite understand what might be causing that. A very minute general shift, independent of position, on the lower plates seems not troublesome, but the top two plates are scary.

skymandr commented 2 years ago

> It's a bit disturbing that there's a positional bias in the comparison between Python 2.7 and 3.9. I don't quite understand what might be causing that. A very minute general shift, independent of position, on the lower plates seems not troublesome, but the top two plates are scary.

I think it is only the top left plate that is scary. I don't see the same spatial tendency in the top-right one, though the missing data in that plate makes it difficult to say for sure. I would suggest that a good (but boring) starting point is straight up code inspection: look at the GenerationTime phenotyper code in 2.7 and 3.8 side by side to rule out obvious things, such as int vs. float division.
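For context, a minimal illustration of the division pitfall mentioned above (not taken from the Scan-o-Matic code):

```python
# Python 2: 7 / 2 == 3    (two ints divide with floor division)
# Python 3: 7 / 2 == 3.5  (true division; // is needed for the old behaviour)
assert 7 / 2 == 3.5
assert 7 // 2 == 3
```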

local-minimum commented 2 years ago

I didn't realize it was missing, but why is it missing with a spatial bias then?

At any rate this must be investigated before we can consider this task done.

skymandr commented 2 years ago

> I didn't realize it was missing, but why is it missing with a spatial bias then?

I don't know, but it's the same in the analyses we have from before, if I recall correctly. It is very evident in QC, when looking at the affected plate.

joakimmoller commented 2 years ago

The next agreed step is to narrow down where the differences start to occur: "start the process in 2.7, continue in Python 3"...

I.e. run the analysis in Scan-o-Matic 2.2, export the data, and continue with feature extraction in Scan-o-Matic 3. By altering the switching points we can probably nail down where it starts to deviate.

skymandr commented 2 years ago

Some more preliminary results, where I've tried to replicate the spatial bias seen for the Generation Time phenotype (GT) above.

[figure]

As can be seen, I cannot reproduce the problem, though the growth yields seem the same as before, as do the ranges of differences in GT between the two versions. It should be noted that I am not 100 % sure the settings in 2 and 3 are exactly the same, but if they are not, then that in itself is an indicator of robustness, rather than the opposite. Still, caution is advised.

A possible explanation can be found in the grayscale differences further up: inspecting the positions that show the spatial bias in QC, I could see that their GT is calculated in roughly the interval where the grayscale is jumpy, whereas the positions lacking bias have their growth phase later. This could explain how the bias came to be.

What is unknown is why the grayscale was jumpy previously but not now, but that the grayscale algorithm can be jumpy is a known problem, so maybe just bad luck? Another explanation could be different fixtures used for the 2 and 3 analyses in the previous investigation, but that is speculation.

local-minimum commented 2 years ago

Sounds good. Could you also just make a comparison of the normalized phenotypes? Then we might call it a day and say the validation is done.

skymandr commented 2 years ago

Results from the more careful study are in agreement, though the normalized phenotypes show some things that are concerning. No spatial bias as far as I can tell, but quite large deviations.

Raw Yield:

[figure]

Raw Growth Time:

[figure]

Normalized Yield:

(scale clipped to [-100, 100] in comparison)

[figure]

Normalized Growth Time:

(scale clipped to [-100, 100] in comparison)

[figure]

local-minimum commented 2 years ago

That looks scary

local-minimum commented 2 years ago

> Normalized Yield:
>
> (scale clipped to [-100, 100] in comparison)
>
> [ image removed to avoid confusion ]

What is figure 3 here? It truly looks like something went wrong with the clear positional frame.

skymandr commented 2 years ago

> Normalized Yield:
>
> (scale clipped to [-100, 100] in comparison)
>
> [ image removed to avoid confusion ]
>
> What is figure 3 here? It truly looks like something went wrong with the clear positional frame.

That is the normalized yield from SoM 2. I have updated the title in the figure. (For reference, it looks the same in SoM 3 with the patterns.)

~Might we have used the wrong colony as reference or something..? We used the default (lower right, though the images above use bottom left as origin, so it looks like upper right).~ EDIT: Seems to make no difference.

skymandr commented 2 years ago

This is what QC looks like for normalized Yield:

SoM 2

[figure]

SoM 3

[figure]

... and disregarding the weird patterns for a moment, this is what the absolute difference between 2 and 3 looks like:

[figure]

[figure]

which doesn't look nearly as bad as the percentage comparisons. I'm not sure which is the most relevant, but this at least shows that the sometimes huge relative differences occur where the actual values are very small.

local-minimum commented 2 years ago

True, percentage isn't as reasonable here. One would realistically take the median/mean reference-position value and add it back to the normalized values before doing a percent comparison. And I don't see anything truly troubling here when just subtracting. I think we can say that the remaining differences are so minor that they are probably not caused by any serious new bug. Rather, we are seeing the limits of SoM in its current implementation, and since it's not deterministic this is expected.
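A minimal sketch of that suggested comparison, assuming the normalized plates are available as arrays and using the plate median as a stand-in for the reference-position value (the file names are assumptions):

```python
import numpy as np

norm_som2 = np.load("som2_normalized_yield_plate_0.npy")  # hypothetical files
norm_som3 = np.load("som3_normalized_yield_plate_0.npy")

# Add back a typical reference value so the percent comparison is not dominated
# by normalized values that sit close to zero.
offset = np.nanmedian(norm_som2)
x = norm_som2 + offset
y = norm_som3 + offset
percent_diff = 200 * (x - y) / (x + y)
print("max abs percent diff:", np.nanmax(np.abs(percent_diff)))
```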

I'm happy with these results if you are happy @skymandr

skymandr commented 2 years ago

> I'm happy with these results if you are happy @skymandr

I'm happy, but do we need an issue for the weird patterns in yield after normalization?

skymandr commented 2 years ago

The spatial normalization problem is documented in issue #164.