automeris-io / WebPlotDigitizer

Computer vision assisted tool to extract numerical data from plot images.
https://automeris.io
GNU Affero General Public License v3.0
2.67k stars 363 forks

How to 'reverse' derive correct values when given MEAN + SD #286

Open danielgsfb opened 2 years ago

danielgsfb commented 2 years ago

I have a scatterplot that I'm trying to extract.

Suppose I found 54 values out of 55. There is one "missing" value, probably because it is overlapped and I can't actually see it. I have the mean and standard deviation of the 55 values.

Is there a way to reverse find the one that is missing? I mean, is there a function that would give me the value that, added to the 54 values found, would reproduce the mean and SD I have?

Complement: The ones missing are probably overlapped because the values plotted are duplicated. If there are just 1-2 missing dots, how many possible combinations of two duplicate values are there to reach the same mean? Even though they are not the actual values found by the original research investigators, they would be close enough and I would be able to validate the digitizing.

I feel like this should be a feature in the WebPlotDigitizer. Can you help me please?

nbehrnd commented 2 years ago

This can be treated as an optimization problem: 1) the mean and stdev of the original n = 55 are known; 2) the mean and stdev of the 54 discernible recordings plus 1 new recording can be computed; 3) you minimize the difference between the two sets of statistics by moving point 55 among the 54 already spotted. Assuming the missing point lies within the range of the 54 recordings, you could start by placing the 55th one at the lower limit of that range and computing mean/stdev for all 55. If the new mean is less than the one originally reported for all 55, chances are that point 55 should be moved a little towards a higher value. You iterate until the differences a) mean (reported) - mean (newly calculated) and b) stdev (reported) - stdev (newly calculated) are, in absolute value, less than or equal to thresholds you defined in advance.
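The search described above can be sketched in a few lines of Python. This is only an illustration, not a WebPlotDigitizer feature: the function names `value_from_mean` and `scan_for_missing_value` are made up here, the scan is a plain grid search rather than an adaptive one, and the reported SD is assumed to be the sample standard deviation (n - 1 in the denominator), which may not match what the original authors used. For a single missing point, the mean alone already pins down a candidate, since the sum of all n values must equal n times the reported mean; the scan additionally checks the SD.

```python
import math


def value_from_mean(found, reported_mean, n_total):
    """Closed-form candidate for ONE missing value: n * mean - sum(found)."""
    return n_total * reported_mean - sum(found)


def scan_for_missing_value(found, reported_mean, reported_sd, steps=2000):
    """Grid-search one missing value within the range of the found values.

    Returns (candidate, mean_error, sd_error) for the candidate whose
    combined absolute error against the reported statistics is smallest.
    Assumes reported_sd is the sample SD (n - 1 denominator).
    """
    n = len(found) + 1
    total = sum(found)
    total_sq = sum(v * v for v in found)
    lo, hi = min(found), max(found)

    best = None
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        mean = (total + x) / n
        # Sample variance from running sums; clamp tiny negative rounding errors.
        var = max((total_sq + x * x - n * mean * mean) / (n - 1), 0.0)
        mean_err = abs(mean - reported_mean)
        sd_err = abs(math.sqrt(var) - reported_sd)
        if best is None or mean_err + sd_err < best[1] + best[2]:
            best = (x, mean_err, sd_err)
    return best


if __name__ == "__main__":
    # Toy data: the values 1..10 with 7 "hidden"; statistics are those of all 10.
    found = [1, 2, 3, 4, 5, 6, 8, 9, 10]
    print(value_from_mean(found, 5.5, 10))
    print(scan_for_missing_value(found, 5.5, math.sqrt(82.5 / 9), steps=900))
```

If the two error terms cannot both be brought below your thresholds, that suggests the 54 digitized values themselves carry enough error (or the reported statistics enough rounding) that the missing point cannot be recovered reliably.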

However, caveat lector: