Open danielgsfb opened 2 years ago
This can be treated as an optimization problem: 1) the mean and standard deviation of the original n = 55 are known; 2) the mean and standard deviation of the 54 discernible recordings plus 1 trial recording can be computed; 3) you minimize the difference between the two by moving point 55 among the 54 already spotted. Assuming the missing point lies within the range of the 54 recordings, you start, e.g., by putting the 55th one at the lower limit of that range and computing mean and stdev for all 55. If the new mean is less than the one originally reported for all 55, chances are that point number 55 has to be moved a little towards a higher value. You iterate until the differences a) mean (reported) - mean (newly calculated) and b) stdev (reported) - stdev (newly calculated) are, in absolute value, less than or equal to thresholds you defined in advance.
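The scan described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `scan_for_missing`, the fixed grid of `steps` candidates, and the use of the sample standard deviation (divisor n - 1) are mine, not something the original data source confirms. Instead of stopping at a threshold, the sketch simply keeps the candidate with the smallest combined deviation, which you can then compare against your own thresholds.

```python
import math


def scan_for_missing(known, reported_mean, reported_sd, steps=10_000):
    """Slide one candidate value across the range of the known points and
    return (best_candidate, best_cost), where cost is the sum of the absolute
    differences in mean and in sample standard deviation (assumed n - 1)."""
    n = len(known) + 1                      # e.g. 54 known + 1 missing = 55
    total = sum(known)
    total_sq = sum(x * x for x in known)
    lo, hi = min(known), max(known)

    best, best_cost = None, float("inf")
    for i in range(steps + 1):
        candidate = lo + (hi - lo) * i / steps
        mean = (total + candidate) / n
        # Sum of squared deviations via the sum of squares; clamp rounding noise.
        ss = max(total_sq + candidate * candidate - n * mean * mean, 0.0)
        sd = math.sqrt(ss / (n - 1))
        cost = abs(mean - reported_mean) + abs(sd - reported_sd)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

A large remaining `best_cost` would suggest that the assumption "the missing point lies within the range of the 54 recordings" does not hold, or that some of the 54 digitized values are themselves off.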
However, caveat lector:
Doing this enters a twilight zone between «data reconstruction» and «data fabrication». Is your aim to use the 55 (or 54 clearly identified) recordings to determine the arithmetic mean for an eventual comparison of two samples by their arithmetic means? Assuming the recordings follow a normal distribution (keyword: qq-plot), a pairwise comparison of the confidence intervals based on Student's t for 54 points or for 55 points at a confidence level of 95%, 99%, or 99.5% might yield differences that are no longer significant for the analysis ahead. (A table of t values, or an interface to compute them with adjustable degrees of freedom and significance level, helps here.) To illustrate this, say you know the standard deviation σ (sigma) of the two repeated recordings (reference sample and unknown sample) and you determined in advance the difference between their arithmetic means (Δμ, Delta µ) which you need to identify. In this case, n is the number of times either the reference or the sample is measured, and N is the total number of times both (sample and reference) are measured:
I have a scatterplot that I'm trying to extract data from.
Suppose I found 54 values out of 55. There is one "missing" value, probably because it is overlapped by another point and I can't actually see it. I have the mean and standard deviation of the 55 values.
Is there a way to reverse find the one that is missing? I mean, is there a function that would give me the value that, amongst the 54 values found, would fit and give me the MEAN and SD I have?
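For a single missing value, the reported mean alone already pins it down: the sum of all 55 values is 55 × mean, so the missing one is that total minus the sum of the 54 known values. The reported standard deviation then serves as an independent check. Below is a minimal sketch; the function names are hypothetical, and it assumes the reported SD is the sample standard deviation (divisor n - 1). Note that rounding in the reported mean is multiplied by n, so a mean rounded to two decimals leaves the recovered value uncertain by roughly ±0.3 for n = 55.

```python
import math


def missing_from_mean(known, reported_mean):
    """Recover one missing value from the reported mean of all n values."""
    n = len(known) + 1
    return n * reported_mean - sum(known)


def sample_sd(values):
    """Sample standard deviation (divisor n - 1, an assumption here)."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))


def check_against_sd(known, reported_mean, reported_sd):
    """Return the recovered value and how far the resulting SD is from the
    reported one; a large gap hints at digitizing or rounding errors."""
    x = missing_from_mean(known, reported_mean)
    return x, abs(sample_sd(list(known) + [x]) - reported_sd)
```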
Complement: The ones missing are probably overlapped because the values plotted are duplicated. If there are just 1-2 missing dots, how many possible combinations of two duplicate values are there to reach the same mean? Even though they are not the actual values found by the original research investigators, they would be close enough and I would be able to validate the digitizing.
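With two missing values, the mean fixes their sum and the standard deviation fixes their sum of squares, which leaves a quadratic with a single pair of solutions (up to order) in exact arithmetic. The sketch below shows that calculation; the name `two_missing`, the tolerance `tol`, and the sample-SD convention (divisor n - 1) are assumptions for illustration. With rounded summary statistics the recovered pair is only approximate, and with three or more missing values the two statistics no longer determine the answer.

```python
import math


def two_missing(known, reported_mean, reported_sd, tol=1e-9):
    """Recover two missing values (a <= b) from the reported mean and sample
    SD of all n values; return None if no real solution exists."""
    n = len(known) + 2
    # Sum and sum of squares that the two missing values must supply.
    s = n * reported_mean - sum(known)
    q = ((n - 1) * reported_sd ** 2 + n * reported_mean ** 2
         - sum(x * x for x in known))
    # From a + b = s and a^2 + b^2 = q it follows that (a - b)^2 = 2q - s^2.
    disc = 2 * q - s * s
    if disc < -tol:
        return None          # statistics inconsistent with the known values
    root = math.sqrt(max(disc, 0.0))
    return (s - root) / 2, (s + root) / 2
```

If both hidden dots share the same value, the discriminant is zero and the function returns that value twice.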
I feel like this should be a feature in WebPlotDigitizer. Can you help me, please?