Variance/Standard Deviation Update Issues

kloppstock commented 4 years ago

During the calibration phase of the Jungfrau data set, the variance estimation for many pixels quickly becomes negative after the moving window is filled. We propose using a regular variance calculation (without the moving window) since it is only updated during the calibration process. We would leave the moving average estimation as it is. Would this work for you?

sredford commented 4 years ago

Hi Jonas,

I think it's not a problem, since as you said this is only updated during the initial dark phase. However, I'm surprised that it goes negative.

I ran a test using our MovingStat.h file, see the three plots attached. I simulate a drifting pedestal (grey line) and generate 'data' (black dots) using a Gaussian with mean of the grey line and sigma 10. Then I calculate the pedestal (red line) with a rolling window 100 frames wide. The variance and standard deviation are positive over 1000 frames.

How many frames does it take for your variance to go negative? Could it be that you don't update the M**2 with the moving window?

Cheers, Sophie

drift_9

drift_rms_9

drift_var_9

kloppstock commented 4 years ago

Hello Sophie,

thank you for your quick reply! We also used a rolling window of 100 and the first negative value appeared in frame 112. Our implementation should be very similar to yours (line 67 was changed to m2 += adc * adc;).

Please find the results attached below.

Cheers, Jonas

Mean: mean

Standard Deviation: stddev

Variance:

M:

M (zoomed in): m_zoom

M2:

M2 (zoomed in): m2_zoom

sredford commented 4 years ago

Hi Jonas,

Thanks for adding the comparable plots. I find the change in behaviour at 100 frames (the size of your rolling window) abit suspicious, so I think it's worth having a look at the code.

I haven't run your implementation, but just reading through I spotted: line 67: should m2 += adc; actually be m2 += adc * adc; ?

I don't think this will solve all the problems, because it will only affect the first 100 frames, but can you alter that and give it a try?

Cheers, Sophie

kloppstock commented 4 years ago

Hi Sophie,

we also noticed this small bug on line 67 and have already fixed it, so the plots above already use this corrected version. I have now also uploaded the fix to line 67.

Cheers, Jonas

sredford commented 4 years ago

Hi Jonas, Ok, sorry, I see your earlier comment now. Can you link me your updated implementation please? The previous link is still the old one. Cheers, Sophie

kloppstock commented 4 years ago

Hi Sophie,

the updated file is located here: include/jungfrau-photoncounter/kernel/helpers.hpp (which is part of the master branch).

Cheers, Jonas

sredford commented 4 years ago

Hi Jonas,

I took your algorithm (from line 58 to 78, including the fix to line 67) and applied it to my simulated data. The results are the same as I get from using MovingStat.h.

Are you using this same algorithm for all frames, 0 to 100 and 100+? When do you start the pedestal tracking?

Cheers, Sophie

drift_9

drift_var_9

kloppstock commented 4 years ago

Hi Sophie,

we are using the algorithm for all frames.

Could you try your algorithm with the attached data? This data is from pixel (0, 0) of the pedestal calibration file of the Jungfrau data set (allpede_250us_1243__B_000000.dat) during the first 300 frames.

Cheers, Jonas

adc.txt

sredford commented 4 years ago

Hi Jonas,

Using the 300 frames that you sent, I see that the data points and means all agree, but the variances that I calculate (your implementation and from MovingStat.h) agree and stay positive, while your variance plot goes negative. I'm not sure what's going on.

Cheers, Sophie

drift_9

drift_var_9

heidemeissner commented 4 years ago

Hi Jonas, hi Sophie, Could it be a matter of data type (windowsize and currentwindowsize are maybe different in type)? After the 100th frame, the values for mean and variance you sent yesterday have less digits. Maybe this plays a role in the division of m2 and s.th goes wrong there. The algorithms themselves look equivalent... Cheers, Heide

kloppstock commented 4 years ago

Hi Heide,

in theory, only currentWindowSize is used for the computations. Since this variable is a double in our code it should have a very high precision. The variable for the standard deviation is a double as well. Both m and m2 are unsigned 64 bit integers, so an overflow is highly unlikely.

I will try to reproduce my problem in a spreadsheet and maybe I will find my bug there.

Cheers, Jonas

sredford commented 4 years ago

Dear Heide,

You're right about the different types, but nothing changes on my side (c++) when I make everything double.

The only thing I can imagine is that after 100 frames an update or tracking algorithm runs that filters on incoming adc (eg abs(adc - pedestal) < rms) before the adc is added to the pedestal. This would reduce the variance, but not as dramatically as Jonas shows.

Cheers, Sophie

drift_var_9

kloppstock commented 4 years ago

Hi Sophie, hi Heide,

Heide is probably right with her suggestion on data types. I could recreate the correct graphs using LibreOffice and compared the results. I could observe a ever increasing drift in the m and m2 values and noticed that these are stored as integers. This results in completely correct values during the the first 100 frames but loses a significant portion of precision when the operations on line 70 and 71 are executed. Since the compiler always rounds down when converting to the nearest integer, the values start to drift down quite rapidly.

I will implement a fix and report back with results during this or the next week.

Thanks you both for your valuable insights, this really helped a lot!

Cheers, Jonas

sredford commented 4 years ago

Hi Jonas, Great news! Happy that you have a good lead to work with now. Thanks for the nice collaboration, Sophie

kloppstock commented 4 years ago

Hi,

I tested the code again with the fixes today and the problem is resolved. Thanks again for your help!

Cheers, Jonas

ComputationalRadiationPhysics / jungfrau-photoncounter

Variance/Standard Deviation Update Issues #64