codercahol / chlamy-ImPi

An image processing pipeline for time-series of Chlamydomonas reinhardtii fluorescence photos

Duplicated rows in plate 99 #27

Closed samsongourevitch closed 5 months ago

samsongourevitch commented 5 months ago

I think some rows are duplicated in the final database I found in the drive. For instance, rows 2 and 3 are the same; the only difference is the 'threshold' column. This is the case for all the 1min-1min experiments for plate 99. I believe this only happens for plate 99 and for the 1min-1min condition, but I am not completely sure.

codercahol commented 5 months ago

I suspect this is because two thresholds are being calculated, one for the photos taken in the light and one for the photos taken in the dark. When the dataframes are merged, the rows are (erroneously) replicated because the proper series data wasn't initialized before being added to the dataframe.
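To illustrate the suspected mechanism, here is a minimal pandas sketch, not the pipeline's actual code. The column names (`plate`, `well`, `y2_1`, `threshold`) and all values are made up for the example. It shows how merging on a key that is not unique on the threshold side replicates every measurement row once per threshold.

```python
import pandas as pd

# Hypothetical per-well measurements; column names and values are assumptions.
measurements = pd.DataFrame({
    "plate": [99, 99],
    "well": ["A1", "A2"],
    "y2_1": [0.61, 0.58],
})

# Two thresholds for the same plate (e.g. one light, one dark), so the merge
# key "plate" is not unique on this side.
thresholds = pd.DataFrame({
    "plate": [99, 99],
    "threshold": [120.0, 95.0],
})

# Each measurement row matches both threshold rows, so the row count doubles.
merged = measurements.merge(thresholds, on="plate")
print(len(measurements), len(merged))

# Asking pandas to validate the merge turns the silent replication into an error.
try:
    measurements.merge(thresholds, on="plate", validate="many_to_one")
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True
print("validation failed:", validation_failed)
```

If this is indeed what happens, passing `validate=` to the merge would be one way to catch it early.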

I won't be able to get to this until next week, but the duplicates can simply be discarded. The fluorescence intensities are calculated from the light and dark thresholds applied together over all of the frames anyway, so this is merely a cosmetic bug in the way the data is recorded and should not meaningfully affect the actual computation of the y2 values.
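As a stopgap, discarding the duplicates could look roughly like the sketch below. It assumes the database is loaded as a pandas dataframe and that `threshold` is the only column that differs between replicated rows; the other column names and the values are hypothetical.

```python
import pandas as pd

# Toy stand-in for the final database; only "threshold" differs between
# the first two rows. Column names other than "threshold" are assumptions.
df = pd.DataFrame({
    "plate": [99, 99, 99],
    "well": ["A1", "A1", "A2"],
    "y2_1": [0.61, 0.61, 0.58],
    "threshold": [120.0, 95.0, 120.0],
})

# Compare rows on every column except "threshold" and keep the first of each group.
key_cols = [c for c in df.columns if c != "threshold"]
deduped = df.drop_duplicates(subset=key_cols, keep="first").reset_index(drop=True)
print(deduped)
```

Note that this keeps whichever threshold value happens to come first, so the `threshold` column of the result should not be relied on.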

codercahol commented 5 months ago

There are two replicated measurements of the 99-M1 plate in the source files from the camera: '/carnegie/data/Shared/Labs/burlacot/Fluctuation Screen TIFF and XPIM/20231201 99-M1 1min-1min.csv' and '/carnegie/data/Shared/Labs/burlacot/Fluctuation Screen TIFF and XPIM/20231012_99-M1_1min-1min.csv'

It is weird that all of the y2 measurements are identical. That might be because of some funniness with merging dataframes on shared keys. Regardless, I believe @Ablencourt said we should only keep the more recent replicate, because the earlier measurements were taken while the experimental procedure was being ironed out.

codercahol commented 5 months ago

The solution is simply to remove the extra file from the data folder on the cluster.

samsongourevitch commented 5 months ago

It is normal that we have two replicated measurements of the 99-M1 plate: it has been run twice. I don't think this is the issue, and we should (at least in the beginning) keep both. I think your first comment was relevant because, in addition to there being two measurements (which is normal), each measurement has duplicated rows. For instance, we have 384 × 5 rows for the 1min-1min condition when we should have at most 384 × 2. I believe the issue should be reopened until this is fixed (it seems I don't have permission to do so).
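A quick way to check the row counts could be something like the following sketch. The `measurement` column (identifying each run of the plate) and the helper `excess_rows` are hypothetical names, and the toy data uses 2 wells per plate instead of 384 to keep it short.

```python
import pandas as pd

N_WELLS = 384  # wells per plate, as in the counts quoted above


def excess_rows(df, group_cols, n_wells=N_WELLS):
    """Return the groups that contain more rows than there are wells."""
    counts = df.groupby(group_cols).size()
    return counts[counts > n_wells]


# Toy data: two runs of plate 99; the first run has a replicated well A1.
df = pd.DataFrame({
    "plate": [99] * 5,
    "measurement": ["20231012"] * 3 + ["20231201"] * 2,
    "well": ["A1", "A1", "A2", "A1", "A2"],
})

flagged = excess_rows(df, ["plate", "measurement"], n_wells=2)
print(flagged)
```

On the real database, any group reported with more than 384 rows would contain replicated wells.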