codercahol / chlamy-ImPi

An image processing pipeline for time-series of Chlamydomonas reinhardtii fluorescence photos

Date format #26

Closed samsongourevitch closed 5 months ago

samsongourevitch commented 5 months ago

The date format in measurement_time seems to be inconsistent across the database. For instance, row 4994 has 2023-06-11 11:08:13, where the second item must be the day, because no experiments were run in June. Row 769 has 2023-10-18 10:46:26, where the second item must be the month. This is consistent with data_WT['measurement_time_0'].dt.month.unique() yielding array([12, 1, 2, 3, 4, 5, 6]), which is not possible (it should give [10, 11, 12, 1, 2, 3]). The format seems to be consistent in the file names on the drive, though (not completely sure).
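A minimal sketch of the check described above, using only the two timestamps quoted in this comment. The variable names (`times`, `expected_months`, `suspect`) are illustrative and not taken from the repository; the expected months are the ones listed above.

```python
import pandas as pd

# The two timestamps quoted above (rows 4994 and 769), as already parsed.
times = pd.Series(pd.to_datetime(["2023-06-11 11:08:13", "2023-10-18 10:46:26"]))

# Months in which experiments actually took place, per the comment above.
expected_months = {10, 11, 12, 1, 2, 3}

# Any timestamp whose month falls outside that set was probably parsed
# with day and month swapped.
suspect = times[~times.dt.month.isin(expected_months)]
print(suspect)
```

Only the first timestamp is flagged here, since June is outside the experiment period.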

codercahol commented 5 months ago

The error was due to a mismatch between the default settings of pd.to_datetime and the format of the Date column in the .csv files containing the date info.

to_datetime uses an automatic parser that assumed the first number in Date was the month (following the American convention) unless it obviously was not (i.e. it was >12). The solution was to specify the date format explicitly. The fix is in PR #28.
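For illustration, here is a small sketch of the difference between the default parsing and an explicit format. It assumes the Date column holds day-first strings like `06/11/2023`; the actual layout in the .csv files and the exact format string used in the fix are not shown in this thread, so `"%d/%m/%Y"` and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical sample values; the real Date column layout is an assumption.
df = pd.DataFrame({"Date": ["06/11/2023", "18/10/2023"]})

# With default settings, an ambiguous string is read month-first,
# so 6 November becomes 11 June.
default_parsed = pd.to_datetime("06/11/2023")
print(default_parsed.month)  # 6

# An explicit day-first format removes the ambiguity for every row.
df["measurement_time"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
print(df["measurement_time"].dt.month.tolist())  # [11, 10]
```

With the explicit format, a value that does not match (for example a month-first date with a day above 12) raises an error instead of being silently reinterpreted, which makes this kind of mix-up easier to catch.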