@dkopasker I'd like you to describe here what you expect from every variable in the raw data files, including its range, whether `NA` or `NaN` values can occur, etc. In addition, we need to state clearly how the code processes such values. Common options include dropping the affected entries, asking the aggregate functions to ignore them, or replacing them with imputed values (a mean of some sort, or the median).
This approach should make the data analysis much more reproducible.
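To make the three options for missing values concrete, here is a minimal sketch in plain Python. It is not taken from the current script: the function names and the `incomes` sample are hypothetical, and it assumes missing values reach the code as `None` or as a float `NaN`.

```python
import math
from statistics import median


def is_missing(value):
    """Return True for None and for float NaN (assumed missing markers)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(values):
    """Option 1: drop the affected entries."""
    return [v for v in values if not is_missing(v)]


def mean_ignoring_missing(values):
    """Option 2: let the aggregate ignore missing entries."""
    present = drop_missing(values)
    return sum(present) / len(present) if present else math.nan


def impute_median(values):
    """Option 3: replace missing entries with the median of the rest."""
    present = drop_missing(values)
    if not present:
        return list(values)  # nothing to impute from
    fill = median(present)
    return [fill if is_missing(v) else v for v in values]


# Hypothetical sample with both kinds of missing value.
incomes = [1200.0, None, 800.0, float("nan"), 1000.0]
print(drop_missing(incomes))
print(mean_ignoring_missing(incomes))
print(impute_median(incomes))
```

Whichever option is chosen for a given variable, it should be written down next to that variable's description so the choice is visible to everyone running the analysis.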
We should also treat LABsim output as potentially corrupted, since the simulation code itself is not properly tested and its constant changes do not help either. This script must therefore notify the user whenever an input value falls outside its expected range.
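As a rough illustration of such a check, the sketch below compares each row against a table of expected ranges and emits a warning for every violation. The variable names, the ranges, and the row-as-dict layout are placeholders I made up for the example, not the actual LABsim variables; the real table should come from the variable descriptions requested above.

```python
import math
import warnings

# Hypothetical variables and ranges, for illustration only.
EXPECTED_RANGES = {
    "age": (0, 120),
    "weekly_hours": (0, 168),
}


def find_violations(rows, expected_ranges=EXPECTED_RANGES):
    """Return (row index, variable, value, reason) for each bad value."""
    violations = []
    for index, row in enumerate(rows):
        for name, (low, high) in expected_ranges.items():
            value = row.get(name)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                violations.append((index, name, value, "missing"))
            elif not low <= value <= high:
                violations.append((index, name, value, "out of range"))
    return violations


def validate(rows, expected_ranges=EXPECTED_RANGES):
    """Warn about every violation, then return the full list."""
    violations = find_violations(rows, expected_ranges)
    for index, name, value, reason in violations:
        warnings.warn(f"row {index}: {name}={value!r} is {reason}")
    return violations
```

Here a warning is used so the run continues and the user sees every problem at once; raising an exception on the first violation would be the stricter alternative.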
This code needs some data validation.