Open JGarciaCondado opened 7 months ago
We should also handle missing data per system when multiple systems are named: if a subject has missing data for one system but not for another, we should only remove that subject when calculating the age model of the system with the missing data.
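A minimal sketch of what per-system filtering could look like with pandas. The function name `subjects_per_system`, the `systems` mapping, and the feature and system names are hypothetical placeholders, not the package's current API.

```python
import numpy as np
import pandas as pd


def subjects_per_system(features, systems):
    """Return one complete-case DataFrame per system.

    `systems` maps a system name to the list of feature columns it uses
    (hypothetical structure). A subject is dropped from a system only if
    it has a NaN in one of that system's own columns.
    """
    return {name: features[cols].dropna() for name, cols in systems.items()}


# Toy data: sub002 lacks a systemA feature, sub003 lacks a systemB feature.
features = pd.DataFrame(
    {
        "f1": [1.0, np.nan, 3.0],
        "f2": [4.0, 5.0, 6.0],
        "f3": [7.0, 8.0, np.nan],
    },
    index=["sub001", "sub002", "sub003"],
)
systems = {"systemA": ["f1", "f2"], "systemB": ["f3"]}

per_system = subjects_per_system(features, systems)
for name, frame in per_system.items():
    print(name, list(frame.index))
```

With this approach each system's age model would be fitted on `per_system[name]`, so sub002 is still used for systemB and sub003 for systemA.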
We have also found a new bug: uploading a .csv with a non-numeric index throws an error. We should test and fix this so that files whose first column is named subject, with values sub001, sub002, sub003, ..., work. Otherwise we should specify that files must have a column called ID (this will avoid problems, and when loading a .csv the ID column should be made the index). Either way, we should still ensure that the indices can be arbitrary numbers or alphanumeric values.
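One possible way to load files with a required ID column, sketched with pandas. The helper name `load_csv` is hypothetical, and casting every ID to a string is an assumption made here so that numeric and alphanumeric IDs are treated the same way.

```python
import io

import pandas as pd


def load_csv(source, id_column="ID"):
    """Load a CSV and use `id_column` as a string index (hypothetical helper)."""
    df = pd.read_csv(source)
    if id_column not in df.columns:
        raise ValueError(f"File must contain a column named '{id_column}'")
    # Cast IDs to str so 'sub001' and 17 are both valid index values.
    df[id_column] = df[id_column].astype(str)
    if df[id_column].duplicated().any():
        raise ValueError(f"Values in column '{id_column}' must be unique")
    return df.set_index(id_column)


# Alphanumeric IDs.
alpha = load_csv(io.StringIO("ID,age,f1\nsub001,30,1.5\nsub002,41,2.5\n"))
# Numeric IDs in arbitrary order.
numeric = load_csv(io.StringIO("ID,age\n10,30\n7,41\n"))
print(list(alpha.index), list(numeric.index))
```

Note that numeric IDs are parsed as numbers before the cast, so leading zeros (for example 007) would be lost in this sketch.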
When looking at clinical factors we should not remove every subject that has a NaN in any factor. In many studies some subjects have some tests and other subjects have others, so we are drastically reducing the number of subjects. I would go for an approach where we report the number of subjects used in each factor but keep as many as possible. Imputation here would not be a good strategy.
The software package currently deals with tabular data only. However, one important aspect has not been dealt with: categorical variables.
To improve this:
Another aspect of data handling is data imputation. Currently, any subject with missing data in any of the files submitted is discarded. However, some basic imputation strategies could be implemented.
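As an illustration of a basic strategy, here is a sketch of column-wise mean or median imputation with pandas. The function name `impute_basic` and the `strategy` parameter are hypothetical, and non-numeric columns are left untouched in this sketch.

```python
import numpy as np
import pandas as pd


def impute_basic(df, strategy="median"):
    """Fill NaNs in numeric columns with the column mean or median."""
    numeric = df.select_dtypes(include="number")
    if strategy == "mean":
        fill_values = numeric.mean()
    elif strategy == "median":
        fill_values = numeric.median()
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    # fillna with a Series fills each column with its matching value.
    return df.fillna(fill_values)


features = pd.DataFrame(
    {"f1": [1.0, np.nan, 3.0], "f2": [10.0, 20.0, np.nan]},
    index=["sub001", "sub002", "sub003"],
)
imputed = impute_basic(features, strategy="median")
print(imputed)
```

The fill values here are computed on the full table for brevity; in a cross-validated pipeline they would probably need to be computed on the training split only.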