For the LCO split, we should add an analysis (R^2 and scatter plot) in which the target is normalized by the ground-truth drug mean. The variance remaining in the data then reflects feature-target associations plus noise rather than differences between drug means. That's one way to avoid Simpson's paradox in the analysis.
For the LDO split, we can do the same with the cell-line means.
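A minimal sketch of the normalized R^2, assuming "normalize" means subtracting the per-group ground-truth mean from both targets and predictions (dividing by the mean would be a different choice). The function names `plain_r2` and `group_mean_normalized_r2` and the toy data are hypothetical and not taken from the codebase; `groups` would hold drug IDs for LCO and cell-line IDs for LDO.

```python
import numpy as np


def plain_r2(y_true, y_pred):
    """Ordinary coefficient of determination."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def group_mean_normalized_r2(y_true, y_pred, groups):
    """R^2 after subtracting each group's ground-truth mean.

    Assumption: normalization is a subtraction of the group mean,
    computed from the ground truth of the samples passed in.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Map every sample to an integer index of its group (drug or cell line).
    _, inverse = np.unique(np.asarray(groups), return_inverse=True)
    # Per-group mean of the ground truth, broadcast back to the samples.
    group_means = (np.bincount(inverse, weights=y_true) / np.bincount(inverse))[inverse]
    # The residuals are unchanged by the shift; only the total variance
    # shrinks to the within-group variance.
    return plain_r2(y_true - group_means, y_pred - group_means)


# Toy example: a model that only predicts the drug mean.
drugs = ["A", "A", "A", "B", "B", "B"]
y_true = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y_pred = [2.0, 2.0, 2.0, 11.0, 11.0, 11.0]

r2_plain = plain_r2(y_true, y_pred)
r2_norm = group_mean_normalized_r2(y_true, y_pred, drugs)
print(r2_plain, r2_norm)
```

In this toy case the plain R^2 is about 0.97 even though the model has learned nothing beyond the drug means, while the normalized R^2 is 0. The same centered values can be used for the scatter plot.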
Visualization TODOs:
[x] Violin plots of how performance measures vary over the different CV runs
    [x] Normal
    [x] Normalized w.r.t. drug mean
    [x] Normalized w.r.t. cell-line mean
[x] Add standard errors
[x] Heatmap of performance measures for every individual run