Visualization - Githubissues

For the LCO split, we should add an analysis (R^2 and scatterplot) in which we normalize the target by the ground-truth drug mean. The variance remaining in the data then comes from the feature-target associations plus noise, not from the differences between drug means. That's one way to avoid Simpson's paradox during the analysis.

For the LDO split, we can do the same with the cell-line means.
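
A minimal sketch of the normalization described above, not drevalpy's actual implementation. It assumes the predictions are in a pandas DataFrame with hypothetical columns `drug`, `cell_line`, `y_true` and `y_pred`, that "normalize" means subtracting the group mean, and that the ground-truth mean is computed over the rows passed in. The toy model predicts only the drug mean, so its R^2 looks high on the raw values and drops to zero after normalization.

```python
import numpy as np
import pandas as pd


def normalize_by_group_mean(df, group_col, true_col="y_true", pred_col="y_pred"):
    """Subtract the per-group mean of the ground truth from targets and predictions.

    Assumption: the mean is taken over the rows in `df`; column names are
    illustrative and not taken from drevalpy.
    """
    group_mean = df.groupby(group_col)[true_col].transform("mean")
    out = df.copy()
    out[true_col] = df[true_col] - group_mean
    out[pred_col] = df[pred_col] - group_mean
    return out


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Toy data: two drugs with very different mean responses.
# The "model" predicts only the drug mean and ignores the cell line.
df = pd.DataFrame(
    {
        "drug": ["A", "A", "B", "B"],
        "cell_line": ["c1", "c2", "c1", "c2"],
        "y_true": [0.0, 2.0, 8.0, 10.0],
        "y_pred": [1.0, 1.0, 9.0, 9.0],
    }
)

raw_r2 = r_squared(df["y_true"], df["y_pred"])

# LCO-style analysis: remove the drug means before scoring.
norm = normalize_by_group_mean(df, group_col="drug")
norm_r2 = r_squared(norm["y_true"], norm["y_pred"])

print(f"raw R^2:        {raw_r2:.3f}")   # 0.941
print(f"normalized R^2: {norm_r2:.3f}")  # 0.000
```

For the LDO split, the same helper could be called with `group_col="cell_line"`.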

Visualization TODOs:

- [x] Violin plots of how performance measures vary over the different CV runs
  - [x] Normal
  - [x] Normalized w.r.t. drug mean
  - [x] Normalized w.r.t. cell line mean
- [x] Add standard errors
- [x] Heatmap of performance measures for every individual run
  - [x] Normal
  - [x] Normalized w.r.t. drug mean
  - [x] Normalized w.r.t. cell line mean
- [x] Scatterplot: comparison of correlation per drug/cell line for two models, like in https://academic.oup.com/bioinformatics/article/38/14/3609/6604271, Fig. 2
- [x] Scatterplot y_true vs. y_pred per drug and per cell line
  - [x] Normal: per drug
  - [x] Normal: per cell line
  - [x] Normalized w.r.t. drug mean -> per drug
  - [x] Normalized w.r.t. cell line mean -> per cell line

daisybio / drevalpy

Visualization #7