gitter-lab / pharmaco-image

MIT License

Project summary #14

Open xiaohk opened 5 years ago

xiaohk commented 5 years ago

Here is a list of our experiments and findings:






| Model | Assays | F1 | Accuracy | Average Precision | AUC | Precision | Recall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RF with fingerprint feature | 209 | 13.18% | 90.72% | 31.74% | 71.22% | 30.44% | 11.43% |
| LR with CNN feature (before normalization) | 212-4 | 34.87% | 84.48% | 33.12% | 85.22% | 29.77% | 75.17% |
| LR with CNN feature (after normalization) | 209 | 17.02% | 62.64% | 15.54% | 56.80% | 14.59% | 46.73% |
| RF with CNN feature (after normalization) | 210 | 8.44% | 90.49% | 22.11% | 70.69% | 22.38% | 8.81% |
| LR with CellProfiler mean-well feature | 206 | 24.74% | 81.07% | 24.54% | 69.52% | 20.83% | 42.27% |
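
A small sketch may help when reading the metric columns. It assumes each metric is computed per assay and then averaged over assays, which the results above do not state explicitly; the labels are made up for illustration and are not project data. Under that assumption, the averaged F1 need not equal the harmonic mean of the averaged precision and recall. For instance, the RF with fingerprint feature row has 30.44% precision and 11.43% recall, whose harmonic mean is about 16.6% rather than the reported 13.18%. That is consistent with per-assay averaging but does not prove it.

```python
def binary_metrics(y_true, y_pred):
    """Return (precision, recall, f1, accuracy) for 0/1 labels of one assay."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, f1, accuracy


def mean_over_assays(per_assay):
    """Average each of the four metrics across assays (assumed aggregation)."""
    n = len(per_assay)
    return tuple(sum(m[i] for m in per_assay) / n for i in range(4))


# Hypothetical assays: one with high precision and low recall, one the reverse.
assay_a = ([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
assay_b = ([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

per_assay = [binary_metrics(*assay_a), binary_metrics(*assay_b)]
mean_p, mean_r, mean_f1, mean_acc = mean_over_assays(per_assay)

# F1 computed from the averaged precision and recall, for comparison.
f1_of_means = 2 * mean_p * mean_r / (mean_p + mean_r)
print(f"mean F1 = {mean_f1:.3f}, F1 of mean P/R = {f1_of_means:.3f}")
```

In this toy case the two quantities differ a lot, so the F1 column is best read on its own rather than derived from the precision and recall columns.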

agitter commented 5 years ago

I'm adding some initial thoughts, and we can discuss this more in person.

A lot of the effort was spent exploring batch effects. We did not find much explicit guidance on how to detect and correct for batch effects in this type of data. A paper that highlighted the batch effects and offered analyses showing the pros and cons of different correction strategies may not be the most exciting, but it would be valuable. We would need to do a more thorough literature search to make sure there is not already highly similar related work.
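
As an illustration of what one candidate strategy in such a comparison might look like, the sketch below standardizes a single feature within each batch so that batch-level shifts in mean and scale are removed. This is an assumption for illustration only: the batch names and values are hypothetical, and it is not necessarily the normalization used in the experiments above.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_batch(rows):
    """Z-score one feature within each batch.

    rows: list of (batch_id, value) pairs.
    Returns the standardized values in the original order.
    """
    by_batch = defaultdict(list)
    for batch_id, value in rows:
        by_batch[batch_id].append(value)

    stats = {}
    for batch_id, values in by_batch.items():
        sd = pstdev(values)
        # Guard against constant features within a batch.
        stats[batch_id] = (mean(values), sd if sd > 0 else 1.0)

    return [(value - stats[b][0]) / stats[b][1] for b, value in rows]


# Hypothetical example: the second batch is shifted by +10 relative to the first.
rows = [("plate_1", 1.0), ("plate_1", 3.0), ("plate_2", 11.0), ("plate_2", 13.0)]
corrected = standardize_within_batch(rows)
print(corrected)
```

A known downside of this kind of correction is that it can also remove real signal when the compounds or treatments are not balanced across batches, which is one of the trade-offs a comparison of strategies would need to examine.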

All of the work to combine ExCAPE with the Cell Painting dataset is also valuable. Even without the downstream predictive modeling, we could write about how to align these two datasets and make the resource available. Although combining them is non-trivial, that would be highly derivative of the existing datasets.
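
At its core, this kind of alignment is a join on a shared compound identifier. The sketch below assumes both datasets can be keyed by the same identifier; the field names (`compound_key`, `well`, `assay_id`, `active`) and the records are hypothetical, and the real alignment likely involves further steps such as structure standardization that are not shown here.

```python
def align_by_compound(image_records, activity_records):
    """Inner join of image records and assay activity records on compound_key."""
    activities = {}
    for rec in activity_records:
        activities.setdefault(rec["compound_key"], []).append(
            (rec["assay_id"], rec["active"])
        )

    aligned = []
    for rec in image_records:
        # Compounds without any activity record are dropped (inner join).
        for assay_id, active in activities.get(rec["compound_key"], []):
            aligned.append(
                {
                    "well": rec["well"],
                    "compound_key": rec["compound_key"],
                    "assay_id": assay_id,
                    "active": active,
                }
            )
    return aligned


# Hypothetical records, not taken from either dataset.
image_records = [
    {"well": "A01", "compound_key": "KEY_A"},
    {"well": "A02", "compound_key": "KEY_B"},
    {"well": "A03", "compound_key": "KEY_C"},
]
activity_records = [
    {"compound_key": "KEY_A", "assay_id": "assay_1", "active": True},
    {"compound_key": "KEY_A", "assay_id": "assay_2", "active": False},
    {"compound_key": "KEY_B", "assay_id": "assay_1", "active": False},
]

aligned = align_by_compound(image_records, activity_records)
print(len(aligned), "aligned (well, assay) pairs")
```

An inner join is used here so that only compounds present in both sources survive; reporting how many compounds and assays are lost at this step would be part of describing the combined resource.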

Lastly, we have the assay activity prediction work. This story would be more complete if the LeNet or VGG CNNs worked. It is counterintuitive that the LR with CNN features is worse after normalization. It might take a lot of work and compute time to do final runs for rigorous comparisons.