WayScience / phenotypic_profiling

Machine learning for predicting 15 single-cell phenotypes from cell morphology profiles
Creative Commons Attribution 4.0 International

Create validation module #7

Closed: roshankern closed this pull request 1 year ago

roshankern commented 1 year ago

This PR is ready for review! In this PR, the final model is validated using the cell health dataset.

The final model is used to derive phenotypic class probabilities from the Cell Health data features. These probabilities are then averaged across perturbation and cell line to create 357 classification profiles (119 CRISPR guides x 3 cell lines).
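The averaging step can be sketched with a pandas `groupby`. This is a minimal illustration, not the code in this PR. The column names (`Metadata_guide`, `Metadata_cell_line`, `Apoptosis`, `Interphase`) and the random per-cell probabilities are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical per-cell model output: one row per cell, metadata columns for the
# CRISPR guide and cell line, and one probability column per phenotypic class.
# All names and values here are placeholders, not the PR's actual data.
rng = np.random.default_rng(0)
n_cells = 12
cell_predictions = pd.DataFrame(
    {
        "Metadata_guide": ["guideA", "guideB"] * (n_cells // 2),
        "Metadata_cell_line": ["cell_line_1"] * 6 + ["cell_line_2"] * 6,
        "Apoptosis": rng.random(n_cells),
        "Interphase": rng.random(n_cells),
    }
)

# Average the class probabilities within each guide x cell line group,
# giving one classification profile per group.
classification_profiles = cell_predictions.groupby(
    ["Metadata_guide", "Metadata_cell_line"]
).mean()

print(classification_profiles.shape)  # (4, 2): 2 guides x 2 cell lines, 2 classes
```

Applied to the full dataset, the same grouping would produce one row per guide/cell line pair, i.e. 357 rows when all 119 x 3 combinations are present.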

As part of "Predicting cell health phenotypes using image-based morphology profiling", Way et al. derived Cell Health indicators and averaged these indicators across CRISPR guide/cell line to create 357 Cell Health label profiles.

We use pandas.DataFrame.corr to compute the Pearson correlation coefficient between the classification profiles and the Cell Health label profiles. A Pearson correlation closer to -1 or +1 indicates a stronger inverse or direct linear relationship, respectively.
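Because `DataFrame.corr` correlates the columns of a single frame, one way to apply it is to join the two profile tables on their shared guide/cell line index and then keep only the block that pairs model classes with Cell Health indicators. The sketch below assumes that approach; the synthetic data and the column `other_indicator` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic profiles indexed by (guide, cell_line); placeholders, not real data.
rng = np.random.default_rng(42)
index = pd.MultiIndex.from_product(
    [["guide1", "guide2", "guide3", "guide4"], ["line1", "line2", "line3"]],
    names=["guide", "cell_line"],
)
classification_profiles = pd.DataFrame(
    rng.random((len(index), 2)), index=index, columns=["Apoptosis", "Interphase"]
)
cell_health_profiles = pd.DataFrame(
    {
        # Constructed to track the Apoptosis probability, so correlation is high.
        "vb_percent_all_apoptosis": classification_profiles["Apoptosis"] * 50
        + rng.normal(0, 1, len(index)),
        "other_indicator": rng.random(len(index)),  # hypothetical indicator
    },
    index=index,
)

# Align both tables on the shared (guide, cell_line) index, then correlate.
combined = classification_profiles.join(cell_health_profiles, how="inner")
full_corr = combined.corr(method="pearson")

# Keep the cross block: rows are model classes, columns are Cell Health indicators.
cross_corr = full_corr.loc[classification_profiles.columns, cell_health_profiles.columns]
print(cross_corr.round(2))
```

The inner join ensures that each classification profile is compared only with the label profile for the same guide and cell line.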

These correlations can be used to validate the model's success in classifying the cell health data. For example, the vb_percent_all_apoptosis indicator has a relatively high Pearson correlation with the model's apoptosis probabilities, implying a relatively strong direct linear relationship.
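To check a single class in the way described above, one option is to correlate that class's probability profile with every Cell Health indicator and rank the indicators by absolute correlation. This sketch uses `DataFrame.corrwith` on synthetic data; apart from `vb_percent_all_apoptosis`, the indicator names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic profiles (placeholders, not results): 12 guide x cell line groups.
rng = np.random.default_rng(7)
n_profiles = 12
apoptosis_prob = pd.Series(rng.random(n_profiles), name="Apoptosis")
cell_health = pd.DataFrame(
    {
        "vb_percent_all_apoptosis": 40 * apoptosis_prob + rng.normal(0, 0.5, n_profiles),
        "indicator_inverse": -10 * apoptosis_prob + rng.normal(0, 1, n_profiles),  # hypothetical
        "indicator_unrelated": rng.random(n_profiles),  # hypothetical
    }
)

# Pearson correlation (the corrwith default) of the Apoptosis probability
# with each Cell Health indicator, aligned on the shared index.
corr_with_apoptosis = cell_health.corrwith(apoptosis_prob)

# Rank indicators by strength of linear relationship, ignoring sign,
# while keeping the signed values so inverse relationships stay visible.
ranked = corr_with_apoptosis.reindex(
    corr_with_apoptosis.abs().sort_values(ascending=False).index
)
print(ranked.round(2))
```

Keeping the sign in the ranked output distinguishes a strong direct relationship (near +1) from a strong inverse one (near -1).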

In future PRs, we hope to apply this process to the shuffled baseline model to provide a baseline for the final model's correlations. Also, we hope to add other methods of validation, such as applying the model to other datasets.