Add SCM Interpretations

roshankern commented 1 year ago

This PR is ready for review!

This PR adds interpretations for the single-class model (SCM) coefficients. In model_coefficient_correlations.ipynb, we compare the coefficients from the multi-class and single-class models. The coefficient matrices from multi-class models have shape (# phenotypic classes, # features), while those from single-class models have shape (1, # features). Thus, we can compare the coefficient vectors for each phenotypic class across the two model types.
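
A minimal sketch of the shape comparison described above, using random stand-in arrays rather than the real model coefficients. The class names (other than Prometaphase), the feature count, and all variable names are assumptions for illustration only and are not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class names and feature count, not taken from the repository.
phenotypic_classes = ["Interphase", "Prometaphase", "Metaphase"]
n_features = 5

# Multi-class model: one coefficient row per phenotypic class.
multi_class_coefs = rng.normal(size=(len(phenotypic_classes), n_features))

# Single-class models: one (1, n_features) matrix per phenotypic class.
single_class_coefs = {
    name: rng.normal(size=(1, n_features)) for name in phenotypic_classes
}

# Pair the multi-class row for each class with the flattened single-class vector.
coef_pairs = {
    name: (multi_class_coefs[i], single_class_coefs[name].ravel())
    for i, name in enumerate(phenotypic_classes)
}

for name, (multi_vec, single_vec) in coef_pairs.items():
    print(name, multi_vec.shape, single_vec.shape)
```

Each pair holds two vectors of length # features, which is what makes a per-class comparison possible.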

We graph these coefficient vectors in a scatterplot where each coordinate pair represents (multi-class model coefficient value, single-class model coefficient value) for a particular feature. For each pair of multi-class and single-class coefficient vectors, we derive the Pearson correlation coefficient with numpy.corrcoef to get an idea of how linearly correlated the vectors are. We also derive the Clustermatch Correlation Coefficient (CCC) introduced in Pividori et al., 2022. This is a not-only-linear coefficient based on machine learning models and gives an idea of how correlated the feature coefficients are (where 0 is no relationship and 1 is a perfect relationship). The correlations for each pair of coefficient vectors are displayed above their scatterplots.
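
As a rough illustration of the Pearson step, the sketch below applies numpy.corrcoef to two made-up coefficient vectors. The helper name and the example values are hypothetical. The CCC computation and the scatterplot are left out because they depend on packages and data not shown in this thread.

```python
import numpy as np


def pearson_correlation(multi_vec, single_vec):
    """Return the Pearson correlation between two 1-D coefficient vectors."""
    # np.corrcoef returns a 2x2 correlation matrix for two 1-D inputs;
    # the off-diagonal entry is the correlation between them.
    return float(np.corrcoef(multi_vec, single_vec)[0, 1])


# Made-up coefficient values for a single phenotypic class.
multi_vec = np.array([0.5, -1.2, 0.3, 2.0, -0.7])
single_vec = 2.0 * multi_vec + 1.0  # an exactly linear relationship

r = pearson_correlation(multi_vec, single_vec)
r_flipped = pearson_correlation(multi_vec, -single_vec)
print(round(r, 6), round(r_flipped, 6))
```

Because the second vector is an exact linear function of the first, the correlation comes out at 1 up to floating-point error, and negating one vector flips the sign.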

review-notebook-app[bot] commented 1 year ago

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

roshankern commented 1 year ago

The entire pipeline from 1.train_model onward has also been rerun in this PR to change the negative label names for single-cell models from "Not {Phenotypic Class}" to "{Phenotypic Class} Negative". The previous naming resulted in different model class orders whenever "Not" sorted alphabetically before the phenotypic class name (e.g. for Prometaphase, since N comes before P).
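
The ordering issue can be illustrated with a small sketch. It assumes the model orders its classes by sorting the label strings (scikit-learn estimators do this for `classes_`, but whether these models are scikit-learn estimators is not stated here). "Interphase" is a hypothetical class name used for contrast, and the helper function is illustrative only.

```python
def positive_class_index(positive_label, negative_label):
    """Return the index of the positive label after sorting both labels."""
    # Assumption: class order is the sorted order of the label strings.
    return sorted([positive_label, negative_label]).index(positive_label)


old_positions = {}
new_positions = {}
for phenotype in ["Interphase", "Prometaphase"]:
    # Old naming: "Not {Phenotypic Class}"
    old_positions[phenotype] = positive_class_index(phenotype, f"Not {phenotype}")
    # New naming: "{Phenotypic Class} Negative"
    new_positions[phenotype] = positive_class_index(phenotype, f"{phenotype} Negative")

print(old_positions)
print(new_positions)
```

With the old naming, the positive class lands at index 0 for Interphase but at index 1 for Prometaphase. With the new naming the negative label always starts with the positive label, so the positive class sorts first in every case.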

This change has only affected notebooks and tsv files.

roshankern commented 1 year ago

Sorry to the reviewer for such a long PR :(

This PR actually has only about 370 lines to review. Please scroll past all Jupyter and TSV files and review only the Python, README, env, and sh files.

roshankern commented 1 year ago

Looks great - well done.

One comment I had was that it looks like this PR is touching multiple other files (for example, in 2.train_model and in 3.evaluation_model). I'd like to confirm that this is intended.

Yep, this is intended. As I mentioned in a comment above (see comment for more info):

The entire pipeline from 1.train_model onward has also been rerun in this PR to change the negative label names for single-cell models from "Not {Phenotypic Class}" to "{Phenotypic Class} Negative".

This pipeline rerun modifies the Jupyter notebook outputs, saved models, and intermediate TSV files for all modules, because the labels change in each of these files.

WayScience / phenotypic_profiling

Add SCM Interpretations #30