Closed roshankern closed 1 year ago
The entire pipeline from 1.train_model onward has also been rerun in this PR to change the negative label names for single-cell models from "Not {Phenotypic Class}" to "{Phenotypic Class} Negative". The previous naming would result in different model class orders when the N in Not was alphabetically before the phenotypic class name (e.g. in Prometaphase).
This change has only affected notebooks and tsv files.
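A minimal sketch of the ordering issue, assuming the model's class order comes from sorting the label strings alphabetically (as scikit-learn estimators do for their classes_ attribute). The helper class_order and the phenotype "Anaphase" are hypothetical and used only for illustration; "Prometaphase" is the example mentioned above.

```python
def class_order(phenotype, negative_template):
    """Return the two labels of a single-class model in sorted order.

    Assumption: the model orders its classes by sorting the label strings.
    """
    labels = [phenotype, negative_template.format(phenotype)]
    return sorted(labels)


old_template = "Not {}"        # previous negative label naming
new_template = "{} Negative"   # negative label naming after this PR

for phenotype in ["Anaphase", "Prometaphase"]:
    old_order = class_order(phenotype, old_template)
    new_order = class_order(phenotype, new_template)
    # Index of the positive class under each naming scheme
    print(phenotype, old_order.index(phenotype), new_order.index(phenotype))
```

With the old naming, the positive class lands at index 0 for "Anaphase" but at index 1 for "Prometaphase", because "Not ..." sorts before any name starting with a letter after N. With the new naming, the positive label is a prefix of the negative label, so it always sorts first.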
Sorry to the reviewer for such a long PR :(
This PR actually has about 370 lines to review. Please scroll past all jupyter and tsv files to only review python, README, env, and sh files.
Looks great - well done.
One comment I had was that it looks like this PR is touching multiple other files (for example, in 2.train_model and in 3.evaluation_model). I'd like to confirm that this is intended.
Yep, this is intended. As I mentioned in a comment above (see comment for more info):
The entire pipeline from 1.train_model onward has also been rerun in this PR to change the negative label names for single-cell models from "Not {Phenotypic Class}" to "{Phenotypic Class} Negative".
This pipeline rerun modifies the jupyter notebook outputs, saved models, and intermediate TSV files for all modules as the labels change in each of these files.
This PR is ready for review!
In this PR, interpretations are added for the single-class model (SCM) coefficients. In model_coefficient_correlations.ipynb, we compare the coefficients from the multi-class and single-class models. The coefficient matrices from multi-class models have shape (# phenotypic classes, # features), while the coefficients from single-class models have shape (1, # features). Thus, we are able to compare the coefficient vectors for each phenotypic class per model.
We graph these coefficient vectors in a scatterplot where each coordinate pair represents (multi-class model coefficient value, single-class model coefficient value) for a particular feature. For each pair of multi-class and single-class coefficient vectors, we derive the Pearson correlation coefficient with numpy.corrcoef to get an idea of how linearly correlated these vectors are. We also derive the Clustermatch Correlation Coefficient (CCC) introduced in Pividori et al., 2022. This is a not-only-linear coefficient based on machine learning models and gives an idea of how correlated the feature coefficients are (where 0 is no relationship and 1 is a perfect relationship). The correlations for each pair of coefficient vectors are displayed above their scatterplots.
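As a rough sketch of the Pearson comparison described above (not the notebook's actual code), the snippet below correlates one row of a multi-class coefficient matrix with a single-class coefficient vector using numpy.corrcoef. The array names, sizes, and random data are assumptions made for illustration, and the CCC computation is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and synthetic data, for illustration only
n_classes, n_features = 3, 50
multi_class_coefs = rng.normal(size=(n_classes, n_features))

# Hypothetical single-class coefficients of shape (1, n_features):
# a noisy copy of one multi-class row, so the two vectors correlate
class_index = 1
noise = 0.1 * rng.normal(size=n_features)
single_class_coefs = (multi_class_coefs[class_index] + noise).reshape(1, -1)


def pearson(multi_class_row, single_class_matrix):
    """Pearson correlation between a multi-class coefficient row of shape
    (n_features,) and a single-class matrix of shape (1, n_features)."""
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r
    return float(np.corrcoef(multi_class_row, single_class_matrix.ravel())[0, 1])


r = pearson(multi_class_coefs[class_index], single_class_coefs)
print(f"Pearson r for class {class_index}: {r:.3f}")
```

Each (multi-class value, single-class value) pair fed to numpy.corrcoef here corresponds to one point in the scatterplot for that phenotypic class.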