YosefLab / PopV

MIT License

Possibility to extract most significant genes for a given prediction label #21

Closed PoGibas closed 1 year ago

PoGibas commented 1 year ago

This is a question (feature request) about parsing the result of annotate_data. popv.annotation.annotate_data provides the assigned cell-type labels (e.g., res.obs.popv_majority_vote_prediction, or res.obs.popv_svm_prediction for a single method). Is it possible to extract the most significant genes for a given label?

For example:

>>> res.obs.popv_majority_vote_prediction
AAACCCAAGCGCCTTG_TSP14_LI_Distal_10X_1_1                  CD4-positive, alpha-beta T cell
AAACGAAAGCCTTTCC_TSP14_LI_Proximal_10X_1_1                                    plasma cell
AAACGAAGTAGTAAGT_TSP14_LI_Proximal_10X_1_1    enterocyte of epithelium of large intestine

Is it possible to get the genes ranked by their importance for a specific label (e.g., plasma cell)? I'm not sure if such information is even available within PopV or if I should look elsewhere for it.

canergen commented 1 year ago

My suggestion would be to run scanpy.tl.rank_genes_groups and use the respective prediction column as the grouping. Most classifiers (not the KNN and SCANVI) can also report intrinsic DE genes. Currently, this is not supported in PopV, and we are waiting for a response from the OnClass authors about making their gene list accessible. Because it covers unseen predictions, that list is in our view the most valuable one, in addition to the final prediction, where rank_genes_groups is the suggested method.