YosefLab / PopV

compare gene overlaps in .var of reference and query in ENSGID annotations #26

Closed annashch-insitro closed 1 year ago

annashch-insitro commented 1 year ago

For the Tabula Sapiens tutorial, it looks like gene overlap between reference and query is computed with HUGO gene symbols. I tried the tutorial on my own query data, and there was a mismatch in a handful of genes that had different HUGO symbols in my data vs. Tabula Sapiens. The mismatch goes away when comparing Ensembl gene IDs, which are more stable than HUGO symbols, and the models don't need to be retrained in that case. It may be on the user to ensure that their reference and query .var names match, but if PopV goes hand-in-hand with Tabula Sapiens as the reference, my usability suggestion is to check for overlap in the ENSGIDs stored in .var (which would generally be available even if .var_names are HUGO symbols).
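To illustrate the kind of check being suggested, here is a minimal sketch that compares reference and query genes by identifier and reports what is missing. It is not PopV's implementation: `gene_overlap`, `strip_version`, and all gene names and IDs in the toy data are hypothetical placeholders, and it assumes the identifiers have already been pulled out of each object's .var as plain strings.

```python
from typing import Dict, Iterable, List, Union


def strip_version(gene_id: str) -> str:
    """Drop a trailing version suffix, e.g. 'ENSG00000000002.7' -> 'ENSG00000000002'."""
    return gene_id.split(".", 1)[0]


def gene_overlap(
    ref_ids: Iterable[str],
    query_ids: Iterable[str],
    strip_versions: bool = False,
) -> Dict[str, Union[List[str], float]]:
    """Compare two collections of gene identifiers (hypothetical helper).

    Set strip_versions=True only for Ensembl-style IDs, where one side
    may carry '.N' version suffixes and the other may not.
    """
    ref = {strip_version(g) if strip_versions else g for g in ref_ids}
    query = {strip_version(g) if strip_versions else g for g in query_ids}
    shared = ref & query
    return {
        "shared": sorted(shared),
        "missing_in_query": sorted(ref - query),
        "fraction_of_ref": len(shared) / len(ref) if ref else 0.0,
    }


# Toy data with made-up placeholder names: the second gene was renamed,
# so its symbol differs between datasets while its Ensembl ID does not.
ref_symbols = ["GENE_A", "GENE_B_NEW", "GENE_C"]
query_symbols = ["GENE_A", "GENE_B_OLD", "GENE_C"]
ref_ensg = ["ENSG00000000001", "ENSG00000000002", "ENSG00000000003"]
query_ensg = ["ENSG00000000001.4", "ENSG00000000002.7", "ENSG00000000003.1"]

by_symbol = gene_overlap(ref_symbols, query_symbols)
by_ensg = gene_overlap(ref_ensg, query_ensg, strip_versions=True)

print("missing by symbol:", by_symbol["missing_in_query"])
print("missing by Ensembl ID:", by_ensg["missing_in_query"])
```

With AnnData objects, the inputs would come from whichever .var column holds the Ensembl IDs in each dataset. That column name varies between datasets, so it has to be looked up rather than assumed.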

canergen commented 1 year ago

Thanks for the suggestion. There will be new pretrained models when reannotation of TS is done and I will take care of using ENSGID (I wasn't aware of this issue).