For the Tabula Sapiens tutorial, it looks like gene overlap between reference and query is computed with HUGO gene symbols.
I tried the tutorial on my own query data, and a handful of genes did not match because they had different HUGO symbols in my data than in Tabula Sapiens. The mismatch goes away when comparing Ensembl gene IDs, which are more stable than HUGO symbols, and the models don't need to be retrained in that case. It may be more on the user to ensure that their reference and query .var_names match, but if PopV goes hand-in-hand with Tabula Sapiens as the reference, my usability suggestion is to check for overlap in the Ensembl gene IDs stored in .var (which would generally be available even if .var_names are HUGO symbols).
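To illustrate the kind of check I mean, here is a minimal sketch in plain Python. The helper name `shared_genes` and the toy gene names and IDs are made up for illustration, and this is not how PopV computes overlap internally. With AnnData you would pass `adata.var_names` for the symbol comparison and whichever `.var` column holds the Ensembl IDs for the ID comparison (the column name differs between datasets).

```python
def shared_genes(ref_genes, query_genes, strip_version=False):
    """Return the sorted list of gene identifiers present in both inputs.

    If strip_version is True, a trailing version suffix such as ".12"
    is removed before comparing, so "ID.12" and "ID" count as the same gene.
    """
    def normalize(gene):
        gene = str(gene)
        return gene.split(".", 1)[0] if strip_version else gene

    ref = {normalize(g) for g in ref_genes}
    query = {normalize(g) for g in query_genes}
    return sorted(ref & query)


# Toy data (hypothetical): the second gene was renamed between annotations,
# so its symbol differs while its stable ID does not.
ref_symbols = ["GENE_A", "GENE_B_OLD", "GENE_C"]
query_symbols = ["GENE_A", "GENE_B_NEW", "GENE_C"]
ref_ids = ["ENSG_TOY_0001", "ENSG_TOY_0002", "ENSG_TOY_0003"]
query_ids = ["ENSG_TOY_0001.5", "ENSG_TOY_0002.3", "ENSG_TOY_0003"]

by_symbol = shared_genes(ref_symbols, query_symbols)
by_id = shared_genes(ref_ids, query_ids, strip_version=True)

print(f"shared by symbol: {len(by_symbol)} of {len(ref_symbols)}")
print(f"shared by ID:     {len(by_id)} of {len(ref_ids)}")
```

In this toy case the symbol comparison loses the renamed gene while the ID comparison keeps all three. Stripping the version suffix is a design choice on my side, since some pipelines export versioned Ensembl IDs and others do not.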
Thanks for the suggestion. There will be new pretrained models once the reannotation of TS is done, and I will make sure they use Ensembl gene IDs (I wasn't aware of this issue).