atarashansky / SAMap

SAMap: Mapping single-cell RNA sequencing datasets from evolutionarily distant organisms.
MIT License

0 `mo` gene symbols match between the datasets and the BLAST graph. #97

Closed houruiyan closed 1 year ago

houruiyan commented 1 year ago

Hello, thank you very much for your kind help.

I have run into another problem. SAMAP() now runs smoothly, but the result is not what I expected: I always get "0 gene symbols match between the datasets and the BLAST graph."

image

The following is my code

image image image

Hope to get your answer. Thank you very much!

atarashansky commented 1 year ago

Hi @houruiyan! Notice that the gene IDs in your transcriptomes carry isoform suffixes (ENSG.....1234.1), whereas the gene IDs in your data do not. So the strings don't match up.

What I would recommend is to read the mapping tables (e.g. mo_to_ra.txt) and delete the .1/.2/.3s at the end of all the gene names, then save the tables. That should fix your problem.
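A minimal sketch of that clean-up, in case it helps. It assumes the mapping tables are tab-separated text with the gene IDs in the first two columns, and that the suffix is always a dot followed by digits at the end of the ID; check both against your own files first. The names `strip_suffix` and `strip_table` are made up for this example and are not part of SAMap.

```python
import re

# Assumption: the suffix to remove is a trailing "." followed by digits (".1", ".2", ...).
_SUFFIX = re.compile(r"\.\d+$")


def strip_suffix(gene_id: str) -> str:
    """Return gene_id without a trailing .<digits> suffix; other IDs are unchanged."""
    return _SUFFIX.sub("", gene_id)


def strip_table(in_path: str, out_path: str, id_columns=(0, 1)) -> None:
    """Copy a tab-separated table, stripping suffixes from the given columns.

    Assumption: the gene IDs sit in columns 0 and 1; all other columns
    (the mapping values) are written back untouched.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            for col in id_columns:
                if col < len(fields):
                    fields[col] = strip_suffix(fields[col])
            dst.write("\t".join(fields) + "\n")
```

For example, `strip_table("mo_to_ra.txt", "mo_to_ra.stripped.txt")` writes a cleaned copy next to the original, so you can compare the two before replacing the table. If only one species has the suffixes, pass `id_columns=(0,)` or `id_columns=(1,)` so the other column is left alone.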

Best, Alec

houruiyan commented 1 year ago

Thank you very much, Alec! But is it correct to just delete the .1, .2, .3? Different isoforms have different values in the mapping table, so which isoform's value should I select for a given gene? Hope to hear from you. Thank you!

atarashansky commented 1 year ago

It's okay to delete .1 .2 .3 as SAMap will just collapse all isoforms into one node for that transcript and combine all the different mapping values.

So if A_GENE1.1 maps to B_GENE1 and A_GENE1.2 maps to B_GENE2, then A_GENE1 will map to B_GENE1 and B_GENE2.
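To make the example above concrete, here is a toy illustration of the collapsing idea. This is not SAMap's internal code: `collapse_isoforms` is a hypothetical helper that only shows which targets end up attached to the merged gene, and it deliberately leaves out how the mapping values themselves are combined.

```python
import re
from collections import defaultdict

# Assumption: isoforms are marked by a trailing "." followed by digits.
_SUFFIX = re.compile(r"\.\d+$")


def collapse_isoforms(edges):
    """Merge (source, target) pairs after dropping isoform suffixes.

    Returns a dict mapping each collapsed source gene to the sorted list
    of collapsed target genes that any of its isoforms mapped to.
    """
    collapsed = defaultdict(set)
    for source, target in edges:
        collapsed[_SUFFIX.sub("", source)].add(_SUFFIX.sub("", target))
    return {gene: sorted(targets) for gene, targets in collapsed.items()}


edges = [("A_GENE1.1", "B_GENE1"), ("A_GENE1.2", "B_GENE2")]
print(collapse_isoforms(edges))
# {'A_GENE1': ['B_GENE1', 'B_GENE2']}
```

Both isoforms of A_GENE1 land on the same key, so the merged gene keeps every target that either isoform had.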

Combining the isoforms is correct because that's what the read mapping is effectively doing if the transcriptome/gtf that you used for generating the expression matrix does not delineate between isoforms.

atarashansky commented 1 year ago

Closing for now! Please reopen if you're still having trouble.