Starlitnightly / omicverse

A Python library for multi-omics analysis, including bulk, single-cell, and spatial RNA-seq.
https://starlitnightly.github.io/omicverse/
GNU General Public License v3.0

More complete gene ensembl id -> hgnc symbol pairs table #66

Open ElderMedic opened 6 months ago

ElderMedic commented 6 months ago

Is your feature request related to a problem? Please describe.
Running data=ov.bulk.Matrix_ID_mapping(data,'ref/genesets/pair_GRCh38.tsv') left more than 20k Ensembl gene IDs unconverted (H. sapiens, GRCh38; over 30% of all genes in my counts), so I tried to build a more complete table.

Describe alternatives you've considered
I selected only the approved symbols and Ensembl gene IDs on the HGNC custom download page: https://www.genenames.org/download/custom/ I then removed all rows containing NaN and saved the result as a TSV table. With that table, all of my gene IDs are mapped.
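A minimal sketch of the table-building step described above, using pandas. The column headers ("Approved symbol", "Ensembl gene ID") and the output column names (gene_id, symbol) are assumptions: check them against the actual HGNC export and against whatever layout the existing pair_GRCh38.tsv uses. The in-memory text stands in for the downloaded file, and the row without an Ensembl ID is a made-up placeholder.

```python
import io
import pandas as pd

# Stand-in for the file exported from the HGNC custom download page.
# Headers are assumed; the last row is a made-up symbol with no Ensembl ID.
hgnc_export = io.StringIO(
    "Approved symbol\tEnsembl gene ID\n"
    "TP53\tENSG00000141510\n"
    "BRCA1\tENSG00000012048\n"
    "EGFR\tENSG00000146648\n"
    "NOENSEMBL1\t\n"
)

def build_pair_table(source):
    """Return a two-column Ensembl ID -> symbol table with incomplete rows removed."""
    table = pd.read_csv(source, sep="\t", dtype=str)
    # Hypothetical output column names, chosen for this sketch only.
    table = table.rename(
        columns={"Ensembl gene ID": "gene_id", "Approved symbol": "symbol"}
    )
    # Drop rows missing either identifier (the "removed all NaN" step).
    table = table.dropna(subset=["gene_id", "symbol"])
    # Keep one symbol per Ensembl ID so later lookups are unambiguous.
    table = table.drop_duplicates(subset="gene_id")
    return table[["gene_id", "symbol"]].reset_index(drop=True)

pair = build_pair_table(hgnc_export)
tsv_text = pair.to_csv(sep="\t", index=False)  # write this out as the pair table
```

Passing a real file path instead of the StringIO object to build_pair_table should work the same way, provided the headers match.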

Additional context
See the attachment for the gene ID mapping table.

pair_hgnc_all.tsv.tar.gz

ElderMedic commented 6 months ago

I just discovered that if I run ov.bulk.Matrix_ID_mapping(data,'ref/genesets/pair_hgnc_all.tsv'), unmapped genes are dropped from the dataframe. The function should probably not remove genes that are missing from the gene ID pair table, or at least offer an option to keep them.
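One way the requested behavior could look, as a sketch in plain pandas rather than a change to ov.bulk.Matrix_ID_mapping itself: rows whose Ensembl ID is found in the pair table get the symbol, and all other rows keep their original ID instead of being dropped. The function name map_ids_keep_unmapped and the sample values are hypothetical.

```python
import pandas as pd

def map_ids_keep_unmapped(data, pair):
    """Rename the index of `data` using `pair`, keeping rows with no match.

    data: DataFrame indexed by Ensembl gene ID.
    pair: Series mapping Ensembl gene ID (index) to symbol (values).
    """
    # A lookup Series must have a unique index for Series.map to work.
    lookup = pair[~pair.index.duplicated(keep="first")]
    ids = pd.Series(data.index)            # original IDs on a 0..n-1 index
    symbols = ids.map(lookup)              # NaN where the ID is not in the table
    new_index = symbols.fillna(ids)        # fall back to the original ID
    out = data.copy()
    out.index = new_index.to_numpy()
    return out

# Tiny illustration; ENSG_NOT_IN_TABLE is a made-up ID with no table entry.
counts = pd.DataFrame(
    {"sample1": [10, 5], "sample2": [7, 0]},
    index=["ENSG00000141510", "ENSG_NOT_IN_TABLE"],
)
pair = pd.Series({"ENSG00000141510": "TP53"})
mapped = map_ids_keep_unmapped(counts, pair)
```

Here mapped has the same number of rows as counts. Keeping the raw ID as a fallback means the index can mix symbols and Ensembl IDs, so downstream steps that expect symbols only may still want to filter explicitly.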