atarashansky / SAMap

SAMap: Mapping single-cell RNA sequencing datasets from evolutionarily distant organisms.
MIT License

Mapping eggnogs - code snippet #65

Closed: ivanferrreira closed this issue 2 years ago

ivanferrreira commented 2 years ago

Hi,

Thank you for developing such an awesome tool; SAMap has worked by far the best of the several tools I have tried for cross-species integration.

However, I have struggled to map paralogues, orthologues and substitutions in my data using the current eggNOG tutorial. The input dataframe I am passing to the function convert_eggnog_to_homologs seems to be formatted as it should be.

Could you please provide a short code snippet showing how to generate the eggNOG input file for use in SAMap?

For instance, to get the eggNOG annotations for Ciona, I ran:

    emapper.py -m diamond --itype proteins --dmnd_db /users/ivan/devel/ivybridge/python3-venv-ivybridge-3.8.2/lib/python3.8/site-packages/eggnog_mapper-2.1.6-py3.8.egg/data/chordata.dmnd -i Ciona.fa --target_orthologs all -o Ciona
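A minimal sketch of loading the resulting annotation table, assuming emapper wrote it to `Ciona.emapper.annotations` (its `<prefix>.emapper.annotations` naming for `-o Ciona`), that metadata lines start with `##`, and that the column header line starts with `#query`. The function name `read_emapper_annotations` is hypothetical; it is not part of eggNOG-mapper or SAMap:

```python
from typing import Dict, Iterable, List, Tuple


def read_emapper_annotations(
    lines: Iterable[str],
) -> Tuple[List[str], Dict[str, Dict[str, str]]]:
    """Parse eggNOG-mapper annotation lines into (column names, rows keyed by query ID).

    Assumption: '##' lines are metadata and the header line starts with '#query'.
    """
    header: List[str] = []
    rows: Dict[str, Dict[str, str]] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and '##' metadata lines
        fields = line.split("\t")
        if line.startswith("#"):
            # Header line: '#query' -> 'query', keep the other column names as-is.
            header = [fields[0].lstrip("#")] + fields[1:]
            continue
        # Data line: map column names to values, keyed by the query (gene) ID.
        rows[fields[0]] = dict(zip(header, fields))
    return header, rows
```

Usage, again assuming that file name: `with open("Ciona.emapper.annotations") as fh: header, rows = read_emapper_annotations(fh)`. If pandas is preferred, `pandas.DataFrame.from_dict(rows, orient="index")` turns the result into a gene-indexed table with an `eggNOG_OGs` column.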

Thank you, Ivan

ivanferrreira commented 2 years ago

I figured it out: the problem was the format of the eggNOG_OGs column in the emapper output (OG@tax_id|tax_name). Passing this output directly to SAMap causes a bug.
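To make that format concrete, here is a small sketch that splits an `eggNOG_OGs` value into its parts. It assumes comma-separated entries of the form `OG@tax_id|tax_name`, as described above, and `-` for an empty field. `parse_eggnog_ogs` is a hypothetical helper, and the OG identifiers in the example are made up:

```python
from typing import List, Tuple


def parse_eggnog_ogs(field: str) -> List[Tuple[str, str, str]]:
    """Split an eggNOG_OGs value into (og_id, tax_id, tax_name) triples.

    Assumption: entries look like OG@tax_id|tax_name, separated by commas;
    an entry without a '|tax_name' part gets an empty tax_name.
    """
    triples = []
    for entry in field.split(","):
        entry = entry.strip()
        if not entry or entry == "-":
            continue  # assumed placeholder for a missing annotation
        og_id, _, taxon = entry.partition("@")
        tax_id, _, tax_name = taxon.partition("|")
        triples.append((og_id, tax_id, tax_name))
    return triples
```

For example, `parse_eggnog_ogs("OGa@2759|Eukaryota")` yields `[("OGa", "2759", "Eukaryota")]`: the `|Eukaryota` part is the extra suffix that has to be stripped before the column is handed to SAMap.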

Removing the taxon names and the "|" resolves the issue, e.g.:

    # tax_names: a Python list of taxon-name strings to remove from the eggNOG_OGs column
    Ciona['eggNOG_OGs'] = Ciona['eggNOG_OGs'].replace(tax_names, '', regex=True)
    # '|' is a regex metacharacter (alternation), so remove it as a literal string
    Ciona['eggNOG_OGs'] = Ciona['eggNOG_OGs'].str.replace('|', '', regex=False)
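An alternative that avoids listing every taxon name is to strip each `|tax_name` suffix with a single regular expression. This is a sketch of that swapped-in technique, not the fix used above; `strip_tax_names` is a hypothetical helper and the example IDs are made up:

```python
import re

# Matches a literal '|' and everything after it up to (not including) the next
# comma, i.e. the '|tax_name' suffix of each OG@tax_id|tax_name entry.
_TAX_NAME_SUFFIX = re.compile(r"\|[^,]*")


def strip_tax_names(field: str) -> str:
    """Turn 'OG@tax_id|tax_name,...' into 'OG@tax_id,...'."""
    return _TAX_NAME_SUFFIX.sub("", field)
```

Applied to a pandas column, the vectorized equivalent would be `Ciona['eggNOG_OGs'].str.replace(r'\|[^,]*', '', regex=True)`. Because the pattern stops at the next comma, taxon names containing spaces are removed whole, and entries that already lack a `|tax_name` suffix are left unchanged.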