AuReMe / emapper2gbk

Convert GFF, fastas, annotation table and species name into Genbank.
GNU Lesser General Public License v3.0

Can I add the full taxonomy locally? #19

Open · NailouZhang opened this issue 8 months ago

NailouZhang commented 8 months ago

Hi, I know that emapper2gbk retrieves taxonomic information about the organism online from https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/. However, scientific names change over time (for example, Blattodean nairo-related virus OKIAV321 is now named red goblin roach virus 1). I can get the taxonomic information with the taxonkit lineage tool, but I can't insert it (Viruses;phylum;order;family;genus;species) into the GenBank files. It would be nice to have an option for this, such as "emapper2gbk -localtaxlineage 'Viruses;phylum;order;family;genus;species'".

Thanks in advance. Yours, Nailou

ArnaudBelcour commented 8 months ago

Hi @NailouZhang,

It is technically possible, but it is not well described in the documentation. The --ete option changes how the taxonomic information given with -n is handled. Instead of a scientific name that is sent to the EBI, it takes a taxonomic lineage such as "Viruses;phylum;order;family;genus;species" and parses it with the ete3 package (which relies on the NCBI Taxonomy database).

For example, you can give the following options: emapper2gbk ... -n "Viruses;phylum;order;family;genus;species" --ete
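If the lineage comes from taxonkit lineage, a small helper can turn one output line into the -n/--ete arguments. This is a minimal sketch, not part of emapper2gbk: it assumes the lineage is the last tab-separated column of the taxonkit output line, and the function name ete_args_from_taxonkit is hypothetical. Building an argument list (rather than a shell string) avoids quoting problems with taxon names that contain spaces.

```python
import shlex
from typing import List


def ete_args_from_taxonkit(line: str) -> List[str]:
    """Build the '-n <lineage> --ete' arguments from one taxonkit lineage line.

    Assumption: the semicolon-separated lineage is the last tab-separated column.
    """
    lineage = line.rstrip("\n").split("\t")[-1]
    # Drop empty entries caused by trailing or doubled semicolons.
    taxa = [t.strip() for t in lineage.split(";") if t.strip()]
    return ["-n", ";".join(taxa), "--ete"]


# "TAXID" and the ranks below are placeholders, not real taxonkit output.
args = ete_args_from_taxonkit("TAXID\tViruses;phylum;order;family;genus;species\n")
print(shlex.join(args))
```

The other emapper2gbk arguments (the "..." in the command above) would be prepended before calling it, for example with subprocess.run(["emapper2gbk", *other_args, *args], check=True), where other_args is a hypothetical list holding your usual inputs.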

emapper2gbk will parse the different taxa and use the taxonomic information of the lowest rank that matches an entry in the NCBI Taxonomy database.
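The "lowest matching rank" behaviour can be illustrated with a short sketch. This is not emapper2gbk's actual code; the function lowest_matching_taxon is hypothetical. It takes the name-to-taxid lookup as a parameter so that it runs without the NCBI database; with ete3 installed, NCBITaxa().get_name_translator (which maps a list of names to a dict of name to list of taxids) can be passed in directly.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A lookup maps a list of taxon names to {name: [taxid, ...]} for names it knows.
Lookup = Callable[[List[str]], Dict[str, List[int]]]


def lowest_matching_taxon(lineage: str, lookup: Lookup) -> Optional[Tuple[str, int]]:
    """Return (name, taxid) of the most specific taxon in `lineage` found by `lookup`."""
    taxa = [t.strip() for t in lineage.split(";") if t.strip()]
    found = lookup(taxa)
    # Walk from the most specific (last) entry back to the least specific (first).
    for name in reversed(taxa):
        taxids = found.get(name)
        if taxids:
            return name, taxids[0]
    return None  # nothing in the lineage is known to the lookup


# Usage with ete3 (downloads the NCBI Taxonomy database on first use):
# from ete3 import NCBITaxa
# ncbi = NCBITaxa()
# lowest_matching_taxon("Viruses;phylum;order;family;genus;species",
#                       ncbi.get_name_translator)
```

Walking the lineage from the end means that a renamed or missing species simply falls back to the closest ancestor the database still knows, which is what makes this approach robust to the renaming problem described in the question.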

Best regards, Arnaud.