NBISweden / Earth-Biogenome-Project-pilot

Assembly and Annotation workflows for analysing data in the Earth Biogenome Project pilot project.
https://www.earthbiogenome.org/
GNU General Public License v3.0

Automated codon usage table selection #88

Open mahesh-panchal opened 9 months ago

mahesh-panchal commented 9 months ago

MitoHiFi, and perhaps other tools, need the codon usage table to be specified. Can this be automated?

gbdias commented 3 weeks ago

The simplest solution I could find is to use Entrez Direct efetch to query NCBI Taxonomy with the taxonomy ID (which we currently get from the TAXONKIT_NAME2LINEAGE process):

efetch -db taxonomy -id ${taxid} -format xml | awk -F'[<>]' '/<MGCId>/ {print $3}' > mitocode.txt
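One thing to watch: if the record has no `<MGCId>` element, the command above silently writes an empty mitocode.txt. Below is a minimal sketch of the same awk extraction wrapped so that it fails in that case. The function name `extract_mito_code` is hypothetical, and the sample XML is a hand-written, trimmed stand-in for an efetch taxonomy record (not real output), used only so the snippet runs without network access.

```shell
# Hypothetical helper: read NCBI taxonomy XML on stdin and print the value of
# the first <MGCId> element. Returns non-zero if no such element is found.
extract_mito_code() {
    awk -F'[<>]' '
        /<MGCId>/ && !found { code = $3; found = 1 }
        END { if (!found) exit 1; print code }
    '
}

# Hand-written sample (assumption: real records keep <MGCId> on its own line
# inside <MitoGeneticCode>, as the original awk one-liner also assumes).
sample_xml='<TaxaSet>
<Taxon>
    <GeneticCode>
        <GCId>1</GCId>
    </GeneticCode>
    <MitoGeneticCode>
        <MGCId>5</MGCId>
    </MitoGeneticCode>
</Taxon>
</TaxaSet>'

printf '%s\n' "$sample_xml" | extract_mito_code   # prints 5
```

In a process script this would be used as `efetch -db taxonomy -id ${taxid} -format xml | extract_mito_code > mitocode.txt`, so a missing code stops the task instead of passing an empty file downstream.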

nf-core has modules for some of the Entrez Direct tools, but not efetch. Entrez Direct is in Bioconda, so a container can be pulled from Seqera.

mahesh-panchal commented 3 weeks ago

Want to have a shot at writing either an nf-core module or a local module for it?

nf-core module

Make a fork of nf-core modules.

git checkout -b entrezdirect_efetch

Then make a new module

nf-core modules create entrezdirect/efetch

but it means you'll also have to write an nf-test.

Locally

Make a fork of the EBP pilot workflow

git checkout -b entrezdirect_efetch
nf-core modules create entrezdirect/efetch

but here you won't have to write an nf-test and don't need to follow nf-core guidelines.

Then make a PR back to the main workflow.