NBISweden / aMeta

Ancient microbiome snakemake workflow
MIT License

Missing Assigned Taxon IDs #154

Open ardasevkar opened 8 months ago

ardasevkar commented 8 months ago

I was reviewing the MaltExtract outputs and noticed that results for specific organisms (listed below) are missing:

I've checked the log file of MaltExtract, and I found an error message stating: <Species_name> has no assigned taxID and cannot be processed! Consider checking for a error in filepath if you provided a taxon file as input

I'm using the NCBI.tre files that come with HOPS, which are located under the Resources folder. If you need it, you can find my code to run MaltExtract below:

java -Xmx500G -jar /usr/local/sw/anaconda3/envs/aMeta/share/hops-0.35-1/MaltExtract1.7.jar -i results/MALT/${lib_id}.rma6 -f def_anc -o results/MALT_EXTRACT --reads --destackingOff --downSampOff --dupRemOff --threads 20 --resources ~/programs/HOPS/Resources --matches --minPI 85.0 --maxReadLength 0 --minComp 0.0 --meganSummary -t results/MALT_EXTRACT/taxa
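One way to narrow down which names trigger the "has no assigned taxID" message is to check the taxon list against a name-to-taxID table before running MaltExtract. The sketch below is only an illustration: it assumes a tab-separated mapping with the taxID in the first column and the scientific name in the second, and a taxon list with one name per line. The function names, the column layout and the toy data are all hypothetical and are not part of HOPS or MaltExtract.

```python
def load_name_to_taxid(lines, id_col=0, name_col=1):
    """Build {scientific name: taxID} from tab-separated lines.

    The column positions are assumptions; adjust them to the real file.
    """
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(id_col, name_col):
            continue  # skip blank or malformed lines
        mapping[fields[name_col].strip()] = fields[id_col].strip()
    return mapping


def taxa_without_id(taxa_names, mapping):
    """Return the requested names that have no entry in the mapping."""
    cleaned = (name.strip() for name in taxa_names)
    return [name for name in cleaned if name and name not in mapping]


# Toy in-memory data standing in for the real files (hypothetical content).
toy_map = ["632\tYersinia pestis\n", "1773\tMycobacterium tuberculosis\n"]
toy_taxa = ["Yersinia pestis\n", "Made-up species\n", "\n"]

missing = taxa_without_id(toy_taxa, load_name_to_taxid(toy_map))
print(missing)
```

With real data, both functions accept open file handles in place of the toy lists, since they only iterate over lines. The match is exact, so spelling or whitespace differences between the taxon list and the table would also show up as missing.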

LeandroRitter commented 4 months ago

Thank you @ardasevkar! We used to use the ncbi.tre from the HOPS Resources folder, but we switched to the more regularly updated ncbi.tre from MEGAN, available at github.com/husonlab/megan-ce/raw/master/src/megan/resources/files/ncbi.tre. MaltExtract downloads this version automatically if no ncbi.tre is provided explicitly. The two versions of ncbi.tre are not identical, so there can be mismatches between scientific names and taxIDs.
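To get a feel for how far two taxonomy versions diverge, one could compare their name-to-taxID assignments directly. This is a minimal sketch, assuming each version has already been reduced to a dictionary of scientific name to taxID; how those dictionaries are obtained from the HOPS or MEGAN resources is left open here. The function `compare_mappings` and the toy entries are hypothetical, and the IDs are made up.

```python
def compare_mappings(first, second):
    """Compare two {scientific name: taxID} dictionaries.

    Returns names found only in the first, names found only in the
    second, and names present in both but assigned different taxIDs.
    """
    names_first, names_second = set(first), set(second)
    only_first = sorted(names_first - names_second)
    only_second = sorted(names_second - names_first)
    conflicts = {
        name: (first[name], second[name])
        for name in sorted(names_first & names_second)
        if first[name] != second[name]
    }
    return only_first, only_second, conflicts


# Toy dictionaries standing in for two taxonomy versions (made-up IDs).
version_a = {"Species A": "101", "Species B": "102", "Species C": "103"}
version_b = {"Species A": "101", "Species B": "202", "Species D": "104"}

only_a, only_b, conflicts = compare_mappings(version_a, version_b)
print("only in A:", only_a)
print("only in B:", only_b)
print("conflicting taxIDs:", conflicts)
```

Names that appear in only one version, or with conflicting IDs, are the candidates for the kind of mismatch described above.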

Now, the question is which version of ncbi.tre we should use. My experience, like yours @ardasevkar, is that the ncbi.tre provided by HOPS is more compatible with MALT and results in far fewer mismatches between scientific names and taxIDs. So I would consider downloading the HOPS version of ncbi.tre and providing it to MaltExtract explicitly, instead of letting MaltExtract download the MEGAN version. Any thoughts @ZoePochon, @clami66 and @percyfal?

LeandroRitter commented 4 months ago

@ardasevkar are you suggesting that those microbes could have been found if we had used ncbi.tre from the HOPS Resources folder? Do you find them when you run plain HOPS with the native ncbi.tre from the HOPS Resources folder?

Right now the issue seems to be due to the discrepancy between the ncbi.tre from the HOPS Resources folder and MEGAN's ncbi.tre.