DennisSchmitz / Jovian_archive

Metagenomics/viromics pipeline that focuses on automation, user-friendliness and a clear audit trail. Jovian aims to empower classical biologists and wet-lab personnel to do metagenomics/viromics analyses themselves, without bioinformatics expertise.
GNU Affero General Public License v3.0

LCA by Krona is missing certain viral taxa, use Mgkit instead? #46

Closed thierryjanssens closed 4 years ago

thierryjanssens commented 5 years ago

When analysing my dataset with the PZN_ERVINGS fork of PZN (the name of the current pipeline before it was renamed to Jovian), which contains, among other things, an alternative LCA method (mgkit; Rubino, F. and Creevey, C.J. 2014. MGkit: Metagenomic Framework For The Study Of Microbial Communities. Available at: figshare [doi:10.6084/m9.figshare.1269288]), I observed that Jovian is missing certain viral contigs. The minimum contig length was reduced to 250 bp in order to increase detection in the dilute respiratory samples...

The missing contigs include, for example, a Coronavirus OC43 and an Adenovirus.

Despite good BLAST hits (with decent e-values and bitscores), Krona places the deduced taxonomic position at the root of the tree of life (taxid 1 in the .taxtab output). For example, in the adenovirus case:

(Thierry) [janssetk@rivm-biohn-l01p taxonomic_classification]$ grep -i "adenovirus" *.blastn
4311801173_S5_L001.blastn:NODE_1056_length_447_cov_1.896875     V00040.1        98.655  446     3       2       3       447     596     153     0.0     787     NODE_1056_length_447_cov_1.896875       gi|58548|emb|V00040.1|  10519   Human adenovirus 7   Adenovirus type 7 with human Hinf repeated DNA element inserted
4311801173_S5_L001.blastn:NODE_1056_length_447_cov_1.896875     V00040.1        99.286  140     1       0       1       140     279     140     2.68e-63        254     NODE_1056_length_447_cov_1.896875       gi|58548|emb|V00040.1|  10519Human adenovirus 7      Adenovirus type 7 with human Hinf repeated DNA element inserted
4311801173_S5_L001.blastn:NODE_1056_length_447_cov_1.896875     V00040.1        99.200  125     1       0       323     447     596     472     5.84e-55        226     NODE_1056_length_447_cov_1.896875       gi|58548|emb|V00040.1|  10519Human adenovirus 7      Adenovirus type 7 with human Hinf repeated DNA element inserted
4311801214_S18_L001.blastn:NODE_24_length_1112_cov_17.405076    V00040.1        98.873  355     1       2       756     1109    140     492     2.76e-176       630     NODE_24_length_1112_cov_17.405076       gi|58548|emb|V00040.1|  10519Human adenovirus 7      Adenovirus type 7 with human Hinf repeated DNA element inserted
4311801214_S18_L001.blastn:NODE_24_length_1112_cov_17.405076    V00040.1        98.253  229     1       2       666     893     370     596     3.05e-106       398     NODE_24_length_1112_cov_17.405076       gi|58548|emb|V00040.1|  10519Human adenovirus 7      Adenovirus type 7 with human Hinf repeated DNA element inserted
4311801214_S18_L001.blastn:NODE_24_length_1112_cov_17.405076    V00040.1        96.970  132     3       1       1       131     407     538     7.09e-53        220     NODE_24_length_1112_cov_17.405076       gi|58548|emb|V00040.1|  10519Human adenovirus 7      Adenovirus type 7 with human Hinf repeated DNA element inserted
4311801214_S18_L001.blastn:NODE_24_length_1112_cov_17.405076    V00040.1        93.750  80      4       1       53      131     140     219     2.66e-22        119     NODE_24_length_1112_cov_17.405076       gi|58548|emb|V00040.1|  10519Human adenovirus 7      Adenovirus type 7 with human Hinf repeated DNA element inserted
4311801214_S18_L001.blastn:NODE_24_length_1112_cov_17.405076    V00040.1        100.000 37      0       0       1076    1112    140     176     2.72e-07        69.4    NODE_24_length_1112_cov_17.405076       gi|58548|emb|V00040.1|  10519Human adenovirus 7      Adenovirus type 7 with human Hinf repeated DNA element inserted
(Thierry) [janssetk@rivm-biohn-l01p taxonomic_classification]$ grep -i "NODE_24_length_1112_cov_17.405076" *.taxtab
4311801214_S18_L001.taxtab:NODE_24_length_1112_cov_17.405076    1       -176.114338572293
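
For reference, the sketch below shows what a plain lowest-common-ancestor (LCA) step would be expected to return for the hits listed above. It is a minimal illustration, not the Krona or mgkit implementation. The (contig, taxid) pairs are copied from the blastn output. The parent map is a toy: only taxid 10519 and the root (1) come from the output, while 900001, 900002 and 900003 are made-up placeholder ids standing in for the intermediate ranks that would normally be read from the NCBI taxonomy.

```python
from collections import defaultdict

# Toy taxonomy: child taxid -> parent taxid.
# ASSUMPTION: 900001, 900002 and 900003 are hypothetical placeholder ids;
# only 10519 (Human adenovirus 7) and 1 (root) appear in the output above.
PARENT = {10519: 900001, 900003: 900001, 900001: 900002, 900002: 1, 1: 1}


def lineage(taxid, parent=PARENT):
    """Return the path from taxid up to the root (taxid 1), inclusive."""
    path = [taxid]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def lca(taxids, parent=PARENT):
    """Return the lowest taxid shared by the lineages of all given taxids."""
    taxids = list(taxids)
    common = set(lineage(taxids[0], parent))
    for taxid in taxids[1:]:
        common &= set(lineage(taxid, parent))
    # Walk up from the first taxid; the first shared node is the LCA.
    for node in lineage(taxids[0], parent):
        if node in common:
            return node
    return 1


def lca_per_contig(hits, parent=PARENT):
    """Group (contig, taxid) pairs by contig and compute one LCA per contig."""
    grouped = defaultdict(list)
    for contig, taxid in hits:
        grouped[contig].append(taxid)
    return {contig: lca(taxids, parent) for contig, taxids in grouped.items()}


# (contig, taxid) pairs taken from the blastn output above.
hits = (
    [("NODE_1056_length_447_cov_1.896875", 10519)] * 3
    + [("NODE_24_length_1112_cov_17.405076", 10519)] * 5
)
result = lca_per_contig(hits)
print(result)
```

Since every hit for both contigs carries taxid 10519, this kind of LCA returns 10519 for each of them, not the root. With the toy map, root-ward movement only happens when hits disagree: `lca([10519, 900003])` gives the shared placeholder parent 900001.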

I will work on an implementation of the mgkit tools in Jovian ASAP.

DennisSchmitz commented 5 years ago

Update: Currently the mgkit method is being validated. We expect to incorporate it in the next version.

DennisSchmitz commented 5 years ago

@thierryjanssens, do you have an update on this issue? Does the mgkit LCA implementation in the v0.9.5-dev branch fix this?

florianzwagemaker commented 4 years ago

We're currently using Mgkit as the default instead of Krona, so we consider this fixed.