DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Error in kraken2 with taxpasta #869

Closed brockels closed 2 months ago

brockels commented 2 months ago

I ran this:

$ taxpasta merge -p kraken2 -o kraken_silva138_merged.tsv /mnt/gpfs/scratch/projects/2024-hvn_metagenomics//_trim_P_sealed_SILVA_kraken2.txt

and got this back:

CRITICAL Error in sample 'CS-100_trim_P_sealed_SILVA_kraken2' with profile '/mnt/gpfs/scratch/projects/2024-hvn_metagenomics/PEAR/CS-100_trim_P_sealed_SILVA_kraken2.txt'. merge.py:419
CRITICAL Unexpected kraken2 report format. It has 5 columns but only six or eight are expected. merge.py:424
Unexpected kraken2 report format. It has 5 columns but only six or eight are expected.

Here is one line from the file:

U LH00292:163:223CNFLT4:1:1101:47304:1868 0 151|149 0:117 |:| 0:115

Why did I get this?

taxpasta --version 0.7.0
Kraken version 2.0.7-beta Copyright 2013-2018, Derrick Wood (dwood@cs.jhu.edu)

ChillarAnand commented 2 months ago

Did kraken2 run without any issues? Can you share a sample of the kraken report?

Generally, it should have 6 columns, like this:

❯ head kraken.report
  0.10  1       1       U       0       unclassified
 99.90  999     0       R       1       root
 99.90  999     0       R1      131567    cellular organisms
 99.90  999     3       D       2           Bacteria
 69.00  690     0       D1      1783272       Terrabacteria group
 69.00  690     1       P       1239            Firmicutes
 68.80  688     1       C       91061             Bacilli
 57.80  578     13      O       1385                Bacillales
 24.50  245     3       F       186817                Bacillaceae
 23.30  233     5       G       1386                    Bacillus
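
A quick way to tell which kind of file you are handing to taxpasta is to count its columns. The sketch below is only an illustration, not part of kraken2 or taxpasta: it assumes the files are tab-separated, and the function names and layout labels are made up for this example. The 5, 6, and 8 column counts are the ones mentioned in this thread.

```python
import io

# Column counts mentioned in this thread; the labels are illustrative assumptions.
KNOWN_LAYOUTS = {
    5: "per-read classification output (not a report)",
    6: "standard report",
    8: "report with two extra minimizer columns",
}


def count_columns(handle):
    """Return the set of tab-separated column counts over all non-empty lines."""
    counts = set()
    for line in handle:
        line = line.rstrip("\r\n")
        if line.strip():
            counts.add(len(line.split("\t")))
    return counts


def describe_layout(handle):
    """Map a file's column count to one of the layouts discussed above."""
    counts = count_columns(handle)
    if not counts:
        return "empty file"
    if len(counts) != 1:
        return f"inconsistent column counts: {sorted(counts)}"
    (n,) = counts
    return KNOWN_LAYOUTS.get(n, f"unknown layout with {n} columns")


if __name__ == "__main__":
    # The per-read line quoted earlier in this thread, with tabs assumed between fields.
    per_read = "U\tLH00292:163:223CNFLT4:1:1101:47304:1868\t0\t151|149\t0:117 |:| 0:115\n"
    print(describe_layout(io.StringIO(per_read)))
```

For a real file, open it with `open(path)` and pass the handle to `describe_layout` in place of the `io.StringIO` object.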
brockels commented 2 months ago

Thank you, I found my error. I reran kraken2 with --use-names and the report options (--report and --report-minimizer-data), and the report now has the 8 columns I needed for taxpasta to work. This works now:

kraken2 --paired --db /mnt/gpfs/scratch/projects/2024-hvn_metagenomics/kraken2/SILVA --gzip-compressed --threads 4 CS-26_trim1P_bbduk.gz CS-26_trim2P_bbduk.gz --use-names --report CS-26_trim_P_sealed_SILVA_kraken2_report.txt --report-minimizer-data --report-zero-counts > /dev/null

brockels commented 2 months ago

With --use-names and the report options in the command above, the file you pass to taxpasta has 8 columns instead of 5:

 99.56  11112343  11112343  0        0      U  0     unclassified
  0.44  48769     240       1957973  25618  R  1     root
  0.42  46910     6204      1879316  25170  D  3     Bacteria
  0.15  17206     424       221661   5425   P  1672  Firmicutes
  0.13  14843     282       187483   2962   C  1673  Bacilli
  0.12  13026     400       167625   1710   O  1800  Lactobacillales
  0.09  10053     5         139753   1059   F  1850  Streptococcaceae
  0.09  10046     10046     139704   1032   G  1853  Streptococcus
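
If you want to read such an 8-column report yourself, here is a minimal Python sketch. It assumes the columns are tab-separated and come in the order I understand from the Kraken 2 manual (percentage, reads in clade, reads assigned directly, minimizers, distinct minimizers, rank code, taxonomy ID, name). The `ReportRow` type, its field names, and `parse_report` are hypothetical names chosen for this example, not anything kraken2 or taxpasta provides.

```python
from typing import Iterable, Iterator, NamedTuple


class ReportRow(NamedTuple):
    # Field names are illustrative; the column order is an assumption (see above).
    percent: float
    clade_reads: int
    direct_reads: int
    minimizers: int
    distinct_minimizers: int
    rank: str
    taxid: int
    name: str


def parse_report(lines: Iterable[str]) -> Iterator[ReportRow]:
    """Yield one ReportRow per non-empty line of an 8-column report."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 8:
            raise ValueError(f"line {number}: expected 8 columns, found {len(fields)}")
        pct, clade, direct, mins, distinct, rank, taxid, name = fields
        yield ReportRow(
            float(pct), int(clade), int(direct), int(mins), int(distinct),
            rank.strip(), int(taxid), name.strip(),
        )


if __name__ == "__main__":
    # Two rows from the report above, with tabs assumed between fields.
    sample = [
        "99.56\t11112343\t11112343\t0\t0\tU\t0\tunclassified\n",
        "0.44\t48769\t240\t1957973\t25618\tR\t1\troot\n",
    ]
    for row in parse_report(sample):
        print(row.rank, row.taxid, row.name, row.clade_reads)
```

Note that `name.strip()` drops the leading indentation that reports use to show taxonomic depth, so keep the raw field if you need the hierarchy.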