Open karlkashofer opened 1 year ago
It seems to be restricted to root and the top-level domains:
karl@sx253:/scratch/tmp/Pseudomonas/kraken2-pipeline$ cat D7055-2.k2report | grep -P '\t\t'
78.74 78744 1580 523338 339180 1 root
77.16 77160 177 509352 330978 2 Bacteria
0.00 4 0 49 48 2759 Eukaryota
karl@sx253:/scratch/tmp/Pseudomonas/kraken2-pipeline$
Manually adding the correct R and D flags fixes the problem, bracken is happy.
I'd still like to know why this happens.
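The manual fix described above can be sketched as a small helper. This is only a sketch under stated assumptions: the function name `fix_rank_codes` is hypothetical, the rank code is taken to be the third-from-last tab-separated field (which matches both the default report layout and the `--report-minimizer-data` layout shown above), and the taxid-to-code map covers only the three lines from the output above (1 = R, 2 = D, 2759 = D).

```shell
#!/usr/bin/env bash
# Fill in an empty rank-code field for a few known taxids.
# Assumption: the rank code is the third-from-last tab-separated field.
# Assumption: the map below (1=R, 2=D, 2759=D) only covers the lines
# shown in this thread; extend it for your own database.
fix_rank_codes() {
    awk 'BEGIN {
            FS = OFS = "\t"
            code["1"] = "R"; code["2"] = "D"; code["2759"] = "D"
         }
         {
            rank = NF - 2; taxid = $(NF - 1)
            # Only touch lines whose rank field is empty and whose taxid is mapped.
            if (NF >= 6 && $rank == "" && (taxid in code)) $rank = code[taxid]
            print
         }' "$1"
}
```

Something like `fix_rank_codes D7055-2.k2report > D7055-2.fixed.k2report` would then write a corrected copy and leave the original file untouched.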
I've also run into the issue of report files missing rank codes for root and domains: most recently when using the nt database, and previously when using a custom database built by combining PlusPFP with my own sequence via --add-to-library.
For the nt database, its inspect.txt file shows all rank codes are present. Only the reports show them as missing.
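One way to make the comparison between inspect.txt and a report concrete is to tally the rank codes in each file. The sketch below is not part of Kraken 2; `tally_rank_codes` is a hypothetical helper, and it assumes tab-separated input with the rank code in the third-from-last field and header lines (if any) starting with `#`.

```shell
#!/usr/bin/env bash
# Count how often each rank code occurs in a report or in inspect output.
# Assumptions: tab-separated input; rank code is the third-from-last field;
# lines starting with "#" or with fewer than six fields are skipped.
tally_rank_codes() {
    awk -F'\t' '
        /^#/ || NF < 6 { next }
        { key = ($(NF - 2) == "" ? "<empty>" : $(NF - 2)); count[key]++ }
        END { for (k in count) printf "%s\t%d\n", k, count[k] }
    ' "$1" | LC_ALL=C sort
}
```

Running it once on inspect.txt and once on a report should show an `<empty>` row only for the file that is missing rank codes.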
Command I used for the runs:
kraken2 \
--confidence 0.1 \
--minimum-hit-groups 3 \
--gzip-compressed \
--db ${db} \
--threads ${SLURM_CPUS_ON_NODE} \
--output ${SLURM_JOB_NAME}.out \
--report ${SLURM_JOB_NAME}.report \
--classified-out ${SLURM_JOB_NAME}_class.fastq \
--unclassified-out ${SLURM_JOB_NAME}_unclass.fastq \
${reads}
I just ran into the same issue of missing rank codes with Kraken version 2.1.2, using both the k2_pluspf_20220908 and k2_pluspf_20230605 databases.
In my kraken reports, "D" is missing, but subdomain ranks ("D1") show up. "R" is also missing, and the subrank partially shows up ("1" instead of "R1"). This caused issues for me when using KrakenTools/extract_kraken_reads.py with the --include-children flag. I resolved this by manually adding the missing rank codes.
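To find the affected lines before patching them by hand, a check along these lines may help. It is a sketch, not a KrakenTools feature: `list_suspect_ranks` is a hypothetical name, and it assumes the rank code is the third-from-last tab-separated field. It flags rank codes that are empty or consist only of digits, which covers both symptoms described above (a missing "D" or "R", and "1" instead of "R1").

```shell
#!/usr/bin/env bash
# List report lines whose rank code is empty or digits only.
# Assumption: the rank code is the third-from-last tab-separated field.
# Output columns: file name, taxid, taxon name (leading indentation stripped).
list_suspect_ranks() {
    awk -F'\t' '
        (NF >= 6) && ($(NF - 2) ~ /^[0-9]*$/) {
            name = $NF
            sub(/^ +/, "", name)
            printf "%s\t%s\t%s\n", FILENAME, $(NF - 1), name
        }
    ' "$@"
}
```

For example, `list_suspect_ranks k2db_20220908/report.txt` prints one line per suspect taxon, and no output means every line carries a letter-prefixed rank code.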
My command:
kraken2 \
--db k2_pluspf_20220908 \
--gzip-compressed \
--threads 32 \
--minimum-hit-groups 3 \
--report-minimizer-data \
--output k2db_20220908/hits.txt \
--report k2db_20220908/report.txt \
--paired \
S18_R1.fastq.gz \
S18_R2.fastq.gz
Ugg, if you have hundreds of files to fix, the following script might be useful. It replaces the missing rank codes of root, Bacteria, and Eukaryota in all .kreport files in the CWD with -, D, and D respectively, writing each result to a new _corrected.kreport file. Modify it for your given database accordingly.
for val in *.kreport
do
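    # Output name: sample.kreport -> sample_corrected.kreport
    fix=$(echo "$val" | sed -r 's/^(.+)\.kreport/\1_corrected.kreport/g')
    # Groups 1-3: percentage, clade reads, taxon reads; group 4: the empty rank field;
    # group 5: taxid plus any name indentation; group 6: the taxon name.
    sed -r 's/^([^\t]*\t)([^\t]*\t)([^\t]*\t)([^\t]*\t)([^\t]*\t)(root)$/\1\2\3-\t\5\6/g' "$val" | \
    sed -r 's/^([^\t]*\t)([^\t]*\t)([^\t]*\t)([^\t]*\t)([^\t]*\s+)(Bacteria)$/\1\2\3D\t\5\6/g' | \
    sed -r 's/^([^\t]*\t)([^\t]*\t)([^\t]*\t)([^\t]*\t)([^\t]*\s+)(Eukaryota)$/\1\2\3D\t\5\6/g' > "$fix"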
done
@karlkashofer I have noticed this issue recently and am currently investigating.
I'll try to update soon. We are aware of the issue and are trying to address it.
Hi, I just ran into this issue with both v2.1.3 and with the current master branch. I tried both the current standard DB and the 2024-06-05 one.
kraken2 --threads 4 --db standard --output - --report $fout --paired $fin1 $fin2
Using the standard 16 GB library and some bacterial reads, we get empty taxonomic level fields in the kraken2 report file. For example, the root and Bacteria lines have no taxonomic level.
Bracken does not like that... Are we doing something wrong?