DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Same process, same DB, different result #811

Open minchanceking opened 3 months ago

minchanceking commented 3 months ago

Hello,

I've been running metagenomic analyses of shotgun sequencing data with Kraken2. I first ran the workflow below towards the end of 2022, using a minikraken database downloaded in 2020:


# 1. Align reads to the human reference (GRCh38) with HISAT2 and convert the SAM output to BAM
for f in $(ls -1 *_1.fq.gz | sed 's/_1.fq.gz//')
do
    hisat2 -p 10 --rna-strandness RF -x /HDD1/Classifiers/GRCh38 \
        -1 /HDD4/TUS2/${f}_1.fq.gz -2 /HDD4/TUS2/${f}_2.fq.gz \
        2> /HDD4/TUS2/03_hisat/${f}.log \
        | samtools view -@ 10 -Sbo /HDD4/TUS2/03_hisat/${f}.bam
done

# 2. Sort the alignments
for f in $(ls -1 *_1.fq.gz | sed 's/_1.fq.gz//')
do
    samtools sort -O bam -o /HDD4/TUS2/04_samtools/${f}.sorted.bam \
        /HDD4/TUS2/03_hisat/${f}.bam -@ 10
done

# 3. Keep only reads flagged as unmapped (-f 4)
for f in $(ls -1 *_1.fq.gz | sed 's/_1.fq.gz//')
do
    samtools view -b -f 4 /HDD4/TUS2/04_samtools/${f}.sorted.bam -@ 10 \
        > /HDD4/TUS2/04_samtools/${f}.sorted.unmapped.bam
done

# 4. Convert the unmapped reads back to paired FASTQ files
for f in $(ls -1 *_1.fq.gz | sed 's/_1.fq.gz//')
do
    samtools fq -1 /HDD4/TUS2/04_samtools/${f}.sorted.unmapped_1.fq.gz \
        -2 /HDD4/TUS2/04_samtools/${f}.sorted.unmapped_2.fq.gz \
        /HDD4/TUS2/04_samtools/${f}.sorted.unmapped.bam -@ 10
done

# 5. Classify the unmapped reads with Kraken2
for f in $(ls -1 *_1.fq.gz | sed 's/_1.fq.gz//')
do
    kraken2 --use-names --threads 10 --db /HDD1/minikraken/minikraken --fq-input \
        --report /HDD4/TUS2/05_kraken/minikraken/${f}.kraken.report.csv \
        --gzip-compressed --paired \
        /HDD4/TUS2/04_samtools/${f}.sorted.unmapped_1.fq.gz \
        /HDD4/TUS2/04_samtools/${f}.sorted.unmapped_2.fq.gz \
        --use-mpa-style --report-zero-counts \
        > /HDD4/TUS2/05_kraken/minikraken/${f}.sorted.kraken
done

However, when I re-executed the same code with the same files in March 2024, I noticed significant differences in the read counts in the output files.
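
One way to confirm that the input files and database really are byte-for-byte identical between runs is to record checksums when a run is made and verify them before a rerun. The following is only a minimal sketch: the function names and the manifest file are hypothetical, and it assumes GNU `md5sum` is available.

```shell
# Hypothetical helpers, not part of Kraken2 or samtools.

# record_checksums MANIFEST FILE...
# Writes an md5 manifest covering the given files.
record_checksums() {
    manifest=$1
    shift
    md5sum "$@" > "$manifest"
}

# verify_checksums MANIFEST
# Returns non-zero if any file listed in the manifest has changed or is missing.
verify_checksums() {
    md5sum --check --quiet "$1"
}
```

For example, `record_checksums run2022.md5 /HDD1/minikraken/minikraken/* /HDD4/TUS2/*.fq.gz` at the time of the first run, followed by `verify_checksums run2022.md5` before the second, would show whether any input changed (the manifest name `run2022.md5` is illustrative). Checksums do not capture tool versions, so the versions of hisat2, samtools, and kraken2 would need to be noted separately.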

Could there be any issue within this code that might have caused such discrepancies? Example output files are attached for reference: counts.csv
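
To quantify the discrepancy, the reports from the two runs can be compared taxon by taxon. This is a minimal sketch, assuming both reports are in the two-column, tab-separated layout written with `--use-mpa-style` (lineage, then read count); `compare_reports` is a hypothetical helper, not a Kraken2 tool.

```shell
# Hypothetical helper: list taxa whose read counts differ between two
# mpa-style reports. Output columns: lineage, count in OLD, count in NEW.
# A taxon missing from one report is shown with a count of 0.
compare_reports() {
    awk -F'\t' '
        NR == FNR { old[$1] = $2; next }       # first file: remember counts
        {
            seen[$1] = 1
            prev = ($1 in old) ? old[$1] : 0
            if (prev + 0 != $2 + 0) print $1 "\t" prev "\t" $2
        }
        END {
            # taxa that appear only in the first report
            for (t in old) if (!(t in seen)) print t "\t" old[t] "\t0"
        }
    ' "$1" "$2" | sort
}
```

Usage would look like `compare_reports old/sample.kraken.report.csv new/sample.kraken.report.csv`, where both paths are placeholders for the 2022 and 2024 reports of the same sample.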