I've been running metagenomic analysis of shotgun sequencing data with Kraken2. Towards the end of 2022, I ran my analysis workflow with a minikraken database that I had downloaded in 2020. Here is the workflow I used:
for f in $(ls -1 *_1.fq.gz | sed 's/_1.fq.gz//')
do
hisat2 -p 10 --rna-strandness RF -x /HDD1/Classifiers/GRCh38 -1 /HDD4/TUS2/${f}_1.fq.gz -2 /HDD4/TUS2/${f}_2.fq.gz 2> /HDD4/TUS2/03_hisat/${f}.log | samtools view -@ 10 -Sbo /HDD4/TUS2/03_hisat/${f}.bam
done
for f in $(ls -1 *_1.fq.gz | sed 's/_1.fq.gz//')
do
samtools sort -O bam -o /HDD4/TUS2/04_samtools/${f}.sorted.bam /HDD4/TUS2/03_hisat/${f}.bam -@ 10
done
for f in $(ls -1 *_1.fq.gz | sed 's/_1.fq.gz//')
do
samtools view -@ 10 -b -f 4 /HDD4/TUS2/04_samtools/${f}.sorted.bam > /HDD4/TUS2/04_samtools/${f}.sorted.unmapped.bam
done
for f in $(ls -1 *_1.fq.gz | sed 's/_1.fq.gz//')
do
samtools fq -1 /HDD4/TUS2/04_samtools/${f}.sorted.unmapped_1.fq.gz -2 /HDD4/TUS2/04_samtools/${f}.sorted.unmapped_2.fq.gz /HDD4/TUS2/04_samtools/${f}.sorted.unmapped.bam -@ 10
done
for f in $(ls -1 *_1.fq.gz | sed 's/_1.fq.gz//')
do
kraken2 --use-names --threads 10 --db /HDD1/minikraken/minikraken --fq-input --report /HDD4/TUS2/05_kraken/minikraken/${f}.kraken.report.csv --gzip-compressed --paired /HDD4/TUS2/04_samtools/${f}.sorted.unmapped_1.fq.gz /HDD4/TUS2/04_samtools/${f}.sorted.unmapped_2.fq.gz > /HDD4/TUS2/05_kraken/minikraken/${f}.sorted.kraken --use-mpa-style --report-zero-counts
done
However, when I re-executed the same code with the same files in March 2024, I noticed significant differences in the read counts in the output files.
Could there be any issue within this code that might have caused such discrepancies? Attached are example output files for reference.
counts.csv
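In case it helps to narrow down where the two runs diverge, here is a minimal sketch of how the old and new reports could be compared taxon by taxon. It assumes both reports are in the two-column, tab-separated layout produced by `--use-mpa-style` (taxonomy path, then read count). The function name `compare_mpa_reports` and the paths in the usage line are hypothetical and not part of my actual workflow.

```shell
# Hypothetical helper: list taxa whose read counts differ between two
# MPA-style reports (assumed layout: taxonomy path <TAB> read count).
# Output columns: taxon, count in old report, count in new report.
compare_mpa_reports() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { old[$1] = $2; next }       # first file: remember old counts
        {
            seen[$1] = 1
            prev = ($1 in old) ? old[$1] : 0   # taxon missing from old run counts as 0
            if (prev + 0 != $2 + 0) print $1, prev, $2
        }
        END {
            # taxa that appear only in the old report
            for (t in old) if (!(t in seen)) print t, old[t], 0
        }
    ' "$1" "$2" | LC_ALL=C sort
}
```

Example call (paths are placeholders): `compare_mpa_reports run_2022/sample.kraken.report.csv run_2024/sample.kraken.report.csv`. Taxa with identical counts in both runs are omitted, so empty output would mean the two reports agree.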