DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

Changing the number of threads gives subtly different results #264

Open akaviaLab opened 3 years ago

akaviaLab commented 3 years ago

I have an example file which I've been using to test my scripts for running HISAT2:

- Input file 1: https://github.com/bioinform/rnacocktail/raw/master/test/A1_1.fq.gz
- Input file 2: https://github.com/bioinform/rnacocktail/raw/master/test/A1_2.fq.gz
- Known splice sites: Homo_sapiens.GRCh38.90.chromosome.21.gtf.known-splicesite.txt (attached)

The index was generated with the following commands:

    wget ftp://ftp.ensembl.org/pub/release-90//fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz
    gunzip Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz
    hisat2-build Homo_sapiens.GRCh38.dna.chromosome.21.fa Homo_sapiens.GRCh38.dna.chromosome.21.HISAT2

I can attach it if you'd like.

Running the same command with 16 threads and with 1 thread gives a slightly different summary, slightly different splice sites, and some differences in the SAM files. Output from PicardTools CompareSAMs is attached as an example (ignoring headers):

    Match 232060
    Differ 923
    Unmapped_both 8555
    Unmapped_left 13
    Unmapped_right 1
    Missing_left 0
    Missing_right 0
    SAM files differ.
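To put these counts in proportion, here is a minimal shell sketch that totals the CompareSAMs categories and reports the share of records that disagree. The function name `compare_sams_fraction` is hypothetical, and treating every category other than `Match` and `Unmapped_both` as a discrepancy is an assumption made for illustration, not Picard's own definition.

```shell
#!/bin/sh
# Hypothetical helper: reads "Category count" pairs on stdin and prints
# the total number of records and the share that disagree.
# Assumption: everything except Match and Unmapped_both is a discrepancy.
compare_sams_fraction() {
    awk '
        { total += $2 }
        $1 != "Match" && $1 != "Unmapped_both" { bad += $2 }
        END {
            if (total == 0) { print "no records"; exit 1 }
            printf "total=%d discrepant=%d (%.2f%%)\n", total, bad, 100 * bad / total
        }'
}

# Counts copied from the CompareSAMs output in this report.
printf '%s\n' 'Match 232060' 'Differ 923' 'Unmapped_both 8555' \
    'Unmapped_left 13' 'Unmapped_right 1' 'Missing_left 0' 'Missing_right 0' |
    compare_sams_fraction
# prints: total=241552 discrepant=937 (0.39%)
```

The total of 241552 records equals twice the 120776 read pairs reported in the HISAT2 summaries, so the seven categories account for every read.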

The commands used to run HISAT2 and their summary output are below.

HISAT2 version is:

    /usr/bin/hisat2-align-s version 2.1.0
    64-bit
    Built on Debian 11 February 2020
    Compiler: gcc version 9.2.1 20200203 (Ubuntu 9.2.1-28ubuntu1)
    Options: -O3 -funroll-loops -g3 -Wdate-time -D_FORTIFY_SOURCE=2
    Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

With 16 threads:

    ubuntu@ip-172-31-4-165:~/rnacocktail-py3$ hisat2 --dta --rg-id A1 --rg SM:A1 --threads 16 \
        --known-splicesite-infile ${HISAT_INPUT}/Homo_sapiens.GRCh38.90.chromosome.21.gtf.known-splicesite.txt \
        -x ${HISAT_INPUT}/Homo_sapiens.GRCh38.dna.chromosome.21.HISAT2 \
        -1 ${HISAT_INPUT}/A1_1.fq.gz -2 ${HISAT_INPUT}/A1_2.fq.gz \
        -S ./16threads_A1.sam \
        --novel-splicesite-outfile 16threads_A1_splicesites.tab \
        --new-summary --summary-file 16threads_A1_hisat2_summary.txt
    HISAT2 summary stats:
        Total pairs: 120776
            Aligned concordantly or discordantly 0 time: 8538 (7.07%)
            Aligned concordantly 1 time: 80285 (66.47%)
            Aligned concordantly >1 times: 30957 (25.63%)
            Aligned discordantly 1 time: 996 (0.82%)
        Total unpaired reads: 17076
            Aligned 0 time: 8568 (50.18%)
            Aligned 1 time: 6631 (38.83%)
            Aligned >1 times: 1877 (10.99%)
        Overall alignment rate: 96.45%

With 1 thread:

    ubuntu@ip-172-31-4-165:~/rnacocktail-py3$ hisat2 --dta --rg-id A1 --rg SM:A1 --threads 1 \
        --known-splicesite-infile ${HISAT_INPUT}/Homo_sapiens.GRCh38.90.chromosome.21.gtf.known-splicesite.txt \
        -x ${HISAT_INPUT}/Homo_sapiens.GRCh38.dna.chromosome.21.HISAT2 \
        -1 ${HISAT_INPUT}/A1_1.fq.gz -2 ${HISAT_INPUT}/A1_2.fq.gz \
        -S ./1threads_A1.sam \
        --novel-splicesite-outfile 1threads_A1_splicesites.tab \
        --new-summary --summary-file 1threads_A1_hisat2_summary.txt
    HISAT2 summary stats:
        Total pairs: 120776
            Aligned concordantly or discordantly 0 time: 8530 (7.06%)
            Aligned concordantly 1 time: 80821 (66.92%)
            Aligned concordantly >1 times: 30430 (25.20%)
            Aligned discordantly 1 time: 995 (0.82%)
        Total unpaired reads: 17060
            Aligned 0 time: 8556 (50.15%)
            Aligned 1 time: 6682 (39.17%)
            Aligned >1 times: 1822 (10.68%)
        Overall alignment rate: 96.46%
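The two summaries are easier to compare line by line. The following is a rough shell sketch, not part of HISAT2, that prints each summary line whose count differs between two summary files, together with the change. The function name `diff_summaries` is hypothetical, and the sketch assumes the files written by `--summary-file` contain the same `label: count (percent)` lines as the summaries printed in this report.

```shell
#!/bin/sh
# Hypothetical helper: diff_summaries FIRST SECOND
# Prints "label: first -> second (delta)" for every count that differs.
# Assumption: each data line looks like "<label>: <count> (<percent>)".
diff_summaries() {
    awk -F ': ' '
        NF < 2 { next }                        # skip the "HISAT2 summary stats:" header
        {
            label = $1
            sub(/^[ \t]+/, "", label)          # drop the indentation
            split($2, v, " ")                  # v[1] is the value after the colon
            if (v[1] !~ /^[0-9]+$/) next       # skip percentage-only lines
        }
        NR == FNR { first[label] = v[1]; next }    # remember counts from FIRST
        (label in first) && first[label] != v[1] {
            printf "%s: %d -> %d (%+d)\n", label, first[label], v[1], v[1] - first[label]
        }' "$1" "$2"
}

# Example call with the summary files named in the commands of this report:
#   diff_summaries 16threads_A1_hisat2_summary.txt 1threads_A1_hisat2_summary.txt
```

Applied to the numbers in this report, the largest shifts are between the "Aligned concordantly 1 time" (+536) and "Aligned concordantly >1 times" (-527) categories when going from 16 threads to 1 thread, while the total number of pairs is unchanged.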