The index was generated with the following commands:
wget ftp://ftp.ensembl.org/pub/release-90//fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz
gunzip Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz
hisat2-build Homo_sapiens.GRCh38.dna.chromosome.21.fa Homo_sapiens.GRCh38.dna.chromosome.21.HISAT2
I can attach it if you'd like.
Running the commands with --threads 16 and --threads 1 gives slightly different summaries, slightly different splice sites, and some differences in the SAM files. Output from PicardTools CompareSAMs (ignoring headers) as an example:
Match 232060
Differ 923
Unmapped_both 8555
Unmapped_left 13
Unmapped_right 1
Missing_left 0
Missing_right 0
SAM files differ.
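For anyone without Picard at hand, a rough cross-check of the two SAM files can be sketched in plain shell. This is only a sketch under assumptions: `sam_only_counts` is a hypothetical helper name (not part of HISAT2 or Picard), it compares whole record lines rather than individual fields, and it sorts the records first because the output order may differ between thread counts. Header lines (starting with `@`) are skipped, since the `@PG` line records the command line and so differs between runs anyway.

```shell
# Count SAM records that occur in only one of two files.
# Hypothetical helper: ignores header lines (starting with '@') and record order.
# The body runs in a subshell so its variables do not leak into the caller.
sam_only_counts() (
    left_sorted=$(mktemp) || exit 1
    right_sorted=$(mktemp) || exit 1

    # Drop headers, then sort byte-wise so that comm sees consistently ordered input.
    grep -v '^@' "$1" | LC_ALL=C sort > "$left_sorted"
    grep -v '^@' "$2" | LC_ALL=C sort > "$right_sorted"

    # comm -23: lines only in the first file; comm -13: lines only in the second.
    only_left=$(LC_ALL=C comm -23 "$left_sorted" "$right_sorted" | wc -l | tr -d ' ')
    only_right=$(LC_ALL=C comm -13 "$left_sorted" "$right_sorted" | wc -l | tr -d ' ')

    rm -f "$left_sorted" "$right_sorted"
    echo "only_left=$only_left only_right=$only_right"
)
```

With the files from this report it would be called as `sam_only_counts 16threads_A1.sam 1threads_A1.sam`. A result of `only_left=0 only_right=0` would mean the two runs differ only in header lines or record order; the counts are per record line, so they are not directly comparable to the CompareSAMs numbers.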
Commands to run HISAT and summary output below.
Hisat version is
/usr/bin/hisat2-align-s version 2.1.0
64-bit
Built on Debian
11 February 2020
Compiler: gcc version 9.2.1 20200203 (Ubuntu 9.2.1-28ubuntu1)
Options: -O3 -funroll-loops -g3 -Wdate-time -D_FORTIFY_SOURCE=2
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
I have example files which I've been using to test my scripts for running HISAT2:
Input file 1 - https://github.com/bioinform/rnacocktail/raw/master/test/A1_1.fq.gz
Input file 2 - https://github.com/bioinform/rnacocktail/raw/master/test/A1_2.fq.gz
Known splice sites - Homo_sapiens.GRCh38.90.chromosome.21.gtf.known-splicesite.txt, which is attached
ubuntu@ip-172-31-4-165:~/rnacocktail-py3$ hisat2 --dta --rg-id A1 --rg SM:A1 --threads 16 --known-splicesite-infile ${HISAT_INPUT}/Homo_sapiens.GRCh38.90.chromosome.21.gtf.known-splicesite.txt -x ${HISAT_INPUT}/Homo_sapiens.GRCh38.dna.chromosome.21.HISAT2 -1 ${HISAT_INPUT}/A1_1.fq.gz -2 ${HISAT_INPUT}/A1_2.fq.gz -S ./16threads_A1.sam --novel-splicesite-outfile 16threads_A1_splicesites.tab --new-summary --summary-file 16threads_A1_hisat2_summary.txt
HISAT2 summary stats:
    Total pairs: 120776
        Aligned concordantly or discordantly 0 time: 8538 (7.07%)
        Aligned concordantly 1 time: 80285 (66.47%)
        Aligned concordantly >1 times: 30957 (25.63%)
        Aligned discordantly 1 time: 996 (0.82%)
    Total unpaired reads: 17076
        Aligned 0 time: 8568 (50.18%)
        Aligned 1 time: 6631 (38.83%)
        Aligned >1 times: 1877 (10.99%)
    Overall alignment rate: 96.45%
ubuntu@ip-172-31-4-165:~/rnacocktail-py3$ hisat2 --dta --rg-id A1 --rg SM:A1 --threads 1 --known-splicesite-infile ${HISAT_INPUT}/Homo_sapiens.GRCh38.90.chromosome.21.gtf.known-splicesite.txt -x ${HISAT_INPUT}/Homo_sapiens.GRCh38.dna.chromosome.21.HISAT2 -1 ${HISAT_INPUT}/A1_1.fq.gz -2 ${HISAT_INPUT}/A1_2.fq.gz -S ./1threads_A1.sam --novel-splicesite-outfile 1threads_A1_splicesites.tab --new-summary --summary-file 1threads_A1_hisat2_summary.txt
HISAT2 summary stats:
    Total pairs: 120776
        Aligned concordantly or discordantly 0 time: 8530 (7.06%)
        Aligned concordantly 1 time: 80821 (66.92%)
        Aligned concordantly >1 times: 30430 (25.20%)
        Aligned discordantly 1 time: 995 (0.82%)
    Total unpaired reads: 17060
        Aligned 0 time: 8556 (50.15%)
        Aligned 1 time: 6682 (39.17%)
        Aligned >1 times: 1822 (10.68%)
    Overall alignment rate: 96.46%
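To make the differences between the two summaries easier to spot, the files written by --summary-file can be compared metric by metric. The following is a minimal sketch, assuming the summary files contain "name: value" lines like the output above; `summary_diff` is a hypothetical helper name and not something HISAT2 provides.

```shell
# Print every metric whose value differs between two HISAT2 summary files.
# Hypothetical helper: assumes "name: value" lines as written by --new-summary.
summary_diff() {
    awk '
        {
            sub(/^[ \t]+/, "")             # drop leading indentation
            sep = index($0, ": ")
            if (sep == 0) next             # skip lines without "name: value"
            name = substr($0, 1, sep - 1)
            value = substr($0, sep + 2)
        }
        # First file: remember each metric.
        NR == FNR { first[name] = value; next }
        # Second file: report metrics whose value changed.
        (name in first) && first[name] != value {
            printf "%s: %s -> %s\n", name, first[name], value
        }
    ' "$1" "$2"
}
```

For the runs above this would be called as `summary_diff 16threads_A1_hisat2_summary.txt 1threads_A1_hisat2_summary.txt`; metrics that are identical in both files, such as the total number of pairs, are not printed.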