Open lhqing opened 2 years ago
Hello Hanqing @lhqing,
Thank you very much for using HISAT-3N as your sequence aligner. Also, this is a great issue description and it is easy to reproduce.
For 2: I mapped the reads you provided to GRCm39 and got the same result as you. The reads map to the reference in normal mode, but the overall alignment rate drops to 1.07% with --directional-mapping. You are right: HISAT-3N assumes original top strand (OT) and original bottom strand (OB) alignment. To solve this problem, we are developing a new version of HISAT-3N that supports reverse directional mapping. Please check the hisat-3n-dev-directional-mapping-reverse branch. It has the --directional-mapping option, which works as before, and --directional-mapping-reverse, which supports PBAT libraries.
Here is my testing result:
$ ../../hisat-3n-dev-directional-mapping-reverse/hisat-3n \
-x ../../data/index/hisat-3n/GRCm39 \
-q -1 sample_cell.trimmed.R1.fq.gz -2 sample_cell.trimmed.R2.fq.gz \
--unique-only \
--base-change C,T \
--no-spliced-alignment \
--no-temp-splicesite \
-t --new-summary --summary-file hisat_3n_summary.directional.txt \
--threads 1 \
--directional-mapping-reverse \
-S leo_directional_reverse.sam
Multiseed full-index search: 00:00:24
HISAT2 summary stats:
Total pairs: 50016
Aligned concordantly or discordantly 0 time: 17831 (35.65%)
Aligned concordantly 1 time: 29608 (59.20%)
Aligned concordantly >1 times: 1665 (3.33%)
Aligned discordantly 1 time: 912 (1.82%)
Total unpaired reads: 35662
Aligned 0 time: 24465 (68.60%)
Aligned 1 time: 9045 (25.36%)
Aligned >1 times: 2152 (6.03%)
Overall alignment rate: 75.54%
Time searching: 00:00:25
Overall time: 00:00:30
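As a sanity check on the summary above, the overall alignment rate can be recomputed from the raw counts. The sketch below is only an illustration: the helper names `parse_counts` and `overall_alignment_rate` are hypothetical, and the formula (aligned mates divided by total mates) is an assumption that happens to reproduce the reported 75.54%, not something taken from the HISAT-3N source.

```python
# Summary text copied from the run above (only the count lines matter).
SUMMARY = """\
HISAT2 summary stats:
Total pairs: 50016
Aligned concordantly or discordantly 0 time: 17831 (35.65%)
Aligned concordantly 1 time: 29608 (59.20%)
Aligned concordantly >1 times: 1665 (3.33%)
Aligned discordantly 1 time: 912 (1.82%)
Total unpaired reads: 35662
Aligned 0 time: 24465 (68.60%)
Aligned 1 time: 9045 (25.36%)
Aligned >1 times: 2152 (6.03%)
Overall alignment rate: 75.54%
"""


def parse_counts(summary_text):
    """Collect every 'label: <integer> ...' line into a dict (hypothetical helper)."""
    counts = {}
    for line in summary_text.splitlines():
        key, sep, rest = line.strip().partition(":")
        tokens = rest.split()
        # Skip headers, timings, and percentage-only lines.
        if sep and tokens and tokens[0].isdigit():
            counts[key] = int(tokens[0])
    return counts


def overall_alignment_rate(counts):
    """Assumed formula: percentage of mates that aligned at least once."""
    total_mates = 2 * counts["Total pairs"]
    # Mates that stayed unaligned even when tried as unpaired reads.
    unaligned_mates = counts["Aligned 0 time"]
    return 100.0 * (1 - unaligned_mates / total_mates)


counts = parse_counts(SUMMARY)
print(f"{overall_alignment_rate(counts):.2f}%")
```

Note that "Total unpaired reads" (35662) is twice the number of pairs that did not align as a pair (17831), which is why the unaligned count is divided by the total number of mates rather than the number of pairs.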
This version of HISAT-3N maps your test reads without any problem. We will merge it into the hisat-3n branch soon.
For 3: Yes, you are right. HISAT-3N/HISAT2 outputs only three MAPQ values: 60 for uniquely mapped reads, 1 for multi-mapped reads, and 0 for unmapped reads.
When hisat-3n-table encounters a multi-mapped read, the output table counts every position the read maps to.
Please let me know if you have any other questions or if you find any bugs in the development version of HISAT-3N. Thanks again for using HISAT-3N.
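If counting a multi-mapped read at every position is not desired, one option is to keep only MAPQ 60 records before building the table. The following is a minimal sketch that works on SAM text lines; the function names and the toy records are made up for illustration, and the only facts it relies on are the three MAPQ values described above and that MAPQ is the fifth tab-separated SAM column.

```python
def mapq_class(mapq):
    """Map the three MAPQ values described in this thread to a label."""
    if mapq == 60:
        return "unique"
    if mapq == 1:
        return "multi"
    if mapq == 0:
        return "unmapped"
    return "other"  # not expected from HISAT-3N according to this thread


def unique_records(sam_lines):
    """Yield header lines and uniquely mapped (MAPQ 60) alignment lines."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the SAM header untouched
            continue
        fields = line.rstrip("\n").split("\t")
        if mapq_class(int(fields[4])) == "unique":  # column 5 is MAPQ
            yield line


# Toy records (hypothetical read names and coordinates, not real data).
toy_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t0\tchr1\t200\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t256\tchr2\t300\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
kept = list(unique_records(toy_sam))
print(len(kept))
```

Here `read2` appears twice with MAPQ 1, mirroring how multi-mapped reads occur multiple times in the output, and both copies are dropped by the filter.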
Leo
Hi Leo, @imzhangyun
Thank you for the quick reply! I tried the hisat-3n-dev-directional-mapping-reverse branch, and it worked great for me, making hisat-3n even faster. I will let you know if I find any issues when running this version on larger datasets later.
Hanqing
Hi hisat-3n developers,
Thank you for building this wonderful aligner!
I am working on using hisat-3n to align all the single-cell DNA methylome and multiome data generated by the snmC-seq2 (DNA mC), snmCAT-seq (DNA mC + RNA + NOMe), and snm3C-seq (DNA mC + 3C) technologies in Ecker lab at the Salk Institute.
I benchmarked the hisat-3n-based pipeline (snakefile) against our previous bismark-based pipeline (snakefile). I noticed two great benefits that made us willing to switch to hisat-3n to align our data.
However, I do have questions related to the --directional-mapping and --repeat modes of hisat-3n.
1. Version and Example Files
Here are the commands for reproducibility:
The hisat-3n version I am using:
The hisat-3n-build command I used
If needed, you can download the FASTQ files and command outputs I mention below from this Google Drive link: https://drive.google.com/drive/folders/1RAJLsl_LQKfisJ5c1VBNWyaOl6t_NBX3?usp=sharing The FASTQ files are from a mouse snmC-seq2 cell, which I mapped to the mm10 genome.
2. About --directional-mapping
When I try hisat-3n mapping in normal mode, the mapping rate is good
However, when adding the --directional-mapping parameter, the mapping rate is close to 0.
My best guess is that --directional-mapping only considers the original top strand (OT) and original bottom strand (OB) alignment, equivalent to Bismark directional mapping mode. However, our data is similar to a PBAT library (R1), where R1 is supposed to map to the strands complementary to OT (CTOT) and OB (CTOB), and R2 is supposed to map to OT and OB. You can confirm this by running these two bismark commands, and the bismark output will tell you which strand R1 and R2 map to:
Bismark provides a --pbat parameter that allows mapping PBAT libraries in directional mode, which differs from the default directional mode. In HISAT-3N, is there a way to make directional mapping flexible for different kinds of libraries?
3. About the multi-aligned reads
I also tried the --repeat mode of hisat-3n. I noticed the reads in the resulting BAM file only have three kinds of MAPQ scores (0, 1, 60). Am I understanding these three scores correctly?
0: reads not aligned
1: reads aligned >1 time
60: reads uniquely aligned 1 time, including properly paired and discordantly paired
I noticed that reads with MAPQ==1 occur multiple times in the BAM file. When using hisat-3n-table to generate counts for these reads, are you counting them multiple times at different genome locations?
Thank you for reading this rather long issue. I appreciate your time, and thanks again for this great tool!
Hanqing