DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

Unexpected output in hisat-3N runs with --un-conc #298

Closed jhfoxliu closed 3 years ago

jhfoxliu commented 3 years ago

Hi,

I am running HISAT-3N and have run into a problem. Everything worked until I added the --un-conc option to extract unmapped reads. With that option set, HISAT-3N behaves strangely: only read 1 appears in the SAM output, and read 2 is written to the unmapped.1 file on a single line, leaving unmapped.2 empty.

My cmd is: hisat-3n -x /path/to/index/Mus_musculus.GRCm38.dna_sm.primary_assembly.format -S hisat2_3N.unique.v2.sam --no-mixed --rna-strandness FR -p 16 --fr -1 fwd.fastq -2 rev.fastq --base-change C,T --unique-only --un-conc unmapped

Best, Jianheng

imzhangyun commented 3 years ago

Hello Jianheng,

Thank you for using HISAT-3N. The --unique-only option in HISAT-3N is designed to discard every alignment result other than uniquely aligned reads. This can interfere with --un-conc, --un, and --al. To use those options to save extra information, please do not use --unique-only; you can filter the alignment results with samtools instead. Here is an example:

  1. Align your reads without the --unique-only option:
     hisat-3n -x /path/to/index/Mus_musculus.GRCm38.dna_sm.primary_assembly.format -S hisat2_3N.v2.sam --no-mixed --rna-strandness FR -p 16 --fr -1 fwd.fastq -2 rev.fastq --base-change C,T --un-conc unmapped

  2. Filter the alignment results with samtools, skipping alignments with MAPQ smaller than 10. HISAT-3N uses MAPQ = 1 for multiply aligned reads and 60 for uniquely aligned reads:
     samtools view -h -q 10 hisat2_3N.v2.sam > hisat2_3N.v2.unique.sam
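
For readers who want to see what the MAPQ filter in step 2 does, here is a minimal Python sketch of the same idea: keep header lines, and keep only records whose MAPQ (the fifth SAM column) is at least 10. The function name `filter_sam_by_mapq` and the demo records are hypothetical and used only for illustration; this is not part of HISAT-3N or samtools, and samtools itself remains the more robust choice.

```python
def filter_sam_by_mapq(lines, min_mapq=10):
    """Yield SAM header lines and records with MAPQ >= min_mapq.

    Hypothetical helper: mirrors the intent of `samtools view -h -q 10`
    for plain-text SAM input only.
    """
    for line in lines:
        if line.startswith("@"):
            # Header lines are kept unchanged (the -h behaviour).
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # SAM records have 11 mandatory columns; skip malformed lines.
            continue
        if int(fields[4]) >= min_mapq:  # column 5 is MAPQ
            yield line


# Made-up records: one unique (MAPQ 60) and one multi-mapped (MAPQ 1).
demo = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "readA\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "readB\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\n",
]
kept = list(filter_sam_by_mapq(demo))
kept_names = [l.split("\t")[0] for l in kept if not l.startswith("@")]
print(kept_names)
```

With the MAPQ values Leo describes (1 for multiply aligned, 60 for uniquely aligned), any threshold between 2 and 60 would separate the two groups; 10 is simply the value used in the thread.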

I hope this is helpful. Please let me know if you have any other questions.

Leo

jhfoxliu commented 3 years ago

Thanks Leo. I solved it with a script scanning for unmapped reads.
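
The script itself was not posted, so the following is only a guess at what such a scan might look like. It is a minimal Python sketch that assumes the SAM output contains records for unmapped reads (the thread does not confirm this when --unique-only is set) and picks them out by the unmapped bit (0x4) of the FLAG column. The function name `unmapped_reads` and the demo records are hypothetical. Note that --un-conc reports pairs that fail to align concordantly, which is a broader set than reads carrying the 0x4 flag.

```python
UNMAPPED = 0x4         # SAM FLAG bit: segment unmapped
FIRST_IN_PAIR = 0x40   # SAM FLAG bit: first segment in the template
SECOND_IN_PAIR = 0x80  # SAM FLAG bit: last segment in the template


def unmapped_reads(sam_lines):
    """Yield (name, mate, seq, qual) for each unmapped record.

    Hypothetical helper; mate is 1, 2, or 0 for unpaired reads.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip malformed records
        flag = int(fields[1])
        if not flag & UNMAPPED:
            continue
        if flag & FIRST_IN_PAIR:
            mate = 1
        elif flag & SECOND_IN_PAIR:
            mate = 2
        else:
            mate = 0
        yield fields[0], mate, fields[9], fields[10]


# Made-up records: an unmapped pair (flags 77/141) and a mapped read (flag 99).
demo_sam = [
    "@HD\tVN:1.0\n",
    "pair1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "pair1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n",
    "pair2\t99\tchr1\t100\t60\t4M\t=\t150\t54\tGGCC\tIIII\n",
]
found = list(unmapped_reads(demo_sam))
for name, mate, seq, qual in found:
    # Print each hit as a FASTQ record with a /1 or /2 mate suffix.
    print(f"@{name}/{mate}\n{seq}\n+\n{qual}")
```

Splitting the hits into two FASTQ files by the `mate` value would give output comparable in layout to the unmapped.1 and unmapped.2 files that --un-conc is meant to produce.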