DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

Mapping using different parameters --very-sensitive and default #430

Open AllanOkwaro opened 6 months ago

AllanOkwaro commented 6 months ago

Dear developers,

I have mapped some reads using hisat2 and I get different results when I change from the default preset to --very-sensitive. The difference in mapping rate is huge: samples with a mapping rate of 50% suddenly have a mapping rate of over 80%. I am not sure whether this is what the software is supposed to do, or whether the --very-sensitive parameter increases the number of mismatches allowed. My scripts are below. For the default settings, I use

`#!/bin/bash

# Load the HiSAT2 module if needed
module load bio/hisat2/2.2.1

# Create the output directory if it doesn't exist
mkdir -p hisatout

# Iterate over files ending with "_1.fq.gz" in the current directory
for forward_read_file in *_1.fq.gz; do

# Extract file basename without extension
file_basename="${forward_read_file%%_1.fq.gz}"

# Define output files
output_sam="hisatout/${file_basename}.sam"
report_file="hisatout/${file_basename}.report"
reverse_read_file="${file_basename}_2.fq.gz"

# Run HiSAT2 with specified options
hisat2 -p 64 \
       -x mbel.index \
       -1 "$forward_read_file" \
       -2 "$reverse_read_file" \
       -S "$output_sam" \
       --summary-file "$report_file" &

done

# Wait for all background processes to finish
wait`

For the --very-sensitive parameter, I used

`#!/bin/bash

# Load the HiSAT2 module if needed
module load bio/hisat2/2.2.1

# Create the output directory if it doesn't exist
mkdir -p hisatout

# Iterate over files ending with "_1.fq.gz" in the current directory
for forward_read_file in *_1.fq.gz; do

# Extract file basename without extension
file_basename="${forward_read_file%%_1.fq.gz}"

# Define output files
output_sam="hisatout/${file_basename}.sam"
report_file="hisatout/${file_basename}.report"
reverse_read_file="${file_basename}_2.fq.gz"

# Run HiSAT2 with specified options
hisat2 -p 64 \
       -x mbel.index \
       -1 "$forward_read_file" \
       -2 "$reverse_read_file" \
       -S "$output_sam" \
       --very-sensitive \
       --summary-file "$report_file" &

done

# Wait for all background processes to finish
wait`

The only difference is that the first script uses the default hisat2 settings, while the second adds the --very-sensitive parameter.
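Since the two scripts differ by a single flag and both write to the same hisatout directory, running one after the other would overwrite the earlier SAM and report files. The following is a minimal sketch of one script that covers both runs, not the original workflow: the preset argument, the DRY_RUN switch, the function names, and the per-preset directory names (hisatout_default, hisatout_very-sensitive) are all assumptions introduced for illustration. It also runs samples one at a time rather than in the background, on the assumption that -p 64 already uses most of the available cores.

```shell
#!/bin/bash
# Sketch: one script for both runs.
# Usage: ./run_hisat2.sh [default|very-sensitive]
# The preset argument, DRY_RUN, and the hisatout_<preset> directory
# names are illustrative assumptions, not part of the original scripts.
set -euo pipefail

# Build the hisat2 command for one sample; print it if DRY_RUN=1,
# otherwise execute it.
run_sample() {
    local preset="$1" fwd="$2"
    local base="${fwd%_1.fq.gz}"
    local outdir="hisatout_${preset}"
    local cmd=(hisat2 -p 64 -x mbel.index
               -1 "$fwd" -2 "${base}_2.fq.gz"
               -S "${outdir}/${base}.sam"
               --summary-file "${outdir}/${base}.report")
    # Only the very-sensitive run gets the extra flag.
    if [ "$preset" = "very-sensitive" ]; then
        cmd+=(--very-sensitive)
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Process every forward-read file in the current directory, sequentially.
run_all() {
    local preset="$1" fwd
    mkdir -p "hisatout_${preset}"
    for fwd in *_1.fq.gz; do
        [ -e "$fwd" ] || continue   # skip the literal pattern when nothing matches
        run_sample "$preset" "$fwd"
    done
}

run_all "${1:-default}"
```

Running it with DRY_RUN=1 prints the commands without executing them, which makes it easy to confirm that the two runs differ only in --very-sensitive and in the output directory.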

The output files are quite different

very sensitive

`29691907 reads; of these:
  29691907 (100.00%) were paired; of these:
    13836874 (46.60%) aligned concordantly 0 times
    15535622 (52.32%) aligned concordantly exactly 1 time
    319411 (1.08%) aligned concordantly >1 times

13836874 pairs aligned concordantly 0 times; of these:
  19384 (0.14%) aligned discordantly 1 time
----
13817490 pairs aligned 0 times concordantly or discordantly; of these:
  27634980 mates make up the pairs; of these:
    26189715 (94.77%) aligned 0 times
    1358482 (4.92%) aligned exactly 1 time
    86783 (0.31%) aligned >1 times

55.90% overall alignment rate`

default

`29691907 reads; of these:
  29691907 (100.00%) were paired; of these:
    15713848 (52.92%) aligned concordantly 0 times
    13723023 (46.22%) aligned concordantly exactly 1 time
    255036 (0.86%) aligned concordantly >1 times

15713848 pairs aligned concordantly 0 times; of these:
  136412 (0.87%) aligned discordantly 1 time
----
15577436 pairs aligned 0 times concordantly or discordantly; of these:
  31154872 mates make up the pairs; of these:
    29552270 (94.86%) aligned 0 times
    1504759 (4.83%) aligned exactly 1 time
    97843 (0.31%) aligned >1 times

50.24% overall alignment rate`
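As a sanity check on the two summaries, the overall alignment rate for paired-end input can be recomputed from the counts shown: it is the fraction of all mates (two per pair) that are not in the final "aligned 0 times" line. The sketch below assumes that reading of the summary; the function name overall_rate is hypothetical.

```shell
#!/bin/bash
# Sketch: recompute the overall alignment rate from a paired-end summary.
# overall_rate PAIRS UNALIGNED_MATES
#   PAIRS           = number of read pairs (first line of the summary)
#   UNALIGNED_MATES = mates reported as "aligned 0 times" at the end
overall_rate() {
    awk -v pairs="$1" -v unaligned="$2" \
        'BEGIN { printf "%.2f\n", 100 * (1 - unaligned / (2 * pairs)) }'
}

overall_rate 29691907 26189715   # very sensitive: prints 55.90
overall_rate 29691907 29552270   # default:        prints 50.24
```

Both values match the reported lines, so for this sample the two runs differ by about 5.7 percentage points.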

My question then is: how do I proceed from here? Do I use the default settings or the --very-sensitive parameter? Also, in our lab, when a colleague switches from the default to --very-sensitive, the overall mapping rate shoots from 18% to 90%, which I find a bit off.
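To see whether the jump is consistent across samples, the overall rates can be pulled out of the --summary-file reports and listed side by side. This is only a sketch: it assumes the two runs were written to separate directories (named hisatout_default and hisatout_very-sensitive here purely for illustration) and that each report ends with an "overall alignment rate" line as in the output above. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch: tabulate overall alignment rates per sample for two runs.
# Directory names are illustrative assumptions.

# Print the percentage from the "overall alignment rate" line of a report.
rate_of() {
    awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }' "$1"
}

# compare_reports DEFAULT_DIR VERY_SENSITIVE_DIR
compare_reports() {
    local def_dir="$1" vs_dir="$2" report sample
    printf 'sample\tdefault\tvery_sensitive\n'
    for report in "$def_dir"/*.report; do
        [ -e "$report" ] || continue                    # no reports found
        sample="$(basename "$report" .report)"
        [ -e "$vs_dir/$sample.report" ] || continue     # sample missing in second run
        printf '%s\t%s\t%s\n' "$sample" \
            "$(rate_of "$report")" \
            "$(rate_of "$vs_dir/$sample.report")"
    done
}

compare_reports "${1:-hisatout_default}" "${2:-hisatout_very-sensitive}"
```

A table like this makes it easier to tell whether every sample shifts by a few points or whether only some samples show the very large increases.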