DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

Clarification on HISAT2 default behavior regarding chrname #214

MatteoSchiavinato closed this issue 4 years ago

MatteoSchiavinato commented 4 years ago

Hello,

I only recently stumbled upon a situation whereby hisat2 adds "chr" in front of chromosome names. The input FASTA reference has chromosomes named as "1A, 2A, ..." and the output sam file has sequences named "chr1A, chr2A, ...".

I know that there are two options: --add-chrname and --remove-chrname.

Which one is the default behavior?

Also, if my input FASTA chromosome names are already "chr1A, chr2A, ....", does --remove-chrname remove the "chr" prefix?

I'm working with hisat2 v2.1.0 in a pipeline, and I cannot predict what names the chromosomes will have when other people use my program. At the moment, I'm using a function that reads their reference genome file and decides whether or not to add the --remove-chrname option. Wouldn't an option like --do-not-modify-chrnames be clearer to the user? Thanks for the clarification! :)
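For readers building a similar pipeline, the check described above might look roughly like the following. This is only a sketch, not the code used in the original pipeline: the function names `fasta_names` and `chrname_flags` and the `want_chr_prefix` parameter are hypothetical, and it assumes the relevant sequence name is the first whitespace-delimited token of each FASTA header line.

```python
import io
from typing import Iterable, List


def fasta_names(lines: Iterable[str]) -> List[str]:
    """Collect sequence names from FASTA header lines.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def chrname_flags(names: Iterable[str], want_chr_prefix: bool) -> List[str]:
    """Choose which hisat2 renaming flag (if any) to pass.

    Hypothetical helper: returns an empty list when the names already
    match the desired convention, so hisat2 is left at its default.
    """
    has_prefix = [name.startswith("chr") for name in names]
    if want_chr_prefix and not all(has_prefix):
        return ["--add-chrname"]
    if not want_chr_prefix and any(has_prefix):
        return ["--remove-chrname"]
    return []


if __name__ == "__main__":
    example = io.StringIO(">1A first chromosome\nACGT\n>2A\nGGCC\n")
    names = fasta_names(example)
    print(names, chrname_flags(names, want_chr_prefix=False))
```

Returning a list of flags makes it easy to splice the result into a command line built as a list of arguments.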

parkchanhee commented 4 years ago

Hi, @MatteoSchiavinato

HISAT2 doesn't change a chromosome name by default.

If you use --remove-chrname and the chromosome name starts with chr, then HISAT2 removes chr from the chromosome name. And, if you use --add-chrname and the chromosome name doesn't start with chr, HISAT2 adds chr before the name.
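The rule described in this comment can be written down as a small function. This is a sketch of the behaviour as stated here, not HISAT2's actual source; `output_name` is a hypothetical name, and since the thread does not say what happens when both options are given, the sketch refuses that case rather than guessing.

```python
def output_name(name: str, add_chrname: bool = False,
                remove_chrname: bool = False) -> str:
    """Model of the renaming rule described in the comment above."""
    if add_chrname and remove_chrname:
        # Assumption: the thread does not cover this combination.
        raise ValueError("behaviour with both options is not described")
    if remove_chrname and name.startswith("chr"):
        return name[len("chr"):]
    if add_chrname and not name.startswith("chr"):
        return "chr" + name
    # Default: the name is passed through unchanged.
    return name


if __name__ == "__main__":
    for name in ("1A", "chr1A"):
        print(name,
              output_name(name),
              output_name(name, add_chrname=True),
              output_name(name, remove_chrname=True))
```

Under this rule both options are idempotent: a name that already has the desired form is left alone.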

MatteoSchiavinato commented 4 years ago

I just re-ran it with only my index, my reads, and the output SAM file as parameters, and the sequence names in the output SAM are chrA1 when the FASTA says only 'A1'.

Could it be related to the name of the sequences?
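One way to see exactly which reference names ended up in the output is to read them from the `@SQ` lines of the SAM header, where the `SN:` field holds each sequence name. The helper below is a minimal sketch; `sam_reference_names` is a hypothetical name and the sample header is made up for illustration.

```python
from typing import Iterable, List


def sam_reference_names(lines: Iterable[str]) -> List[str]:
    """Return the SN values of all @SQ header lines, in file order."""
    names = []
    for line in lines:
        if not line.startswith("@"):
            break  # header lines come first; stop at the first alignment
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[len("SN:"):])
    return names


if __name__ == "__main__":
    # Made-up header, for illustration only.
    sample = [
        "@HD\tVN:1.0\tSO:unsorted\n",
        "@SQ\tSN:chrA1\tLN:1000\n",
        "@SQ\tSN:chrA2\tLN:2000\n",
        "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    ]
    print(sam_reference_names(sample))
```

With a real file, the same function can be fed an open file handle, for example `sam_reference_names(open("out.sam"))`, where `out.sam` is a placeholder path.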

parkchanhee commented 4 years ago

Could you share the output of these commands, along with the output SAM file?

hisat2 --version
hisat2-inspect -n <index>

MatteoSchiavinato commented 4 years ago

I'm on my phone, so it's hard to copy and paste the entire output, but:

Version is 2.1.0.

Inspect returns the chromosome names with 'chr' at the front.

parkchanhee commented 4 years ago

The index was built from a reference sequence whose chromosome names begin with chr. Could you check the chromosome names in the reference FASTA file?
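A check like this can be scripted by comparing the names stored in the index with the names in the FASTA file. The sketch below is one possible way to do it, assuming `hisat2-inspect` is on the `PATH`; `index_names` and `compare_names` are hypothetical helper names, and only the first whitespace-delimited token of each reported name is compared.

```python
import subprocess
from typing import Dict, Iterable, List


def index_names(index_base: str) -> List[str]:
    """Run `hisat2-inspect -n <index>` and return the reported names."""
    result = subprocess.run(
        ["hisat2-inspect", "-n", index_base],
        capture_output=True, text=True, check=True,
    )
    # Assumption: keep only the first token of each non-empty line.
    return [line.split()[0] for line in result.stdout.splitlines()
            if line.strip()]


def compare_names(index: Iterable[str],
                  fasta: Iterable[str]) -> Dict[str, List[str]]:
    """Report names that differ between an index and a FASTA file."""
    index_set, fasta_set = set(index), set(fasta)
    return {
        "only_in_index": sorted(index_set - fasta_set),
        "only_in_fasta": sorted(fasta_set - index_set),
        # FASTA names that appear in the index with a 'chr' prefix added.
        "prefixed_in_index": sorted(
            name for name in fasta_set if "chr" + name in index_set
        ),
    }


if __name__ == "__main__":
    print(compare_names(["chr1A", "chr2A"], ["1A", "2A"]))
```

A non-empty `prefixed_in_index` list would suggest that the index was built from a different copy of the reference than the FASTA file being inspected.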

MatteoSchiavinato commented 4 years ago

I will close this issue since, after days of tests within our group, we couldn't work out where the problem arose. It seems to be more a problem with how hisat2 is called within a script than with hisat2 itself. Thanks for being so responsive!