DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

"local graph exploded" warning when running hisat2-build with snp and transcriptome? #267

Open VictorZheng1010 opened 3 years ago

VictorZheng1010 commented 3 years ago

Hi, developer

I got some warning messages when using hisat2-build (v2.2.1) to generate index files for the mouse mm10 genome with UCSC snp142 and gene annotation information.

My command was:

nohup hisat2-build -p 30 --snp mm10_snp142.snp --haplotype mm10_snp142.haplotype --ss mm10_splicesites.txt --exon mm10_exons.txt mm10.fa mm10_snp_tran > run_build_snp_tran.log &

In the log file, there are some warning messages:

Returning from GFM constructor
Warning: a local graph exploded (offset: 19690239, length: 57344)
Warning: a local graph exploded (offset: 22055679, length: 57344)
Warning: a local graph exploded (offset: 23576319, length: 57344)
Warning: a local graph exploded (offset: 24927999, length: 57344)
Warning: a local graph exploded (offset: 25829119, length: 57344)
Warning: a local graph exploded (offset: 48300799, length: 57344)
Warning: a local graph exploded (offset: 73518429, length: 57344)
......
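For anyone triaging the same log, here is a minimal sketch for pulling the warnings out of a build log. It relies only on the warning text quoted above; the function names are made up for illustration, and the default file name is simply the one from the command in this post.

```shell
# Sketch only: list the distinct (offset, length) pairs reported by
# "a local graph exploded" warnings in a hisat2-build log.
# The function names are hypothetical; the pattern is copied from the
# warning text quoted in this thread.
summarize_exploded() {
    log="${1:-run_build_snp_tran.log}"
    # grep -o prints each match on its own line, so this also works
    # if several warnings ended up on one physical line.
    grep -o 'a local graph exploded (offset: [0-9]*, length: [0-9]*)' "$log" |
        sed -e 's/.*offset: \([0-9]*\), length: \([0-9]*\))/\1 \2/' |
        sort -u |
        sort -n
}

# Count the distinct exploded regions.
count_exploded() {
    summarize_exploded "$1" | wc -l | tr -d ' '
}
```

Running `summarize_exploded run_build_snp_tran.log` prints one "offset length" pair per line, which makes it easier to report how many regions are affected and where they are.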

I'd like to ask whether these "local graph exploded" warnings matter when running downstream hisat2 alignment with these index files.

Thanks, WSZ

yanpd01 commented 3 years ago

Have you solved this problem yet? I have the same problem too.

evasehr commented 2 years ago

Would also love to hear about these warnings. Occurred to me too.

Sherry520 commented 2 years ago

Would also love to hear about these warnings. Occurred to me too.

mpitz123 commented 2 years ago

I have the same problem when trying to build an index for the B73 v5 maize genome. The commands and input file sources are in the attached file. Is there a mistake somewhere? And can I use the index files for my alignment?

Hisat2_build_index.txt

bnorthoff commented 1 year ago

I have the same problem using version 2.2.1. Is this an error?

c-nabokov commented 1 year ago

This happened to me too. I think this is an error that needs to be fixed, because it will lead to a significant reduction in the number of reads that are uniquely and correctly matched to transcriptome data.

VetAshish commented 1 year ago

Returning from GFM constructor
Warning: a local graph exploded (offset: 697718624, length: 57244)
Warning: a local graph exploded (offset: 697718624, length: 57244)

I'm facing the same problem.

Giotto187 commented 1 year ago

I had the same problem building the latest version of genome_snp_tran index for grch38. Did anyone solve it?

SJJHK commented 1 year ago

Hello,

The aforementioned "exploded graph" warnings with a length of 57344 also occur for us (HISAT2 v2.2.1). Please see the attached output.

This was the code utilised:

hisat2-build --exon V41_extractexon --ss V41_extractsplice --snp genome.snp --haplotype genome.haplotype -f Gencode_V41_Comp_ERCC_Merge_Genome.fa snp_haplotype_test

After "Generation 22", the warnings appear.

We utilise the Human Release of Gencode v41 Comprehensive (GRCh38.p13).

Is it safe to say a "warning" is a warning and not an "error", as the build continues through to completion?

We are able to align a sample file and view the .bam. Additionally, we compared it with the same sample file aligned against a standard index build, supplying a novel splice site infile during alignment. We saw a 1% reduction in alignment rate when comparing these two, although two files are not enough for a real comparison.
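A comparison like this can be repeated over more samples with a small helper. The sketch below assumes each alignment summary was saved to a text file in hisat2's default summary format, which ends with a line such as "95.20% overall alignment rate". The function names and file names are hypothetical.

```shell
# Sketch only: compare the overall alignment rate of two hisat2 runs.
# Assumes each summary file contains a line of the form
# "NN.NN% overall alignment rate" (the default summary format).
overall_rate() {
    # Print just the number in front of "% overall alignment rate".
    sed -n 's/^\([0-9.]*\)% overall alignment rate.*/\1/p' "$1" | tail -n 1
}

# Print rate(first file) minus rate(second file), in percentage points.
rate_difference() {
    a="$(overall_rate "$1")"
    b="$(overall_rate "$2")"
    awk -v a="$a" -v b="$b" 'BEGIN { printf "%.2f\n", a - b }'
}
```

For example, `rate_difference standard_index.summary snp_tran_index.summary` prints how many percentage points the first run is ahead of the second.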

Two other questions:

  1. In the initial stages of the build output, the Local Sequence Length is reported as 57344. Is it a coincidence that the exploded graph warnings report the same length?
  2. In the initial stages, after it identifies the input FASTA file and reads the reference sizes etc., a line stating the time to read SNPs and splice sites appears with the time 1:29, but the haplotype file is never mentioned. Is this an issue? The "--haplotype" section of the manual states: "See the above option, --snp, about how to extract haplotypes. This option is not required, but haplotype information can keep the index construction from exploding and reduce the index size substantially."
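Neither question can be settled from the log alone, but two quick checks may help narrow things down. The sketch below tallies the distinct "length" values reported in the warnings, so they can be compared by eye with the Local Sequence Length, and verifies that each input file passed to hisat2-build exists and is non-empty. The function names are hypothetical, and the pattern relies only on the warning text quoted in this thread.

```shell
# Sketch only: tally the "length" values reported by the
# "a local graph exploded" warnings, as "length count" pairs.
exploded_lengths() {
    grep -o 'a local graph exploded (offset: [0-9]*, length: [0-9]*)' "$1" |
        sed -e 's/.*length: \([0-9]*\))/\1/' |
        awk '{ n[$1]++ } END { for (k in n) print k, n[k] }' |
        sort -n
}

# Return non-zero if any of the given files is missing or empty,
# e.g. the files passed to --snp, --haplotype, --ss and --exon.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

With the file names from the command above, this could be run as `check_inputs genome.snp genome.haplotype V41_extractsplice V41_extractexon` before the build, and `exploded_lengths hisat2-build-log-jan24-2023.txt` afterwards.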

Thanks in advance for replying to the issue or "non-issue"?

hisat2-build-log-jan24-2023.txt