DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

hisat2-build was killed when attempting to index the Ginkgo genome #370

Open zoukai3412085 opened 2 years ago

zoukai3412085 commented 2 years ago

The Ginkgo genome is about 9.8 GB, and I tried to build an index with hisat2-build using the --ss and --exon parameters.

However, both times I ran this task it was killed without any alert. The running log is as follows:

[screenshot: hisat2-build running log]

My server runs Ubuntu 18.04.5 LTS and has 48 CPUs and 512 GB of RAM. I don't know why this happens. Please help me!
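For reference, the build described above would typically look something like this (a sketch; file names are hypothetical, and a GTF annotation is assumed). Note that the HISAT2 manual warns that building an annotated index with --ss/--exon needs far more memory than a plain index (it cites roughly 200 GB for the ~3 GB human genome), which may explain the kills even on a 512 GB machine:

```shell
# Sketch of an annotated index build (hypothetical file names).
# The two extractor scripts are distributed with HISAT2.
hisat2_extract_splice_sites.py ginkgo_annotation.gtf > ginkgo.ss
hisat2_extract_exons.py ginkgo_annotation.gtf > ginkgo.exon

# Building with --ss/--exon needs much more RAM than a plain index
# (the manual cites ~200 GB for the human genome).
hisat2-build -p 16 --ss ginkgo.ss --exon ginkgo.exon \
    ginkgo_genome.fa ginkgo_index
```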

xiaoyezao commented 2 years ago

I got a similar problem with the Ginkgo genome:

```
Ran out of memory; automatically trying more memory-economical parameters.
  …this warning repeats many times…
Could not find approrpiate bmax/dcv settings for building this index.
Switching to a packed string representation.
Total time for call to driver() for forward index: 61:11:20
```

I work on a remote server with 24 cores and 64 GB of memory. It works well for other, smaller genomes.

zoukai3412085 commented 2 years ago

> I got a similar problem with the Ginkgo genome:
>
> ```
> Ran out of memory; automatically trying more memory-economical parameters.
>   …this warning repeats many times…
> Could not find approrpiate bmax/dcv settings for building this index.
> Switching to a packed string representation.
> Total time for call to driver() for forward index: 61:11:20
> ```
>
> I work on a remote server with 24 cores and 64 GB of memory. It works well for other, smaller genomes.

I will also continue to try other parameters. If I find parameters that work, I will let you know. If you find suitable parameters, I hope you will share them too. Thank you so much!

xiaoyezao commented 2 years ago

> I will also continue to try other parameters. If I find parameters that work, I will let you know. If you find suitable parameters, I hope you will share them too. Thank you so much!

Okay, let's share the index files if one of us gets it through!

zoukai3412085 commented 2 years ago

> Okay, let's share the index files if one of us gets it through!

OK, I have tried building an index for chromosome 1 alone; it works well and took less than an hour with 8 CPUs. Chr1 is ~1 Gb.

zoukai3412085 commented 2 years ago

> Okay, let's share the index files if one of us gets it through!

I have an idea, though I don't know whether it is sound. The Ginkgo genome is relatively large, and most of it is non-coding. Is there a way to remove the non-coding regions and regenerate the genome.fa and GFF files? I am still looking for relevant methods.

I checked your homepage, and it seems you have a strong bioinformatics background. Do you have any suggestions about this idea? Everything indicates that the failure is indeed caused by insufficient RAM.

xiaoyezao commented 2 years ago

If you are doing transcriptome analysis, you can simply use the CDS sequences as the reference to map your reads, which is what people do when there is no reference genome. But this may not be as accurate as reference-guided mapping.

I am asking a collaborator to help to index the genome on a large memory cluster. Let's see how it goes.
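If it helps, the CDS-as-reference idea above might be sketched like this (file names are hypothetical; gffread is one common tool for extracting CDS from a GFF, and is an assumption here, not something the thread itself used):

```shell
# Extract CDS sequences from the annotation and index them instead of
# the full genome (a far smaller input, so far less memory is needed).
# Hypothetical file names.
gffread -x ginkgo_cds.fa -g ginkgo_genome.fa ginkgo_annotation.gff

# Build a plain HISAT2 index over the CDS sequences.
hisat2-build -p 8 ginkgo_cds.fa ginkgo_cds_index
```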

zoukai3412085 commented 2 years ago

> If you are doing transcriptome analysis, you can simply use the CDS sequences as the reference to map your reads, which is what people do when there is no reference genome. But this may not be as accurate as reference-guided mapping.
>
> I am asking a collaborator to help index the genome on a large-memory cluster. Let's see how it goes.

I tried with the following parameters: `--noauto --bmaxdivn 1 --dcv 4096`, but it was still killed, unfortunately. This has reached the limit of parameter adjustment.
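One hedged observation: per the hisat2-build help text, --bmaxdivn sets the maximum suffix-sorting bucket size as the reference length divided by that number, so larger --bmaxdivn values reduce peak memory, while --bmaxdivn 1 allows the largest possible buckets. A more memory-economical attempt might look like this (parameter values illustrative, not tested on this genome; file names hypothetical):

```shell
# Illustrative only: a larger --bmaxdivn means smaller suffix-sorting
# buckets and therefore lower peak memory (at the cost of a slower build).
hisat2-build -p 8 --noauto --bmaxdivn 32 --dcv 4096 \
    ginkgo_genome.fa ginkgo_index
```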

xiaoyezao commented 2 years ago

I finally got this done by working on individual chromosomes. If you are interested, I can share more details.
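A minimal sketch of the per-chromosome workaround: split the multi-FASTA into one file per sequence, then index each piece on its own. A tiny demo FASTA stands in for the real genome here; names are illustrative:

```shell
# Create a tiny demo FASTA (substitute the real genome file).
printf '>chr1 demo\nACGTACGT\n>chr2 demo\nGGCCGGCC\n' > genome.fa

# Each header line starts a new output file named after the sequence ID.
awk '/^>/ { if (out) close(out); out = substr($1, 2) ".fa" } { print > out }' genome.fa

ls chr1.fa chr2.fa
# Each piece can then be indexed separately, e.g.:
#   for c in chr*.fa; do hisat2-build "$c" "${c%.fa}_index"; done
```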

zoukai3412085 commented 2 years ago

> I finally got this done by working on individual chromosomes. If you are interested, I can share more details.

I did try a single chromosome before, and it works well too. But the whole genome kept failing while I adjusted the parameters, so I have turned to TopHat2 to get this work done. Thank you very much for helping me. Best wishes!