LiuLabUB / HMMRATAC

HMMRATAC peak caller for ATAC-seq data
GNU General Public License v3.0

BR: HMMRATAC fails to run on large genomes needing .csi index #96

Open TeiturAK opened 2 years ago

TeiturAK commented 2 years ago

Describe the bug

I'm running HMMRATAC on several plant genomes which vary greatly in size. HMMRATAC fails when running on the largest genomes that require a .csi index. It produces the following error:

Exception in thread "main" java.lang.RuntimeException: Invalid file header in BAM index spruce.sorted.unique_mapped.MT_CP_removed.bam.csi: ^_^D

It works fine on the smaller genomes for which I can generate a .bai index.

System:

Additional context

I'm working with a ~20GB genome that requires a .csi index. I did not use multithreading when creating the index and just changing the name of the index to have a .bai ending does not help.

Any help would be much appreciated. Teitur

Mouwrice commented 2 years ago

@TeiturAK I see no reference to HMMRATAC being able to process .csi files. Why did you think this should work?

Mikxox commented 2 years ago

The internal samtools dependency is using a reader implementation that has been deprecated for years and does not seem to support .csi index files. The dependency should be updated to the latest release and the code refactored to use the new reader implementation.
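To illustrate why a reader that only understands .bai would reject a .csi file even after renaming, here is a minimal sketch that guesses the index type from its leading bytes. It assumes the layouts described in the SAM and CSI format specifications: a .bai file starts with the uncompressed magic `BAI\1`, while a .csi file is BGZF-compressed (so it starts with the gzip magic `0x1f 0x8b`) and holds `CSI\1` once decompressed. The `^_` in the error above is consistent with a leading `0x1f` byte. The class `IndexSniffer` and its `sniff` method are hypothetical names for this example and are not part of HMMRATAC or its dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: guesses a BAM index type from its magic bytes. */
final class IndexSniffer {
    private static final byte[] BAI_MAGIC = {'B', 'A', 'I', 1};
    private static final byte[] CSI_MAGIC = {'C', 'S', 'I', 1};

    /**
     * @param fileStart leading bytes of the index file; for a .csi file this
     *                  must contain at least the whole first BGZF block
     * @return "BAI", "CSI", or "UNKNOWN"
     */
    static String sniff(byte[] fileStart) throws IOException {
        // BAI is stored uncompressed, so its magic is visible directly.
        if (startsWith(fileStart, BAI_MAGIC)) {
            return "BAI";
        }
        // BGZF blocks are valid gzip members, so they begin with 0x1f 0x8b.
        boolean gzipped = fileStart.length >= 2
                && (fileStart[0] & 0xFF) == 0x1F
                && (fileStart[1] & 0xFF) == 0x8B;
        if (!gzipped) {
            return "UNKNOWN";
        }
        // Decompress just enough to read the four magic bytes.
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(fileStart))) {
            byte[] magic = new byte[4];
            int n = 0;
            while (n < magic.length) {
                int r = in.read(magic, n, magic.length - n);
                if (r < 0) {
                    break; // stream ended before four bytes were available
                }
                n += r;
            }
            return (n == magic.length && Arrays.equals(magic, CSI_MAGIC)) ? "CSI" : "UNKNOWN";
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A check like this only identifies the file type. Actually querying a .csi index would still require a reader that implements the CSI binning scheme, which is why the dependency update is needed.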

jitsedesmet commented 2 years ago

@TeiturAK is it possible to share a .csi index file? I would like to implement this feature/fix the bug and test whether it works. I'm but a noble computer science student and have no idea where to find a .csi file to test this feature. After this project is done I will contact you again so I can share the implementation of course :slightly_smiling_face: .

jitsedesmet commented 2 years ago

I have been able to generate a .csi file myself and my implementation seems to work for me. I will link my implementation here once it can be made public. You are still welcome to provide me with your data so I can make sure it works. (Although I understand that sharing data in some fields is not trivial, in which case I hope it'll work for you) :smile: