shuaiwang2 closed this issue 1 year ago.
@shuaiwang2 Hi, we don't currently support indexing contigs that long. We use a BAI index for BAMs and a tabix index for VCFs, and both only support contigs up to 512 M (2^29 bp). References that large need a CSI index, but we don't support writing those. (Reading them is inconsistent: I think we can read CSI indexes for BAMs but not for VCFs.)
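To see in advance which contigs will hit this limit, you can scan the reference's .fai index. The sketch below assumes a standard samtools-style .fai file (tab-separated NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH) and treats any contig longer than 2^29 bp as too long for BAI/tabix. The function name `contigs_too_long_for_bai` is hypothetical and not part of GATK or htsjdk.

```python
from typing import Iterable, List, Tuple

# BAI and tabix share a binning scheme covering 0-based positions up to
# 2**29 - 1, i.e. contigs of at most 2**29 = 536,870,912 bp ("512 M").
BAI_TABIX_MAX_CONTIG_LEN = 2 ** 29


def contigs_too_long_for_bai(fai_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (name, length) for every contig in a .fai index that is longer
    than a BAI/tabix index can address (such contigs would need CSI)."""
    too_long = []
    for line in fai_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
        name, length = line.split("\t")[:2]
        if int(length) > BAI_TABIX_MAX_CONTIG_LEN:
            too_long.append((name, int(length)))
    return too_long
```

Usage: `with open("reference.fasta.fai") as fai: print(contigs_too_long_for_bai(fai))`. For a reference where every chromosome is over 600 M, every contig should be reported.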
It might be possible to work around this issue by setting --create-output-variant-index false, although downstream GATK tools would need an index if you're sharding them.
Otherwise, I recommend splitting each chromosome into two separate parts and calling on the split chromosomes. Splitting within a long run of Ns should be a safe way to avoid missing any useful calls. (The telomere might be a good spot unless you have a T2T reference.)
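A minimal sketch of choosing such a split point, assuming the chromosome sequence is already loaded as a Python string (for a 600 M chromosome this needs several hundred MB of memory). It finds runs of N/n at least `min_len` long, splits in the middle of the run closest to the chromosome's midpoint so no real sequence is cut, and maps 1-based positions from either part back to the original coordinates. All names here (`find_n_runs`, `choose_split_point`, `lift_to_original`) are hypothetical helpers, not GATK functionality.

```python
import re
from typing import List, Optional, Tuple


def find_n_runs(seq: str, min_len: int) -> List[Tuple[int, int]]:
    """Return 0-based half-open (start, end) intervals of N/n runs
    that are at least min_len bases long."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]


def choose_split_point(seq: str, min_len: int) -> Optional[int]:
    """Pick the middle of the qualifying N run closest to the sequence midpoint.

    Returns a 0-based offset `split`: part 1 is seq[:split] and part 2 is
    seq[split:]. Returns None if no N run is long enough.
    """
    runs = find_n_runs(seq, min_len)
    if not runs:
        return None
    mid = len(seq) // 2
    start, end = min(runs, key=lambda r: abs((r[0] + r[1]) // 2 - mid))
    return (start + end) // 2


def lift_to_original(part: int, pos: int, split: int) -> int:
    """Map a 1-based position on part 1 or part 2 back to the original contig."""
    return pos if part == 1 else pos + split
```

After splitting, check that both parts are at most 2^29 bp; then variant positions called on part 2 can be shifted back with `lift_to_original(2, pos, split)`.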
We should probably improve that error message to make it clear what the problem is.
Thank you, this is very useful advice for me.
I've opened a ticket (https://github.com/samtools/htsjdk/issues/1651) to improve this error message and make it less confusing
Hello, when I run HaplotypeCaller on a reference genome of about 15G in which every chromosome is more than 600M, I get errors. Could you give me some advice? The commands:
The bug: