ComparativeGenomicsToolkit / hal2vg

Convert HAL to VG
MIT License

halUnclip #49

Closed. glennhickey closed this 3 years ago

glennhickey commented 3 years ago

halUnclip is a tool that adds sequence back to HAL files in cases where it was clipped out by cactus-preprocess.

It takes a set of sequence fragments named, e.g., chr1_sub_1_10, chr1_sub_500_600, chr1_sub_1000_20000, together with a FASTA sequence of the entire chr1, and produces output containing just chr1.
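To make the fragment naming concrete, here is a minimal Python sketch (not halUnclip's code) that parses names like those above and finds the stretches of the full chr1 that no fragment covers, i.e. the sequence that would be added back. It assumes `<name>_sub_<start>_<end>` means a 0-based, half-open interval `[start, end)`. That interpretation, the helper names `parse_fragment` and `unaligned_gaps`, and the chr1 length used below are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical interpretation of the clipped-name convention:
# <name>_sub_<start>_<end>, with 0-based, half-open [start, end) coordinates.
SUB_RE = re.compile(r"^(?P<name>.+)_sub_(?P<start>\d+)_(?P<end>\d+)$")


def parse_fragment(frag_name):
    """Split a clipped fragment name into (base name, start, end)."""
    m = SUB_RE.match(frag_name)
    if m is None:
        # Not a clipped fragment: it already spans its whole sequence.
        return frag_name, None, None
    return m.group("name"), int(m.group("start")), int(m.group("end"))


def unaligned_gaps(frag_names, full_lengths):
    """Return, per base sequence, the intervals not covered by any fragment.

    These are the stretches that would be re-inserted as unaligned sequence.
    """
    intervals = defaultdict(list)
    for frag in frag_names:
        name, start, end = parse_fragment(frag)
        if start is not None:
            intervals[name].append((start, end))

    gaps = {}
    for name, ivs in intervals.items():
        ivs.sort()
        cursor = 0  # first position not yet covered by a fragment
        out = []
        for start, end in ivs:
            if start > cursor:
                out.append((cursor, start))
            cursor = max(cursor, end)
        if cursor < full_lengths[name]:
            out.append((cursor, full_lengths[name]))
        gaps[name] = out
    return gaps


if __name__ == "__main__":
    frags = ["chr1_sub_1_10", "chr1_sub_500_600", "chr1_sub_1000_20000"]
    # Hypothetical full length for chr1, for illustration only.
    print(unaligned_gaps(frags, {"chr1": 25000}))
```

With a hypothetical chr1 length of 25000, the three example fragments leave the gaps `(0, 1)`, `(10, 500)`, `(600, 1000)` and `(20000, 25000)`.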

All alignments are preserved; only the sequence names and sequence strings change. Any newly added sequence is, by definition, unaligned to anything.
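Because only names and sequence strings change, a position inside a clipped fragment should correspond to a fixed offset in the full sequence. The following is a minimal Python sketch of that coordinate shift, plus a sanity check that a fragment's bases match the full FASTA sequence. It is not halUnclip's implementation, and it relies on the same hypothetical 0-based, half-open reading of `<name>_sub_<start>_<end>`. The function names `unclip_coordinate` and `check_fragment` are illustrative.

```python
import re

# Hypothetical clipped-name convention: <name>_sub_<start>_<end>, 0-based, half-open.
SUB_RE = re.compile(r"^(?P<name>.+)_sub_(?P<start>\d+)_(?P<end>\d+)$")


def unclip_coordinate(frag_name, frag_pos):
    """Map a 0-based position in a clipped fragment to (full name, full position)."""
    m = SUB_RE.match(frag_name)
    if m is None:
        # Unclipped sequence: coordinates are already in full-sequence space.
        return frag_name, frag_pos
    start, end = int(m.group("start")), int(m.group("end"))
    if not 0 <= frag_pos < end - start:
        raise ValueError(f"position {frag_pos} outside fragment {frag_name}")
    # Alignment coordinates just shift by the fragment's start offset.
    return m.group("name"), start + frag_pos


def check_fragment(frag_name, frag_seq, full_seqs):
    """Return True if the fragment's bases equal the matching slice of the full sequence."""
    m = SUB_RE.match(frag_name)
    if m is None:
        return full_seqs.get(frag_name) == frag_seq
    name = m.group("name")
    start, end = int(m.group("start")), int(m.group("end"))
    return end - start == len(frag_seq) and full_seqs[name][start:end] == frag_seq


if __name__ == "__main__":
    print(unclip_coordinate("chr1_sub_500_600", 99))
    print(check_fragment("chr1_sub_2_6", "GTAC", {"chr1": "ACGTACGTAC"}))
```

Under these assumptions, position 99 of `chr1_sub_500_600` maps to position 599 of chr1.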

glennhickey commented 3 years ago

This finally seems somewhat usable: 7 hours and 31 GB of RAM for a 90-way human pangenome. Somewhat ironically, the biggest potential speedup I see is making sonLib's FASTA parser less slow.