`halUnclip` is a tool to add sequence back to hal files when it was clipped out by `cactus-preprocess`.
It takes a set of sequence fragments with names of the form `chr1_sub_1_10`, `chr1_sub_500_600`, `chr1_sub_1000_20000`, together with a fasta sequence of the entire `chr1`, and produces an output with just `chr1` in it.
All alignments are preserved; only the names and sequence strings change. Newly added sequence is by definition unaligned to anything.
This seems somewhat usable, finally: 7 hours / 31G RAM for the 90-way human pangenome. Somewhat ironically, the biggest potential speedup I see is fixing sonLib's slow fasta parser.
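
To make the fragment naming described above concrete, here is a minimal Python sketch of the coordinate bookkeeping only. It is not how `halUnclip` itself is implemented. The helper names (`parse_fragment`, `layout`), the 0-based half-open reading of `_sub_<start>_<end>`, and the parent length of 25000 are all assumptions made for illustration.

```python
import re
from typing import List, NamedTuple, Tuple

# Assumed naming pattern: <parent>_sub_<start>_<end>.
_SUB_RE = re.compile(r"^(?P<parent>.+)_sub_(?P<start>\d+)_(?P<end>\d+)$")


class Fragment(NamedTuple):
    parent: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive


def parse_fragment(name: str) -> Fragment:
    """Split a clipped fragment name into parent name and coordinates."""
    m = _SUB_RE.match(name)
    if m is None:
        raise ValueError(f"not a clipped fragment name: {name!r}")
    return Fragment(m["parent"], int(m["start"]), int(m["end"]))


def layout(fragment_names: List[str], parent_len: int) -> List[Tuple[int, int, bool]]:
    """Tile [0, parent_len) with (start, end, aligned) intervals.

    aligned=True marks an original fragment (alignments kept as-is);
    aligned=False marks clipped sequence restored from the fasta,
    which is unaligned to anything.
    """
    frags = sorted(parse_fragment(n) for n in fragment_names)
    if len({f.parent for f in frags}) > 1:
        raise ValueError("fragments come from more than one parent sequence")

    segments = []
    pos = 0
    for f in frags:
        if f.start < pos or f.start >= f.end or f.end > parent_len:
            raise ValueError(f"bad or overlapping fragment: {f}")
        if f.start > pos:
            segments.append((pos, f.start, False))  # restored gap
        segments.append((f.start, f.end, True))     # original fragment
        pos = f.end
    if pos < parent_len:
        segments.append((pos, parent_len, False))   # restored tail
    return segments


if __name__ == "__main__":
    names = ["chr1_sub_500_600", "chr1_sub_1_10", "chr1_sub_1000_20000"]
    for start, end, aligned in layout(names, parent_len=25000):
        print(start, end, "fragment" if aligned else "restored")
```

With these assumptions, the three example fragments plus a 25000-base `chr1` give seven intervals: the three original fragments and four restored, unaligned stretches around them.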