Closed glennhickey closed 7 months ago
In light of this comment, which shows the crash happens even if the colon isn't denoting a subrange, I added a global conversion of `:` to `_` for contigs coming into `cactus_pangenome`. The subrange logic mentioned above is still applied, but any stray colons beyond that are now turned to `_`.
`cactus-pangenome` will crash if any of the input fasta contigs have a suffix like `:10-100` denoting a subpath range (a fairly standard annotation that, e.g., `samtools faidx` uses -- note it's 1-based, end-inclusive). This seems to be because these types of subranges are already being used in GAF path steps coming out of minigraph, which confuses a parsing step in Cactus.

Not related, but there is already logic to convert subranges to `_sub_9_100` (0-based, exclusive end) going into pangenome HALs because the genome browser can't (couldn't?) handle `:` and/or `-`.

This PR just bumps this existing logic forward: paths of the form `chr1:10-100` will get converted right away to `chr1_sub_9_100` in `cactus_sanitizeFastaHeaders` -- the end result being 1) the crash is fixed while 2) the subrange information is correctly preserved through to the output.

Resolves #1287
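For illustration only, here is a minimal Python sketch of the renaming rule described in this PR. It is not the actual code in `cactus_sanitizeFastaHeaders`. The function name `sanitize_contig_name` and the regular expression are hypothetical, and the sketch assumes the suffix is a well-formed 1-based, end-inclusive range. The final `replace` reflects the later global conversion of stray colons to `_`.

```python
import re

# Hypothetical pattern: a contig name ending in ":<start>-<end>".
# The greedy ".+" means only the last colon is treated as the subrange separator.
_SUBRANGE_RE = re.compile(r"^(?P<contig>.+):(?P<start>\d+)-(?P<end>\d+)$")


def sanitize_contig_name(name: str) -> str:
    """Rewrite 'chr1:10-100' as 'chr1_sub_9_100' and replace stray colons.

    Illustrative assumption, not the Cactus implementation.
    """
    match = _SUBRANGE_RE.match(name)
    if match:
        # 1-based inclusive start -> 0-based start: subtract one.
        start = int(match.group("start")) - 1
        # 1-based inclusive end equals the 0-based exclusive end: unchanged.
        end = int(match.group("end"))
        name = f"{match.group('contig')}_sub_{start}_{end}"
    # Any colon that is not part of a subrange suffix becomes an underscore.
    return name.replace(":", "_")


if __name__ == "__main__":
    for contig in ("chr1:10-100", "chr1", "sample:chr1"):
        print(contig, "->", sanitize_contig_name(contig))
```

The start coordinate drops by one and the end stays the same because a 1-based inclusive interval `[10, 100]` covers the same bases as the 0-based half-open interval `[9, 100)`.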