I'm currently working with a contig named "160+,38-,88-,106-,51+,23+,92-,34+,158-,167+,41+,131+,81+,58+,102+,192-(circular)". I was able to align reads to this contig and perform r-mutation calling in it; however, when I get to the part of strainFlye call r-mutation that attempts to convert the resulting VCF file to a BCF file, I get the following error:
...
call r-mutation @ 1,989.53s: Converting the VCF file to a compressed BCF file...
[E::bcf_hdr_parse_line] Could not parse the header line: "##contig=<ID=160+,38-,88-,106-,51+,23+,92-,34+,158-,167+,41+,131+,81+,58+,102+,192-(circular),length=388097>"
...
I'm pretty sure the commas and parentheses are the kicker here. The VCF v4.3 spec confirms this:
Contig names follow the same rules as the SAM format’s reference sequence names: they may contain any printable
ASCII characters in the range [!-~] apart from ‘\ , "‘’ () [] {} <>’ and may not start with ‘*’ or ‘=’. Thus they
match the following regular expression:
In particular, excluding commas facilitates parsing ##contig lines, and excluding the characters ‘<>[]’ and initial ‘*’
avoids clashes with symbolic alleles. The contig names must not use a reserved symbolic allele name.
The obvious solution here is thus updating strainFlye's FASTA-reading function (?) to reject badly-named contigs like this. We could maybe make this check only required for strainFlye commands that will use bcftools or something, but it's less of a headache (and probably good practice) to just apply this check to every time we read a FASTA file.
This is a similar idea to #57 -- however, addressing that issue in its entirety would require some extra work. (For example: The regex given by the VCF spec apparently allows contigs with / characters in their names, but I have to imagine that if your contigs actually have / in their names then a swarm of locusts will devour your terminal or something when you start trying to use these names in files.)
I'm currently working with a contig named "
160+,38-,88-,106-,51+,23+,92-,34+,158-,167+,41+,131+,81+,58+,102+,192-(circular)
". I was able to align reads to this contig and perform r-mutation calling in it; however, when I get to the part ofstrainFlye call r-mutation
that attempts to convert the resulting VCF file to a BCF file, I get the following error:I'm pretty sure the commas and parentheses are the kicker here. The VCF v4.3 spec confirms this:
The obvious solution here is thus updating strainFlye's FASTA-reading function (?) to reject badly-named contigs like this. We could maybe make this check only required for strainFlye commands that will use bcftools or something, but it's less of a headache (and probably good practice) to just apply this check to every time we read a FASTA file.
This is a similar idea to #57 -- however, addressing that issue in its entirety would require some extra work. (For example: The regex given by the VCF spec apparently allows contigs with
/
characters in their names, but I have to imagine that if your contigs actually have/
in their names then a swarm of locusts will devour your terminal or something when you start trying to use these names in files.)