mlin closed 3 years ago
Thanks @tfrcarvalho -- ideally yeah, but I think it's a small risk, probably not worth spending the time to re-engineer anytime soon. Pipe isn't too unusual a character to see used as a delimiter "in the wild" (NCBI uses it a lot, for example). Famous last words, but we'll probably be okay with backtick ;)
LOL, and 5 minutes later I remembered that backtick can appear in the FASTQ quality scores for older Illumina data. According to the diagram there, even pipe can appear there for PacBio data!
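To make the collision concrete, here is a small sketch that maps a character back to the quality score it would encode under the two common FASTQ offsets. The function name and the score ranges (0-93 for Phred+33, 0-62 for Phred+64, i.e. the printable ASCII range 33-126) are assumptions for illustration, not anything taken from the util.fastq code.

```python
def phred_scores_for_char(ch):
    """Return the quality score `ch` would encode under each common offset.

    Hypothetical helper for illustration. The ranges below assume quality
    characters are limited to printable ASCII (33-126).
    """
    code = ord(ch)
    scores = {}
    for name, offset, max_q in (("phred33", 33, 93), ("phred64", 64, 62)):
        q = code - offset
        if 0 <= q <= max_q:
            scores[name] = q
    return scores


if __name__ == "__main__":
    for ch in ("`", "|"):
        print(repr(ch), ord(ch), phred_scores_for_char(ch))
```

Backtick (ASCII 96) comes out as Q32 under the Phred+64 offset used by older Illumina pipelines, and pipe (ASCII 124) as Q91 under Phred+33, so either one can legitimately show up in a quality line.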
Maybe we should fix this properly after all, or see if it'll work with a nonprintable character. Reverted for now.
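A rough sketch of the nonprintable-character idea, assuming the sort works by joining each four-line FASTQ record into one row, sorting the rows, and splitting them back. This is not the actual sort_fastx_by_entry_id() implementation; the function name, the choice of the ASCII unit separator (0x1F), and the in-memory approach are all assumptions for illustration.

```python
# Assumption: ASCII unit separator, which should not occur in FASTQ text.
DELIM = "\x1f"


def sort_fastq_records(lines, delim=DELIM):
    """Sort 4-line FASTQ records by joining, sorting, and splitting them.

    Hypothetical sketch: fails loudly if the delimiter appears in the input
    instead of silently mangling the output.
    """
    lines = [ln.rstrip("\n") for ln in lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    for rec in records:
        if any(delim in field for field in rec):
            raise ValueError("delimiter found in record: %r" % rec[0])
    # The header is the first field, so sorting the joined rows orders
    # records by read name; 0x1F sorts below every printable character.
    joined = sorted(delim.join(rec) for rec in records)
    return [field for row in joined for field in row.split(delim)]


if __name__ == "__main__":
    demo = ["@r2|x", "ACGT", "+", "`|II", "@r1|y", "TTGA", "+", "II`|"]
    print(sort_fastq_records(demo))
```

The explicit check is the main point: whichever delimiter is chosen, raising an error when it appears in the data is safer than hoping it never does.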
Change the intermediate delimiter used by util.fastq.sort_fastx_by_entry_id() from | to ` in hopes the latter is even less likely to arise in read names, where it would cause the output to be mangled. (I'm working with some CAMI challenge datasets that use pipes in their read names.)