Because some MAF tools use `.` as a special delimiter to separate species and contig names, care must be taken to make sure that species names themselves don't contain any `.` characters (which they often do now that we're using accessions) when going through some MAF and TAF tools.
I've got a case now that seems weirdly slow, though:
2283678 hickey 20 0 19256 17680 1896 R 76.3 0.0 38:34.11 sed -f alnum_to_genome.sed
2281905 hickey 20 0 19256 17668 1896 R 74.6 0.0 47:37.07 sed -f alnum_to_genome.sed
2286745 hickey 20 0 44860 43308 1944 R 71.2 0.0 11:46.78 sed -f genome_to_alnum.sed
No idea why sed is so slow, but the renaming is only being triggered because the names contain underscores, which don't actually need to be replaced. So this change just lets underscores pass through untouched.
This logic probably needs revising in the future, since it seems like a real dumb potential bottleneck.
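For reference, here is a rough sketch of what a single-pass rename could look like if the sed scripts ever get replaced. This is not what the current code does: `build_name_map` and `make_renamer` are hypothetical names, and the aliasing scheme (drop unsafe characters, add a numeric suffix on collision) is just an assumption for illustration. The idea is to leave names made only of letters, digits, and underscores alone, and to apply all remaining renames with one compiled regex instead of one rule per name.

```python
import re

SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def build_name_map(names):
    """Map each name containing unsafe characters (e.g. '.') to a safe alias.

    Names made only of letters, digits, and underscores are skipped,
    so they pass through unchanged.
    """
    mapping = {}
    used = set(names)
    for name in names:
        if SAFE_NAME.fullmatch(name):
            continue
        base = re.sub(r"[^A-Za-z0-9_]", "", name)
        alias = base
        suffix = 1
        # Avoid clashing with an existing name or an earlier alias.
        while alias in used:
            alias = f"{base}{suffix}"
            suffix += 1
        used.add(alias)
        mapping[name] = alias
    return mapping


def make_renamer(mapping):
    """Return a function that applies every rename in one regex pass."""
    if not mapping:
        return lambda line: line
    # Longest names first, so a name that is a prefix of another cannot shadow it.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda line: pattern.sub(lambda m: mapping[m.group(0)], line)


names = ["GCA_000001405.15", "hg38", "my_genome"]
to_alnum = build_name_map(names)
rename = make_renamer(to_alnum)
renamed = rename("s GCA_000001405.15.chr1 0 10 + 100 ACGT")
```

The reverse direction would use the inverted mapping with the same `make_renamer` helper.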