No spaces in output sequence name

olgabot commented 4 years ago

To keep the output protein/DNA sequences unique and indexable, their names must not contain spaces in the part that carries unique information, e.g. the translation frame. This PR adds the translation frame to the sequence name without spaces.

E.g. here's a current output:

>read1/tr|A0A024R1R8|ENSP00000491117;mate1Start:1;mate2Start:1 translation_frame: 1

This would be changed to:

>read1/tr|A0A024R1R8|ENSP00000491117;mate1Start:1;mate2Start:1__translation-frame:1
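As a rough illustration of why the space matters: many FASTA readers and indexers treat only the text up to the first whitespace as the record ID, so anything after a space does not count toward uniqueness. The sketch below is not sencha's actual code; `name_with_frame` and `fasta_id` are hypothetical helpers, and the ID rule is an assumption about the downstream tool.

```python
def name_with_frame(description: str, frame: int) -> str:
    """Append the translation frame to a sequence name without any spaces.

    Hypothetical helper for illustration; not taken from the sencha codebase.
    """
    return f"{description}__translation-frame:{frame}"


def fasta_id(header: str) -> str:
    """Return the record ID as many FASTA tools see it:
    the header text (minus the leading '>') up to the first whitespace."""
    text = header[1:] if header.startswith(">") else header
    return text.split(None, 1)[0]


base = "read1/tr|A0A024R1R8|ENSP00000491117;mate1Start:1;mate2Start:1"

# Current format: the frame sits after a space, so it is dropped from the ID.
old_frame_1 = f">{base} translation_frame: 1"
old_frame_2 = f">{base} translation_frame: 2"

# Proposed format: the frame is part of the ID itself.
new_frame_1 = ">" + name_with_frame(base, 1)
new_frame_2 = ">" + name_with_frame(base, 2)

# Two frames of the same read collide under the old format ...
old_ids_collide = fasta_id(old_frame_1) == fasta_id(old_frame_2)
# ... but stay distinct under the new one.
new_ids_distinct = fasta_id(new_frame_1) != fasta_id(new_frame_2)
```

With the old format, two translation frames of the same read share one ID, which is what breaks indexing; with the new format each frame gets its own ID.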

Many thanks for contributing to czbiohub/sencha!

Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs).

PR checklist

[ ] This comment contains a description of changes (with reason)
[ ] If you've fixed a bug or added code that should be tested, add tests!
[ ] Ensure the test suite passes with pytest . (command to run: pytest or make coverage if you want to see which lines don't have tests yet)
[ ] Make sure your code is linted and autoformatted using black (black . --check).
[ ] Documentation in usage.md is updated
[ ] README.md is updated

pranathivemuri commented 4 years ago

The tests are failing because the expected sequence names have spaces in them; otherwise this PR looks good.

pranathivemuri commented 4 years ago

Closed in favor of #93

czbiohub-sf / orpheum

No spaces in output sequence name #92
