czbiohub-sf / orpheum

Orpheum (previously called and published as sencha) is a Python package for directly translating RNA-seq reads into coding protein sequence.
MIT License

No spaces in output sequence name #92

Closed: olgabot closed 4 years ago

olgabot commented 4 years ago

To make the output protein/DNA sequence names unique and indexable, they must not contain spaces before any distinguishing information such as the translation frame, since many FASTA tools treat only the text before the first whitespace as the sequence ID. This PR adds the translation frame to the sequence name, but without spaces.

E.g. here's a current output:

>read1/tr|A0A024R1R8|ENSP00000491117;mate1Start:1;mate2Start:1 translation_frame: 1 

This would be changed to:

>read1/tr|A0A024R1R8|ENSP00000491117;mate1Start:1;mate2Start:1__translation-frame:1 
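
As a rough illustration of why the space matters, here is a minimal sketch in plain Python. The helpers `join_frame` and `fasta_id` are hypothetical names made up for this example and are not part of orpheum/sencha; `fasta_id` only mimics the common convention of taking everything up to the first whitespace as the record ID.

```python
def join_frame(name: str, frame: int) -> str:
    """Append the translation frame to a sequence name without whitespace.

    Hypothetical helper; the separator mirrors the example header above.
    """
    return f"{name}__translation-frame:{frame}"


def fasta_id(header: str) -> str:
    """Return the text after '>' and before the first whitespace.

    This imitates how many FASTA parsers and indexers derive a record ID.
    """
    return header.lstrip(">").split(None, 1)[0]


base = "read1/tr|A0A024R1R8|ENSP00000491117;mate1Start:1;mate2Start:1"

# Old style: the frame sits after a space, so it is dropped from the ID
# and two different frames collapse onto the same ID.
old_frame1 = f">{base} translation_frame: 1"
old_frame2 = f">{base} translation_frame: 2"

# New style: the frame is part of the whitespace-free name, so IDs differ.
new_frame1 = ">" + join_frame(base, 1)
new_frame2 = ">" + join_frame(base, 2)

print(fasta_id(old_frame1) == fasta_id(old_frame2))
print(fasta_id(new_frame1) == fasta_id(new_frame2))
```

Under these assumptions, the old headers for two frames of the same read share one ID, while the new headers stay distinct, which is what makes them usable as index keys.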


pranathivemuri commented 4 years ago

The tests are failing because the expected sequence names still have spaces in them; otherwise this PR looks good.

pranathivemuri commented 4 years ago

Closed in favor of #93