czbiohub-sf / orpheum

Orpheum (previously called and published under the name sencha) is a Python package for directly translating RNA-seq reads into coding protein sequence.
MIT License

option to output translate coding scores as parquet #75

Closed by bluegenes 4 years ago

bluegenes commented 4 years ago

Coding score CSVs can get quite large! This PR adds an option to write the coding scores from sencha translate to a Parquet file instead. It doesn't disable CSV output, so a user who wants both files can request both.

I chose to do it this way because I would prefer the outputs to use different file extensions (.csv, .parquet), and I think reusing the existing option for Parquet would be a little confusing, since it is explicitly named --csv. I suppose this could be handled better with a single --coding-scores output option plus --csv and --parquet format flags, but that would break current pipelines.
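
For readers who want to see the shape of this design, here is a minimal sketch of two independent output options, where either, both, or neither can be given. It is not the code from this PR: it uses argparse rather than the package's real CLI, the option name --parquet is assumed from the discussion above, and build_parser, write_coding_scores, and the column names read_id and score are hypothetical. Writing Parquet through pandas also assumes a Parquet engine such as pyarrow is installed.

```python
import argparse
import csv


def build_parser():
    # Hypothetical stand-in for the translate command's output options.
    parser = argparse.ArgumentParser(prog="translate-sketch")
    parser.add_argument("--csv", default=None,
                        help="write coding scores to this CSV path")
    parser.add_argument("--parquet", default=None,
                        help="write coding scores to this Parquet path")
    return parser


def write_coding_scores(rows, fieldnames, csv_path=None, parquet_path=None):
    """Write the score rows to each requested output and return the paths written."""
    written = []
    if csv_path:
        with open(csv_path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(csv_path)
    if parquet_path:
        # Imported lazily so CSV-only runs need neither pandas nor a Parquet engine.
        import pandas as pd
        pd.DataFrame(rows, columns=fieldnames).to_parquet(parquet_path)
        written.append(parquet_path)
    return written


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Made-up example rows; the real column names may differ.
    scores = [{"read_id": "read1", "score": 0.93}]
    write_coding_scores(scores, ["read_id", "score"],
                        csv_path=args.csv, parquet_path=args.parquet)
```

Because the two options are independent, existing pipelines that pass only --csv keep working unchanged, which is the backward-compatibility argument made above.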

Closes: #68

PR checklist