czbiohub-sf / orpheum

Orpheum (previously called, and published under the name, sencha) is a Python package for directly translating RNA-seq reads into coding protein sequence.
MIT License

Revert multiprocessing #93

Closed pranathivemuri closed 4 years ago

pranathivemuri commented 4 years ago

Many thanks for contributing to czbiohub/sencha!

Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs).

PR checklist

pranathivemuri commented 4 years ago

todo 1. fix tests 2. lint

@olgabot here is the branch that helped complete sencha translate on no_k2g_uniport in 16 hours. Running k2g_uniprot now.

pranathivemuri commented 4 years ago

@olgabot ready for review! It would be great if @lekhakaranam and @phoenixAja could try this code in their kmermaid pipeline on mouse and bat data, either once it's merged to master or on this branch after review. I want to see if there are any bugs that need to be addressed.

Note: I am reverting the multiprocessing part of sencha translate because it was exhausting the RAM on lrrr. We were writing several big files from multiple processes and combining them later, which caused the memory to fill up (the separate files were there to avoid several processes overwriting the same file at the same time, which would be a race condition).
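For readers unfamiliar with the pattern being reverted, here is a rough sketch of "one output file per worker, combined afterwards". It is not the code from this PR: `write_part`, `combine_parts`, and `translate_in_parallel` are hypothetical names, and the row contents are placeholders. It only illustrates why each worker gets its own file (no two processes write to the same path) and why the approach leaves many large intermediate files around until the merge step.

```python
import csv
import os
import shutil
import tempfile
from multiprocessing import Pool


def write_part(job):
    """Write one worker's rows to its own part file and return the path."""
    part_path, rows = job
    with open(part_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return part_path


def combine_parts(part_paths, combined_path):
    """Concatenate the part files in order by streaming, then delete them."""
    with open(combined_path, "w", newline="") as out:
        for part_path in part_paths:
            with open(part_path, newline="") as part:
                shutil.copyfileobj(part, out)
            os.remove(part_path)


def translate_in_parallel(chunks, combined_path, processes=2):
    """Hypothetical driver: one part file per chunk, merged at the end."""
    tmp_dir = tempfile.mkdtemp()
    jobs = [
        (os.path.join(tmp_dir, f"part_{i}.csv"), rows)
        for i, rows in enumerate(chunks)
    ]
    # Each job has a distinct path, so workers never write to the same file.
    with Pool(processes) as pool:
        part_paths = pool.map(write_part, jobs)
    combine_parts(part_paths, combined_path)
    os.rmdir(tmp_dir)
```

On platforms that start worker processes with spawn, `translate_in_parallel` would need to be called from under an `if __name__ == "__main__":` guard.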

I kept the part where I converted the code to gather the results in a list and write the CSV file without pandas, and I kept the call to sys.stdout.flush(). Without the flush, the predict-orthologs pipeline would hang and not progress.
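A minimal sketch of that idea, assuming the results are plain rows collected in a list: write them with the standard-library `csv` module and flush the stream explicitly. The function name `write_rows` and the column names in the usage note are made up for illustration and are not taken from the sencha code. A plausible reason the flush matters is that Python block-buffers stdout when it is a pipe, so a downstream pipeline step can sit waiting for output that is still in the buffer.

```python
import csv
import sys


def write_rows(rows, header, handle=None):
    """Write rows as CSV using the stdlib csv module, then flush the stream."""
    # Resolve sys.stdout at call time so redirected stdout is respected.
    handle = sys.stdout if handle is None else handle
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    # Push buffered output to the consumer right away.
    handle.flush()
```

Calling `write_rows([("read1", "Coding")], ["read_id", "category"])` writes the header and one row to stdout and flushes immediately.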