czbiohub-sf / orpheum

Orpheum (previously called, and published as, sencha) is a Python package for directly translating RNA-seq reads into coding protein sequences.
MIT License

Parallelize "translate" #83

Closed: olgabot closed this issue 3 years ago

olgabot commented 4 years ago

`sencha translate` currently runs serially, but since every read is independent of the others, each read could be translated separately. The function would need to yield or return the rows for the summary CSV, in addition to writing the protein/nucleotide FASTAs to a file. Maybe each process could write to a temporary file, and then the main/pooled process could concatenate the FASTAs? As long as coding_peptides.fasta and coding_nucleotides.fasta have their read IDs in the same order, I think that works!
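A minimal sketch of the temp-file approach proposed above, using Python's `concurrent.futures`. Everything here is hypothetical: `translate_read` is a toy frame-1 translator standing in for sencha's real per-read logic (which this sketch does not reproduce), and the function names, the `out_prefix` file naming, and the `read_id`/`is_coding` summary columns are assumptions. Reads are split into chunks; each worker writes its own temporary peptide and nucleotide FASTAs and returns its summary rows, and the main process concatenates the chunk files in order.

```python
import csv
import os
import shutil
import tempfile
from concurrent.futures import ProcessPoolExecutor

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_read(seq):
    """Hypothetical stand-in for sencha's per-read logic: frame 1 only."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def translate_chunk(job):
    """Worker: translate one chunk, write its own temp FASTAs, return CSV rows."""
    chunk_index, reads, tmpdir = job
    pep_path = os.path.join(tmpdir, f"{chunk_index:06d}.peptides.fasta")
    nuc_path = os.path.join(tmpdir, f"{chunk_index:06d}.nucleotides.fasta")
    rows = []
    with open(pep_path, "w") as pep, open(nuc_path, "w") as nuc:
        for read_id, seq in reads:
            seq = seq.upper()
            peptide = translate_read(seq)
            is_coding = bool(peptide) and "*" not in peptide
            if is_coding:
                # Both temp files receive the same read IDs in the same order.
                pep.write(f">{read_id}\n{peptide}\n")
                nuc.write(f">{read_id}\n{seq}\n")
            rows.append({"read_id": read_id, "is_coding": is_coding})
    return pep_path, nuc_path, rows


def translate_parallel(reads, out_prefix, n_workers=4, chunk_size=1000,
                       executor_cls=ProcessPoolExecutor):
    """Translate (read_id, seq) pairs in parallel chunks; return summary rows."""
    reads = list(reads)
    chunks = [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]
    all_rows = []
    with tempfile.TemporaryDirectory() as tmpdir:
        jobs = [(i, chunk, tmpdir) for i, chunk in enumerate(chunks)]
        with executor_cls(max_workers=n_workers) as pool, \
                open(f"{out_prefix}.coding_peptides.fasta", "w") as pep_out, \
                open(f"{out_prefix}.coding_nucleotides.fasta", "w") as nuc_out:
            # Executor.map yields results in submission order, so appending
            # each chunk's files as they arrive preserves the input read order.
            for pep_path, nuc_path, rows in pool.map(translate_chunk, jobs):
                with open(pep_path) as src:
                    shutil.copyfileobj(src, pep_out)
                with open(nuc_path) as src:
                    shutil.copyfileobj(src, nuc_out)
                all_rows.extend(rows)
    with open(f"{out_prefix}.summary.csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["read_id", "is_coding"])
        writer.writeheader()
        writer.writerows(all_rows)
    return all_rows
```

Because `Executor.map` returns results in the order the jobs were submitted, the concatenated FASTAs keep the input read order regardless of which worker finishes first. With the default `ProcessPoolExecutor`, call `translate_parallel` from under an `if __name__ == "__main__":` guard on platforms that spawn worker processes; passing `executor_cls=ThreadPoolExecutor` runs the same pipeline in threads, which is handy for testing.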

pranathivemuri commented 4 years ago

https://github.com/czbiohub/sencha/pull/84/files

pranathivemuri commented 3 years ago

This wasn't such a good idea given the number of cores allotted to each small file when run with Nextflow. For small files it is much better to run serially and just apply a simple `map` function over the list, which is what PR #93 does. So I'm closing this issue.
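For comparison, a minimal sketch of the serial alternative described in the last comment: a single process applies a plain `map` over the list of reads. This is not the code from PR #93; `translate_one`, `translate_serial`, the toy frame-1 translation, the output file names, and the summary columns are all hypothetical.

```python
import csv

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_one(read):
    """Hypothetical per-read step: frame-1 translation of one (read_id, seq)."""
    read_id, seq = read
    seq = seq.upper()
    peptide = "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )
    is_coding = bool(peptide) and "*" not in peptide
    return read_id, seq, peptide, is_coding


def translate_serial(reads, out_prefix):
    """One process, no pool: a plain map over the list of reads."""
    results = list(map(translate_one, reads))
    with open(f"{out_prefix}.coding_peptides.fasta", "w") as pep, \
            open(f"{out_prefix}.coding_nucleotides.fasta", "w") as nuc:
        for read_id, seq, peptide, is_coding in results:
            if is_coding:
                pep.write(f">{read_id}\n{peptide}\n")
                nuc.write(f">{read_id}\n{seq}\n")
    with open(f"{out_prefix}.summary.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["read_id", "is_coding"])
        writer.writerows((read_id, is_coding) for read_id, _, _, is_coding in results)
    return results
```

With no process pool there is no per-worker startup cost and no temporary files to merge, and both FASTAs follow the input read order directly.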