Closed pranathivemuri closed 4 years ago
# Without parallel processing
(sencha) ➜ sencha git:(master) ✗ sencha translate tests/data/SRR306838_GSM752691_hsa_br_F_1_trimmed_subsampled_n22.fq.gz tests/data/index/Homo_sapiens.GRCh38.pep.subset.fa.gz --save-peptide-bloom-filter --verbose
22it [00:00, 5273.44it/s]
2662it [00:04, 533.92it/s]
time taken is 5.2929160594940186 seconds
(sencha) ➜ sencha git:(master) ✗ sencha translate tests/data/SRR306838_GSM752691_hsa_br_F_1_trimmed_subsampled_n22.fq.gz tests/data/index/Homo_sapiens.GRCh38.pep.subset.fa.gz --save-peptide-bloom-filter --verbose
22it [00:00, 6247.86it/s]
2662it [00:05, 525.78it/s]
time taken is 5.3656840324401855 seconds
(sencha) ➜ sencha git:(master) ✗ sencha translate tests/data/SRR306838_GSM752691_hsa_br_F_1_trimmed_subsampled_n22.fq.gz tests/data/index/Homo_sapiens.GRCh38.pep.subset.fa.gz --save-peptide-bloom-filter --verbose
22it [00:00, 6183.39it/s]
2662it [00:04, 538.34it/s]
time taken is 5.239797830581665 seconds
(sencha) ➜ sencha git:(master) ✗ sencha translate tests/data/SRR306838_GSM752691_hsa_br_F_1_trimmed_subsampled_n22.fq.gz tests/data/index/Homo_sapiens.GRCh38.pep.subset.fa.gz --save-peptide-bloom-filter --verbose
22it [00:00, 5382.96it/s]
2662it [00:04, 540.03it/s]
time taken is 5.250756740570068 seconds
(sencha) ➜ sencha git:(master) ✗ sencha translate tests/data/SRR306838_GSM752691_hsa_br_F_1_trimmed_subsampled_n22.fq.gz tests/data/index/Homo_sapiens.GRCh38.pep.subset.fa.gz --save-peptide-bloom-filter --verbose
22it [00:00, 5457.78it/s]
2662it [00:04, 534.72it/s]
time taken is 5.285132884979248 seconds
# Parallel processing
(sencha) ➜ sencha git:(pranathi-translate) ✗ sencha translate tests/data/SRR306838_GSM752691_hsa_br_F_1_trimmed_subsampled_n22.fq.gz tests/data/index/Homo_sapiens.GRCh38.pep.subset.fa.gz --save-peptide-bloom-filter --verbose
22it [00:00, 5962.05it/s]
time taken to translate is 3.96411 seconds
(sencha) ➜ sencha git:(pranathi-translate) ✗ sencha translate tests/data/SRR306838_GSM752691_hsa_br_F_1_trimmed_subsampled_n22.fq.gz tests/data/index/Homo_sapiens.GRCh38.pep.subset.fa.gz --save-peptide-bloom-filter --verbose
22it [00:00, 5541.36it/s]
time taken to translate is 3.90832 seconds
(sencha) ➜ sencha git:(pranathi-translate) ✗ sencha translate tests/data/SRR306838_GSM752691_hsa_br_F_1_trimmed_subsampled_n22.fq.gz tests/data/index/Homo_sapiens.GRCh38.pep.subset.fa.gz --save-peptide-bloom-filter --verbose
22it [00:00, 6440.17it/s]
time taken to translate is 3.90263 seconds
(sencha) ➜ sencha git:(pranathi-translate) ✗ sencha translate tests/data/SRR306838_GSM752691_hsa_br_F_1_trimmed_subsampled_n22.fq.gz tests/data/index/Homo_sapiens.GRCh38.pep.subset.fa.gz --save-peptide-bloom-filter --verbose
22it [00:00, 6106.46it/s]
time taken to translate is 3.87282 seconds
(sencha) ➜ sencha git:(pranathi-translate) ✗ sencha translate tests/data/SRR306838_GSM752691_hsa_br_F_1_trimmed_subsampled_n22.fq.gz tests/data/index/Homo_sapiens.GRCh38.pep.subset.fa.gz --save-peptide-bloom-filter --verbose
22it [00:00, 6410.19it/s]
time taken to translate is 3.92853 seconds
About a 26% decrease in time (mean of five runs each: roughly 5.29 s without and 3.92 s with parallel processing), which may help. I am not sure whether this PR can fix the recent memory error that occurs while writing the FASTA sequences to the file, though.
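The percentage can be recomputed from the ten timings logged above. This is a minimal sketch, assuming the two timers ("time taken is" on master and "time taken to translate is" on the branch) measure comparable spans, which the logs do not confirm. The helper name `percent_decrease` is made up for illustration.

```python
from statistics import mean

# Wall-clock seconds copied from the logs above (five runs each).
serial = [5.2929160594940186, 5.3656840324401855, 5.239797830581665,
          5.250756740570068, 5.285132884979248]
parallel = [3.96411, 3.90832, 3.90263, 3.87282, 3.92853]


def percent_decrease(before, after):
    """Relative drop from the mean of `before` to the mean of `after`, in percent."""
    return 100 * (mean(before) - mean(after)) / mean(before)


decrease = percent_decrease(serial, parallel)
print(f"serial mean:   {mean(serial):.2f} s")
print(f"parallel mean: {mean(parallel):.2f} s")
print(f"decrease:      {decrease:.1f} %")
```

On these numbers the drop comes out near 26%, measured on the 22-read test file only, so it may not carry over to large inputs.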
Wow this is working!?!?? Amazing work!
Yes! I just had to declare the node graph as a global variable instead of keeping it as a class attribute, so that it doesn't get serialized along with the class. I tried it on the human makefile but unfortunately hit errors with sambamba dedup timing out. I'm going to rerun it today, and will probably try one bam file at a time if that doesn't work.
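The global-versus-attribute point can be illustrated with a small sketch. This is not the code from this PR: the names (`_NODE_GRAPH`, `load_node_graph`, `TranslatorWithAttribute`, `TranslatorWithGlobal`, `run_parallel`) are hypothetical, and a `bytearray` stands in for the real node graph. It assumes the work is dispatched with Python's `multiprocessing.Pool`, which pickles the callable (and therefore the instance behind a bound method) when sending tasks to workers.

```python
import pickle
from multiprocessing import Pool

# Hypothetical stand-in for the large node graph; set once per worker process.
_NODE_GRAPH = None


def load_node_graph(size=1_000_000):
    """Build a large placeholder object (an assumption, not sencha's real graph)."""
    return bytearray(size)


def init_worker(size):
    """Pool initializer: each worker process fills its own module-level global."""
    global _NODE_GRAPH
    _NODE_GRAPH = load_node_graph(size)


class TranslatorWithAttribute:
    """Keeps the graph on the instance, so pickling the instance copies the graph."""

    def __init__(self, size):
        self.node_graph = load_node_graph(size)

    def score(self, read):
        return len(read) + len(self.node_graph)


class TranslatorWithGlobal:
    """Reads the graph from the module-level global; the instance stays tiny."""

    def score(self, read):
        return len(read) + len(_NODE_GRAPH)


def run_parallel(reads, size=1_000_000, processes=2):
    """Map a bound method over reads; only the small instance is pickled."""
    translator = TranslatorWithGlobal()
    with Pool(processes, initializer=init_worker, initargs=(size,)) as pool:
        return pool.map(translator.score, reads)


def pickled_size(obj):
    """Number of bytes pickle needs to serialize `obj`."""
    return len(pickle.dumps(obj))
```

Comparing `pickled_size(TranslatorWithAttribute(100_000).score)` with `pickled_size(TranslatorWithGlobal().score)` shows the difference: the first carries the whole placeholder graph, the second only a reference to the class. `run_parallel` should be called from under an `if __name__ == "__main__":` guard on platforms that spawn rather than fork worker processes.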
@olgabot let me know if you want to merge now or wait for one of the big pipeline runs to finish on this branch.
This is awesome! Let's make sure at least one of the pipelines finishes successfully before merging it in, in case there are edge cases we run into with really big files.
@pranathivemuri @olga mentioned that you might have a docker container for nf-predictorthologs with this new change added? I would be happy to try it out on the bat data!
@lekhakaranam @phoenixAja here are the multiprocessing-related changes to the Dockerfile and main.nf: https://github.com/czbiohub/nf-predictorthologs/pull/78/files (note that both must be changed). The container is on ndnd.
Many thanks for contributing to czbiohub/sencha!
Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs).
PR checklist
- Tests: `pytest` (or `make coverage` if you want to see which lines don't have tests yet)
- Formatting: `black . --check`
- `usage.md` is updated
- `README.md` is updated