WayScience / CytoSnake

Orchestrating high-dimensional cell morphology data processing pipelines
https://cytosnake.readthedocs.io
Creative Commons Attribution 4.0 International
3 stars 3 forks source link

Removed Multi-processing and replaced it with `Snakemake's` native multiprocessing support #11

Closed axiomcura closed 2 years ago

axiomcura commented 2 years ago

This pull request focuses on replacing python's multiprocessing module with snakemake's native multiprocessing scheduler.

axiomcura commented 2 years ago

Hello @gwaybio

Apologizes for not explaining snakemakes multi-processing. In our previous version, I designed the rules to take in a list of input paths instead of individual paths.

It seems that the snakemake's multi-processing scheduler is "rule based". Meaning that the execution of the rules are parallelized.

If you look at the rules changes , I removed the expand() function, which creates a list of input file paths. The list of paths is then sent to the script as an input.

The removal of expand() allows snakemake to distribute all the input paths into the same rule as separate jobs allowing parallelization.

hopefully this makes sense. if not we an have a small discussion about it.

Also, I am reverting back to the scripts rule attribute https://github.com/WayScience/CytoPipe/pull/11#discussion_r916287016 . I initially thought using the script rule attribute prevented parallelization, which is not the case after more investigation and testing. This is an easy fix, but it requires removal of argparse and input formatting within the snakelike. This will be a separate pull request due to the amount of changes needed.

Also, I created an issue for the consensus.py script #12 . There is quite a lot of hard coded parameters in the script. We should discuss the possible defaults for this step so I can added into the consensus config file.

axiomcura commented 2 years ago

good to open the issue, but I don't think we should merge hardcoded variables into main. In other words, it's best to close https://github.com/WayScience/CytoPipe/issues/12 in this PR.

Does this call for renaming the PR that acknowledges both snakemake native multiprocessing and setting defaults for consensus?

gwaybio commented 2 years ago

Does this call for renaming the PR that acknowledges both snakemake native multiprocessing and setting defaults for consensus?

If you feel that appropriate, sounds good to me!

axiomcura commented 2 years ago

Hey @gwaybio !

I have edited the consensus script along with the configuration file implementation on this commit: https://github.com/WayScience/CytoPipe/pull/11/commits/ccb111e32e8ed3807125ba74e576391ce1dc6116

Let me know if the steps taken within the script is the correct way of producing a consensus profile!

axiomcura commented 2 years ago

Hello @gwaybio

Hopefully this take's care of it!

Removing cell health specific code: 91c11addac3f9c1e3211dee13c90ea6d28155d1f Replaced aggregate function with consensus function in the consensus.py script: 76a371380b4dc645152e193c16f2f7585f23c2fc reverted consensus output name to consensus.tsv: https://github.com/WayScience/CytoPipe/pull/11/commits/a547f52d4c0c0e093a3b331dab80b321e46a7d74