Closed axiomcura closed 2 years ago
Hello @gwaybio
Apologizes for not explaining snakemakes
multi-processing. In our previous version, I designed the rules to take in a list of input paths instead of individual paths.
It seems that the snakemake
's multi-processing scheduler is "rule based". Meaning that the execution of the rules are parallelized.
If you look at the rules changes , I removed the expand()
function, which creates a list of input file paths. The list of paths is then sent to the script as an input.
The removal of expand()
allows snakemake
to distribute all the input paths into the same rule as separate jobs allowing parallelization.
hopefully this makes sense. if not we an have a small discussion about it.
Also, I am reverting back to the scripts
rule attribute https://github.com/WayScience/CytoPipe/pull/11#discussion_r916287016 . I initially thought using the script
rule attribute prevented parallelization, which is not the case after more investigation and testing. This is an easy fix, but it requires removal of argparse
and input formatting within the snakelike. This will be a separate pull request due to the amount of changes needed.
Also, I created an issue for the consensus.py
script #12 . There is quite a lot of hard coded parameters in the script. We should discuss the possible defaults for this step so I can added into the consensus config file.
good to open the issue, but I don't think we should merge hardcoded variables into main. In other words, it's best to close https://github.com/WayScience/CytoPipe/issues/12 in this PR.
Does this call for renaming the PR that acknowledges both snakemake
native multiprocessing and setting defaults for consensus?
Does this call for renaming the PR that acknowledges both snakemake native multiprocessing and setting defaults for consensus?
If you feel that appropriate, sounds good to me!
Hey @gwaybio !
I have edited the consensus script along with the configuration file implementation on this commit: https://github.com/WayScience/CytoPipe/pull/11/commits/ccb111e32e8ed3807125ba74e576391ce1dc6116
Let me know if the steps taken within the script is the correct way of producing a consensus profile!
Hello @gwaybio
Hopefully this take's care of it!
Removing cell health specific code: 91c11addac3f9c1e3211dee13c90ea6d28155d1f
Replaced aggregate function with consensus function in the consensus.py
script: 76a371380b4dc645152e193c16f2f7585f23c2fc
reverted consensus output name to consensus.tsv
: https://github.com/WayScience/CytoPipe/pull/11/commits/a547f52d4c0c0e093a3b331dab80b321e46a7d74
This pull request focuses on replacing python's
multiprocessing
module withsnakemake
's native multiprocessing scheduler.