KosinskiLab / AlphaPulldown

https://doi.org/10.1093/bioinformatics/btac749
GNU General Public License v3.0

run_structure_prediction.py accepts a comma-separated list of input folds and optionally dedicated output_directories for each fold #357

Closed. maurerv closed this issue 3 months ago.

dingquanyu commented 3 months ago

I guess in the case of padding, you may also need to update the --output_directory key so that its value in the argument dictionary is a list covering all the sub-folders that should be created in this if block: e.g. iterate through all_folds and append each os.path.join(FLAGS.output_path, <name of the protein complex>) to a list. https://github.com/KosinskiLab/AlphaPulldown/blob/732baec9a47d3e02658975078ac29a8fbab66a68/alphapulldown/scripts/run_multimer_jobs.py#L125
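
Something along these lines, as a rough sketch (the helper and argument-dictionary names here are placeholders, not the actual run_multimer_jobs.py code):

```python
import os
from typing import Dict, List


def build_output_directories(all_folds: List[str], output_path: str) -> List[str]:
    """Hypothetical helper: build one dedicated sub-folder path per fold."""
    output_directories = []
    for fold_name in all_folds:
        # e.g. <output_path>/<name of the protein complex>
        output_directories.append(os.path.join(output_path, fold_name))
    return output_directories


def update_output_directory_arg(arguments: Dict[str, object],
                                all_folds: List[str],
                                output_path: str) -> None:
    """Replace the single --output_directory value with the per-fold list."""
    arguments["--output_directory"] = build_output_directories(all_folds, output_path)
```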

maurerv commented 3 months ago

@dingquanyu from your PR at https://github.com/KosinskiLab/AlphaPulldownSnakemake/pull/13 it seemed like you wanted run_multimer_jobs.py to use a single output directory and create subdirectories for each fold according to use_ap_style.

We could extend run_multimer_jobs.py to allow multiple output_paths, but since run_multimer_jobs.py uses the file-based fold specification, where the user might not know the number of folds beforehand, I think having a single output directory makes the most sense.

dingquanyu commented 3 months ago

> @dingquanyu from your PR at KosinskiLab/AlphaPulldownSnakemake#13 it seemed like you wanted run_multimer_jobs.py to use a single output directory and create subdirectories for each fold according to use_ap_style.
>
> We could extend run_multimer_jobs.py to allow multiple output_paths, but since run_multimer_jobs.py uses the file-based fold specification, where the user might not know the number of folds beforehand, I think having a single output directory makes the most sense.

I see. This means that in the Snakemake pipeline you will bypass run_multimer_jobs.py and launch run_structure_prediction.py directly with a cluster of jobs?

maurerv commented 3 months ago

Exactly. I added a checkpoint that performs the clustering and extended the existing rule that uses run_structure_prediction.py to run on each cluster separately. This way we don't need additional rules.

I just pushed these changes for reference: bfa71c7ac5d013a0c1aea3b78fc347381a3ca06c
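
Roughly this kind of checkpoint pattern, for reference (rule names, file paths, and shell commands below are placeholders, not the actual AlphaPulldownSnakemake rules; those are in the commit above):

```python
import os


def predictions_per_cluster(wildcards):
    # Evaluated only after the clustering checkpoint has finished, so the
    # number of clusters does not need to be known when the workflow starts.
    cluster_dir = checkpoints.cluster_folds.get(**wildcards).output[0]
    clusters = glob_wildcards(os.path.join(cluster_dir, "{cluster}.txt")).cluster
    return expand("results/predictions/{cluster}", cluster=clusters)


rule all:
    input:
        predictions_per_cluster


# Placeholder clustering step: writes one fold list per cluster into the
# output directory.
checkpoint cluster_folds:
    input:
        "config/folds.txt"
    output:
        directory("results/clusters")
    shell:
        "python cluster_folds.py {input} {output}"


# Runs structure prediction once per cluster; the shell command stands in
# for the real call to run_structure_prediction.py with a comma-separated
# list of folds and output directories.
rule predict_cluster:
    input:
        "results/clusters/{cluster}.txt"
    output:
        directory("results/predictions/{cluster}")
    shell:
        "python run_structure_prediction.py $(cat {input}) {output}"
```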

dingquanyu commented 3 months ago

> Exactly. I added a checkpoint that performs the clustering and extended the existing rule that uses run_structure_prediction.py to run on each cluster separately. This way we don't need additional rules.
>
> I just pushed these changes for reference: bfa71c7ac5d013a0c1aea3b78fc347381a3ca06c

I see. Thanks for the commit. Now it makes sense to me.