bigginlab / ABFE_workflow

This is a Snakemake-based workflow for ABFE calculations that can be scaled easily in a high-throughput manner, for example via Slurm.

Setting up the workflow #5

Closed ale94mleon closed 1 year ago

ale94mleon commented 1 year ago

Hi, I am trying to set up the workflow on my cluster. The first issue was how to create the conda environment. It is now working with the following environment.yml:

name: abfe
channels:
  - conda-forge
dependencies:
  - python=3.7.12
  - pip
  - conda-build
  - gromacs=2022.4
  - parmed
  - bioconda::snakemake=7.8.5
  - pip:
    - alchemlyb==0.6.0
    - pymbar==3.0.5
    - matplotlib
    - mdanalysis
    - numpy
    - pandas
    - scipy

However, when the environment is activated, I am getting the warning

WARNING: No ICDs were found. Either,
- Install a conda package providing a OpenCL implementation (pocl, oclgrind, intel-compute-runtime, beignet) or 
- Make your system-wide implementation visible by installing ocl-icd-system conda package. 

For the calc_ABFE.py script:

#!/usr/bin/env python3

import os
from abfe.orchestration.build_and_run_ligands import calculate_all_ligands

if __name__ == "__main__":
    orig_dir = os.getcwd()

    # IO:
    out_root_path = "./data/"
    in_root_path = "./data/input/system1"

    input_ligand_paths = [os.path.join(in_root_path, d) for d in os.listdir(in_root_path)
                          if os.path.isdir(os.path.join(in_root_path, d))]
    print("input ligand dirs: ", input_ligand_paths)
    print("output root dir: ", out_root_path)

    # Options:
    n_cores = 1
    num_jobs = 40
    num_replicas = 1
    submit = True

    cluster_config = {
        "partition": "deflt",
        "time": "48:00:00",
        "num_sim_threads": 8,
        "mem": "20GB",
    }

    # Do Fun!
    if not os.path.isdir(out_root_path):
        os.mkdir(out_root_path)
    calculate_all_ligands(input_ligand_paths=input_ligand_paths, out_root_path=out_root_path, num_max_thread=8,
                          num_replicas=num_replicas, submit=submit, num_jobs=num_jobs, cluster_config=cluster_config)

    os.chdir(orig_dir)

I changed in_root_path = "/data/input/system1", the partition, and the keyword n_cores=n_cores; the last one is not a valid keyword.

However, with this script I am getting:

input ligand dirs:  ['./data/input/system1/ligand1']
output root dir:  ./data/
./data//ligand1/1/job.sh ./data//ligand1/1/scheduler.sh
Traceback (most recent call last):
  File "calc_ABFE.py", line 34, in <module>
    num_replicas=num_replicas, submit=submit, num_jobs=num_jobs, cluster_config=cluster_config)
  File "/home/users/alejandro/GIT/ABFE_workflow/abfe/orchestration/build_and_run_ligands.py", line 55, in calculate_all_ligands
    num_replicas=num_replicas, cluster_config=cluster_config, submit=submit, num_jobs=num_jobs)
  File "/home/users/alejandro/GIT/ABFE_workflow/abfe/orchestration/build_and_run_ligands.py", line 45, in build_run_ligand
    out = scheduler.schedule_run()
  File "/home/users/alejandro/GIT/ABFE_workflow/abfe/orchestration/generate_scheduler.py", line 81, in schedule_run
    job_id = int(out.split()[-1])
ValueError: invalid literal for int() with base 10: 'directory'
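
Looking at generate_scheduler.py, the failing line takes the last whitespace-separated token of the sbatch output and casts it to int, so sbatch apparently printed an error message ending in "directory" (e.g. "No such file or directory") instead of "Submitted batch job <id>". A minimal sketch of the kind of parsing I would have expected, assuming Slurm's standard success message (the helper name is mine, not the repo's):

import re
import subprocess

def submit_and_get_job_id(job_script):
    # Hypothetical helper, not the repo's code: submit a job script and
    # parse the Slurm job id, failing loudly if sbatch reports an error.
    result = subprocess.run(["sbatch", job_script], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"sbatch failed: {result.stderr.strip()}")
    # On success, sbatch prints "Submitted batch job <id>".
    match = re.search(r"Submitted batch job (\d+)", result.stdout)
    if match is None:
        raise RuntimeError(f"Unexpected sbatch output: {result.stdout.strip()}")
    return int(match.group(1))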

Now, I actually do not know how to continue. Could you please help me?

ale94mleon commented 1 year ago

Hi, I solved the first problems. The issue was related to the handling of relative paths, plus some minor issues in the method generate_scheduler_file of the class orchestration.generate_scheduler.scheduler, which was always using the cpu partition. Now the problems look like they are related to the definition of the Snakemake rules. I am getting this in slurm-{jobid}.out:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 40
Traceback (most recent call last):
  File "/home/users/all-jh/opt/miniconda3/envs/abfe/lib/python3.7/site-packages/snakemake/__init__.py", line 810, in snakemake
    keepincomplete=keep_incomplete,
  File "/home/users/all-jh/opt/miniconda3/envs/abfe/lib/python3.7/site-packages/snakemake/workflow.py", line 1085, in execute
    logger.run_info("\n".join(dag.stats()))
  File "/home/users/all-jh/opt/miniconda3/envs/abfe/lib/python3.7/site-packages/snakemake/dag.py", line 2383, in stats
    yield tabulate(rows, headers="keys")
  File "/home/users/all-jh/opt/miniconda3/envs/abfe/lib/python3.7/site-packages/tabulate/__init__.py", line 2049, in tabulate
    tabular_data, headers, showindex=showindex
  File "/home/users/all-jh/opt/miniconda3/envs/abfe/lib/python3.7/site-packages/tabulate/__init__.py", line 1471, in _normalize_tabular_data
    rows = list(map(lambda r: r if _is_separating_line(r) else list(r), rows))
  File "/home/users/all-jh/opt/miniconda3/envs/abfe/lib/python3.7/site-packages/tabulate/__init__.py", line 1471, in <lambda>
    rows = list(map(lambda r: r if _is_separating_line(r) else list(r), rows))
  File "/home/users/all-jh/opt/miniconda3/envs/abfe/lib/python3.7/site-packages/tabulate/__init__.py", line 107, in _is_separating_line
    (len(row) >= 1 and row[0] == SEPARATING_LINE)
  File "/home/users/all-jh/opt/miniconda3/envs/abfe/lib/python3.7/site-packages/snakemake/rules.py", line 1216, in __eq__
    return self.name == other.name and self.output == other.output
AttributeError: 'str' object has no attribute 'name'

Now my question is: do you know what could be the problem? Do you know how I can debug this?

In addition, I was going through the templates and I see that the conda environment is never activated. Also, on my cluster we first have to source /the/path/to/opt/miniconda3/etc/profile.d/conda.sh before conda activate abfe can be used and the nodes can work with the environment. Best, Alejandro
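
P.S. A minimal sketch of the kind of change I mean, assuming the job scripts are written out from the Python orchestration code (the constant and function names here are mine, not the repo's):

CONDA_SETUP = (
    "source /the/path/to/opt/miniconda3/etc/profile.d/conda.sh\n"
    "conda activate abfe\n"
)

def write_job_script(path, body):
    # Hypothetical sketch: prepend the conda setup my cluster needs to
    # every generated job script, before the actual workflow commands.
    with open(path, "w") as fh:
        fh.write("#!/usr/bin/env bash\n")
        fh.write(CONDA_SETUP)
        fh.write(body)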

ale94mleon commented 1 year ago

The last issue, AttributeError: 'str' object has no attribute 'name', is solved by updating snakemake with conda update snakemake -c bioconda. So maybe in the environment specification it is better to pin snakemake to 7.24.0, or to not pin a version at all. Now I have a whole zoo of errors related to the MD part. Let's see...

IAlibay commented 1 year ago

@ale94mleon apologies, it's the weekend, so I am unlikely to look at this until Monday at best. I know that @RiesBen was updating things in his latest PR, so I'm not sure if the issues mentioned here are being addressed.

RiesBen commented 1 year ago

@ale94mleon sorry for the inconvenience! #4 will change the repo structure and make it much easier to use.

ale94mleon commented 1 year ago

Hi @RiesBen and @IAlibay, that sounds great! Then I am going to wait and give it another try after the merge.

ale94mleon commented 1 year ago

Hi, I tested yesterday and the workflow is still crashing, although at least the system-building part is working now. Some problems I detected were related to the definition and use of the cluster configuration. For example, the class abfe.orchestration.generate_scheduler.scheduler pins the partition to cpu in some of its methods, no matter whether cluster_config defines a different partition. Another issue in this class is the " -c " + str(self.n_cores) section, which requests more than the resources available on my nodes; this is a problem on my end, but I am still struggling with how to solve it through the arguments n_cores_per_job, num_jobs_receptor_workflow, num_jobs_per_ligand and num_replicas. It is not so clear to me yet.

In my opinion, the necessary information from cluster_config should be added dynamically every time a Slurm command is built. That way it would be easy to ask for specific resources on the cluster (e.g. GPUs).
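
A minimal sketch of what I have in mind, building the #SBATCH header from the cluster_config dict instead of hard-coding the partition and core count (the helper name and the gres key are my own, not from the repo):

def sbatch_header(cluster_config):
    # Hypothetical helper: translate cluster_config entries into #SBATCH
    # directives so any requested resource ends up in the submit script.
    lines = ["#!/usr/bin/env bash"]
    if "partition" in cluster_config:
        lines.append(f"#SBATCH --partition={cluster_config['partition']}")
    if "time" in cluster_config:
        lines.append(f"#SBATCH --time={cluster_config['time']}")
    if "mem" in cluster_config:
        lines.append(f"#SBATCH --mem={cluster_config['mem']}")
    if "num_sim_threads" in cluster_config:
        lines.append(f"#SBATCH --cpus-per-task={cluster_config['num_sim_threads']}")
    if "gres" in cluster_config:  # e.g. "gpu:1" to request a GPU
        lines.append(f"#SBATCH --gres={cluster_config['gres']}")
    return "\n".join(lines) + "\n"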

Regarding #11, one way is to ask on the CLI for mdrun_extra_keywords and add them every time gmx mdrun is called.
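
Roughly what I imagine, assuming the workflow shells out to gmx mdrun (mdrun_extra_keywords is the hypothetical option I am proposing, not something that exists in the code):

import shlex
import subprocess

def run_mdrun(deffnm, mdrun_extra_keywords=""):
    # Hypothetical wrapper: append user-supplied flags to every gmx mdrun call.
    cmd = ["gmx", "mdrun", "-deffnm", deffnm]
    cmd += shlex.split(mdrun_extra_keywords)  # e.g. "-nb gpu -ntomp 8"
    subprocess.run(cmd, check=True)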

For now a GROMACS conda installation is used, but that could be an issue on HPC architectures that need a specific build strategy for GROMACS. Here it might be useful to add some user-facing keyword like build_gmx_section, which could be used to specify how GROMACS should be loaded and from where, for example loading it from Spack:

source /data/shared/spack-0.19.1/shared.bash
module load gromacs/2022.5

Those lines could then be added to the sbatch commands.
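
On the Python side that could look roughly like this (build_gmx_section is the hypothetical keyword proposed above, and the template is only an illustration, not the repo's actual one):

# Hypothetical sketch: keep build_gmx_section as a plain string and
# substitute it into the job-script template after the #SBATCH header.
JOB_TEMPLATE = """#!/usr/bin/env bash
#SBATCH --partition={partition}
#SBATCH --time={time}

{build_gmx_section}

# ... the actual workflow commands (gmx grompp, gmx mdrun, ...) follow here
"""

script = JOB_TEMPLATE.format(
    partition="deflt",
    time="48:00:00",
    build_gmx_section=(
        "source /data/shared/spack-0.19.1/shared.bash\n"
        "module load gromacs/2022.5"
    ),
)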

Regarding the installation issues, it is now working better with a pinned environment (check here).

RiesBen commented 1 year ago

Hi @ale94mleon, what's your current status? There were some updates to the code here. Soon we will have a first release.

RiesBen commented 1 year ago

Closed this since it was opened a long time ago and a lot has changed.