Improvements for automated benchmarking

jchodera commented 2 years ago

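There appear to be some issues with how the run_benchmarks.py script is written. If each invocation of the script is supposed to handle a single transformation, the following code may cause issues when it runs concurrently for multiple transformations, since every invocation appears to fetch the same inputs and write the same fixed output files (target.pdb and ligands.sdf):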

target_dir = targets_dict[target]['dir']
pdb_url = f"{base_repo_url}/raw/master/data/{target_dir}/01_protein/crd/protein.pdb"
pdb_file = retrieve_file_url(pdb_url)

# Fetch cofactors crystalwater pdb file
# TODO: This part should be done using plbenchmarks API - once there is a conda pkg
cofactors_url = f"{base_repo_url}/raw/master/data/{target_dir}/01_protein/crd/cofactors_crystalwater.pdb"
cofactors_file = retrieve_file_url(cofactors_url)

# Concatenate protein with cofactors pdbs
concatenate_files((pdb_file, cofactors_file), 'target.pdb')

# Fetch ligands sdf files and concatenate them in one
# TODO: This part should be done using plbenchmarks API - once there is a conda pkg
ligands_url = f"{base_repo_url}/raw/master/data/{target_dir}/00_data/ligands.yml"
with fetch_url_contents(ligands_url) as response:
    ligands_dict = yaml.safe_load(response.read())
ligand_files = []
for ligand in ligands_dict.keys():
    ligand_url = f"{base_repo_url}/raw/master/data/{target_dir}/02_ligands/{ligand}/crd/{ligand}.sdf"
    ligand_file = retrieve_file_url(ligand_url)
    ligand_files.append(ligand_file)
# concatenate sdfs
concatenate_files(ligand_files, 'ligands.sdf')

Presumably, we want to break this into multiple stages, or find some way to appropriately construct and execute the dependency graph on the cluster:

1. retrieve files needed for one or more benchmark system(s)
2. set up all transformations in parallel
3. run or resume all transformations in parallel
4. analyze data to generate plots
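
As a rough illustration of the staged layout above, here is a minimal sketch. All function names and return values are hypothetical placeholders, not the perses API, and a thread pool stands in for whatever the cluster scheduler would actually provide. The point is only the ordering: retrieval happens once, serially, before any parallel work starts.

```python
from concurrent.futures import ThreadPoolExecutor


def retrieve_files(target):
    """Stage 1 (hypothetical): fetch inputs for a target exactly once."""
    return {"target": target, "files": ["target.pdb", "ligands.sdf"]}


def setup_transformation(inputs, edge):
    """Stage 2 (hypothetical): prepare one transformation from the shared inputs."""
    return {"target": inputs["target"], "edge": edge, "status": "ready"}


def run_transformation(job):
    """Stage 3 (hypothetical): run or resume one transformation."""
    return {**job, "status": "done"}


def analyze(results):
    """Stage 4 (hypothetical): collect finished edges; plotting would go here."""
    return sorted(r["edge"] for r in results if r["status"] == "done")


def run_pipeline(target, edges, max_workers=4):
    # Serial step: inputs are downloaded once, before any parallel work.
    inputs = retrieve_files(target)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each stage finishes for all edges before the next stage begins.
        jobs = list(pool.map(lambda edge: setup_transformation(inputs, edge), edges))
        results = list(pool.map(run_transformation, jobs))
    return analyze(results)
```

Calling `run_pipeline("example_target", [(0, 1), (0, 2)])` would process both edges in parallel after a single retrieval step; the target name and edge tuples are made up for illustration.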

Alternatively, we can make sure that every transformation runs fully independently until the final analysis stage, which uses diffnet.
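
One way to make transformations independent is to give each one its own working directory, so that fixed file names such as target.pdb never collide. The sketch below assumes a directory layout and helper names (`transformation_workdir`, `concatenate_into`) that are purely illustrative; they are not part of run_benchmarks.py.

```python
import tempfile
from pathlib import Path


def transformation_workdir(base_dir, target, ligand_a, ligand_b):
    """Return (and create) a directory unique to one transformation (hypothetical layout)."""
    workdir = Path(base_dir) / target / f"{ligand_a}_to_{ligand_b}"
    workdir.mkdir(parents=True, exist_ok=True)
    return workdir


def concatenate_into(workdir, input_files, name):
    """Concatenate input_files into workdir/name via a temporary file and a rename.

    Writing to a temporary file first means a reader never sees a
    half-written output, even if another process inspects the directory.
    """
    destination = Path(workdir) / name
    with tempfile.NamedTemporaryFile("w", dir=workdir, delete=False) as tmp:
        for path in input_files:
            tmp.write(Path(path).read_text())
        tmp_path = Path(tmp.name)
    # Rename within the same directory replaces the destination in one step.
    tmp_path.replace(destination)
    return destination
```

With this layout, two concurrent transformations for the same target write to different directories, at the cost of duplicating the concatenated inputs.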

We'll want to find a more clever way to refactor this in the next release so we can support benchmarking multiple targets at the same time as well.

ijpulidos commented 2 years ago

Yes, given that benchmarking is growing in scope, we do want better (more efficient and user-friendly) and smarter ways to deal with these situations. I wonder if we want a whole new benchmarking module for perses, considering how this is evolving.

ijpulidos commented 2 years ago

It seems to be becoming even more beneficial to have a whole module for the benchmarks part of perses; see https://github.com/choderalab/perses/pull/1050#discussion_r903025368

jchodera commented 2 years ago

Is this for performance benchmarking (ns/day, time for a single calculation) or accuracy benchmarking?

ijpulidos commented 2 years ago

> Is this for performance benchmarking (ns/day, time for a single calculation) or accuracy benchmarking?

This is for accuracy benchmarking. That is, running the systems in the protein-ligand-benchmark dataset and checking the plots and errors.
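
For concreteness, checking errors in an accuracy benchmark could look something like the sketch below. The thread does not say which error metrics are used; mean unsigned error (MUE) and root-mean-square error (RMSE) are assumed here as common choices, and the function name is hypothetical.

```python
import math


def error_metrics(predicted, experimental):
    """Return MUE and RMSE between predicted and experimental values.

    Both inputs are equal-length sequences of free energies in the same units.
    The choice of metrics is an assumption, not taken from perses.
    """
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [p - e for p, e in zip(predicted, experimental)]
    mue = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mue": mue, "rmse": rmse}
```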

choderalab / perses

Improvements for automated benchmarking #927