choderalab / fah-xchem

Tools and infrastructure for automated compound discovery using Folding@home
MIT License

Add updated tool to consolidate and preprocess data from simulation results #1

Closed by mcwitt 4 years ago

mcwitt commented 4 years ago

Starting from a run description in the input JSON file:

    "1": {
        "JOBID": 1,
        "directory": "RUN0",
        "end": 0,
        "end_pIC50": "5.44251008457101",
        "end_smiles": "Cc1ccncc1NC(=O)Cc2cc(cc(c2)Cl)O[C@@H]3CC(=O)N3",
        "end_title": "TRY-UNI-2eddb1ff-7",
        "ff": "openff-1.2.0",
        "ligand": "nucleophilic_displacement_enumeration_for_FEP-sorted-x10789.sdf",
        "protein": "../receptors/monomer/Mpro-x2646_0_bound-protein-thiolate.pdb",
        "start": 2,
        "start_smiles": "Cc1ccncc1NC(=O)Cc2cc(cc(c2)Cl)OCS(=O)(=O)N[C@H]3C[C@H]4C[C@@H](C3)[NH2+]C4",
        "start_title": "EN300-784608",
        "target": "SARS-CoV-2 Mpro"
    },
  1. Extract work values from the simulation data (globals.csv) and write the extracted data to work.json files at the run/clone/gen level. See the example on the server at /home/server/server2/projects/available/covid-moonshot/consolidate-work-to-pandas-13420.py.

  2. Map the free energy analysis over run/clone/gens, producing the following values:

    • delta_f
    • ddelta_f (BAR error estimate)
    • delta_f_low
    • delta_f_high
    • bar_overlap
    • number of work samples (open question: if we eliminate a forward work value, should we also eliminate the corresponding reverse one?)
  3. Collect results into an augmented version of the input JSON, with the following additional schema:

    "1": {
        "complex_phase": {
            "delta_f": -2.345,
            "ddelta_f": 0.012,
            "delta_f_low": -2.567,
            "delta_f_high": -2.1,
            "delta_f_bootstrap": [-2.452, -2.23],
            "nwork": 105,
            "bar_overlap": 0.984
        },
        "solvent_phase": {
            "delta_f": -2.345,
            "ddelta_f": 0.012,
            "delta_f_low": -2.567,
            "delta_f_high": -2.1,
            "nwork": 105,
            "bar_overlap": 0.984
        },
        "binding": {
            "delta_f": -2.345,
            "ddelta_f": 0.024
        }
    },
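
For discussion, here is a minimal sketch of what steps 2 and 3 might look like for a single run, using only the standard library. It rests on several assumptions that are not stated in this issue: work values are in units of kT, the reverse work is the work of the reverse process (so an ideal pair has w_reverse = -w_forward), and the binding estimate is complex minus solvent with the two errors combined in quadrature. The function names `bar_delta_f` and `binding_estimate` are hypothetical, and the sketch does not compute `ddelta_f`, the bootstrap bounds, or `bar_overlap`. The example numbers in the schema above look like placeholders, so nothing here is derived from them.

```python
import math
from typing import Dict, Sequence


def _fermi(x: float) -> float:
    """Numerically stable Fermi function 1 / (1 + exp(x))."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar_delta_f(w_forward: Sequence[float], w_reverse: Sequence[float],
                tol: float = 1e-10) -> float:
    """Solve the BAR self-consistent equation for delta_f by bisection.

    Assumption: works are in kT, and w_reverse is the work of the reverse
    process (a dissipation-free pair has w_reverse = -w_forward).
    """
    n_f, n_r = len(w_forward), len(w_reverse)
    if n_f == 0 or n_r == 0:
        raise ValueError("need at least one forward and one reverse work value")
    m = math.log(n_f / n_r)

    def imbalance(delta_f: float) -> float:
        lhs = sum(_fermi(m + w - delta_f) for w in w_forward)
        rhs = sum(_fermi(-m + w + delta_f) for w in w_reverse)
        return lhs - rhs  # monotonically increasing in delta_f

    # Initial bracket from the data, widened until it contains the root.
    lo = min(min(w_forward), -max(w_reverse)) - 1.0
    hi = max(max(w_forward), -min(w_reverse)) + 1.0
    while imbalance(lo) > 0:
        lo -= hi - lo
    while imbalance(hi) < 0:
        hi += hi - lo

    for _ in range(200):  # bisection with a hard iteration cap
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def binding_estimate(complex_phase: Dict[str, float],
                     solvent_phase: Dict[str, float]) -> Dict[str, float]:
    """Combine per-phase results (assumed convention: complex minus solvent)."""
    return {
        "delta_f": complex_phase["delta_f"] - solvent_phase["delta_f"],
        # Assumes the two phase errors are independent.
        "ddelta_f": math.hypot(complex_phase["ddelta_f"], solvent_phase["ddelta_f"]),
    }
```

As a sanity check, the symmetric toy input `bar_delta_f([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0])` converges to approximately 2.0. A production version would more likely call an existing BAR implementation that also returns the error estimate.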
jchodera commented 4 years ago

These fields are optional, and can come later:

I propose we also add core_rmsd as an optional metric under complex_phase, to be filled in by a later step that computes the RMSD of the core atoms after protein alignment. A low value (e.g. < 2.0 Å) would indicate that the scaffold interactions are not disrupted, and could be used as an additional quality-control check.
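
A rough sketch of what that core_rmsd step could compute, assuming the two structures have already been aligned on the protein and the core atoms are supplied in matching order as (x, y, z) tuples in Å. The names `core_rmsd` and `core_is_stable` are hypothetical, and the 2.0 Å default is just the example threshold mentioned above, not a validated cutoff.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def core_rmsd(reference: Sequence[Coord], mobile: Sequence[Coord]) -> float:
    """RMSD over matched core atoms, with no further fitting.

    Assumption: both coordinate sets are already in the same frame
    (protein-aligned) and list the core atoms in the same order.
    """
    if not reference or len(reference) != len(mobile):
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum((a - b) ** 2
                  for p, q in zip(reference, mobile)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(reference))


def core_is_stable(rmsd: float, threshold: float = 2.0) -> bool:
    """Quality-control flag: True when the core RMSD is below the threshold (Å)."""
    return rmsd < threshold
```

Because no fitting is done on the ligand itself, the value reflects how far the core has moved relative to the protein, which is the quantity of interest for this check.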

We can break these optional steps into separate issues to remind us to add these features later as quality control measures: