choderalab / fah-xchem

Tools and infrastructure for automated compound discovery using Folding@home
MIT License

Boatload of patches required for Sprint 11 #164

Closed jchodera closed 2 years ago

jchodera commented 2 years ago

Description

This is a boatload of patches/fixes/hacks needed to get Sprint 11 running in production on mskcc1.foldingathome.org.

Summary of changes

Useful things to carry forward

Horrible bodges

Todos

Things left to fix:

Status

codecov-commenter commented 2 years ago

Codecov Report

Merging #164 (b91c8a6) into master (9930c06) will decrease coverage by 3.41%. The diff coverage is 10.46%.

jchodera commented 2 years ago

There's a major flaw in the snapshot extraction system that will require a significant rewrite.

In brief, our current scheme does the following:

  1. constructs an MDTraj Trajectory for the hybrid system
  2. aligns it to the appropriate reference frame
  3. slices out the appropriate snapshot frame
  4. slices out atom subsets for {old|new} {complex|protein|ligand}
  5. writes these out in PDB and SDF formats (using a flaky PDB -> SDF conversion heuristic)

The problem is that atoms that change identity during the transformation are left in their initial state in the hybrid topology used to create the Trajectory. We therefore need either to do this in a completely different way, or to construct a "new hybrid topology" for use before step 4 (or otherwise substitute the appropriate new topology at that point).
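To make the failure mode concrete, here is a toy sketch in plain Python. It is not MDTraj or perses code: the `Atom` record, the `hybrid_to_new` mapping, and the function name are all hypothetical, and a real hybrid topology also carries atoms unique to each end state. The sketch only shows that a mapped atom keeps its old identity in the hybrid topology unless the new topology's atoms are substituted before slicing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for a topology atom."""
    name: str
    element: str


def substitute_new_identities(
    hybrid: List[Atom],
    new_atoms: List[Atom],
    hybrid_to_new: Dict[int, int],
) -> List[Atom]:
    """Return a copy of `hybrid` in which every mapped atom takes its
    identity from the new topology.

    `hybrid_to_new` maps a hybrid atom index to a new-topology atom index.
    """
    result = list(hybrid)  # leave the input untouched
    for hybrid_index, new_index in hybrid_to_new.items():
        result[hybrid_index] = new_atoms[new_index]
    return result


# Toy transformation: a hydrogen in the old ligand becomes a fluorine.
hybrid = [Atom("C1", "C"), Atom("H1", "H"), Atom("O1", "O")]
new_ligand = [Atom("C1", "C"), Atom("F1", "F"), Atom("O1", "O")]
hybrid_to_new = {0: 0, 1: 1, 2: 2}

# Slicing `hybrid` directly would report H for atom 1; the substituted
# view reports F, which is what a "new" snapshot should contain.
new_view = substitute_new_identities(hybrid, new_ligand, hybrid_to_new)
```

In practice the same substitution would have to happen at the topology level before step 4, so that the written PDB and SDF files carry the new atom identities.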

We also likely need to replace the flaky PDB -> SDF conversion heuristic.

jchodera commented 2 years ago

The simplest way forward for now is likely to load old_complex.pdb and new_complex.pdb via MDTraj and use these to build old_hybrid_topology and new_hybrid_topology objects which can then be sliced appropriately. This will avoid the need to build hybrid_topology.pdb, which takes time anyway.

The OpenEye old and new molecules can be extracted from the htf.npz file via

import numpy as np
import mdtraj as md
import openmm # openmm 7.6
from openeye import oechem
htf = np.load('htf.npz', allow_pickle=True)['arr_0'].tolist()
old_oemol = htf._topology_proposal.old_topology.residue_oemol
new_oemol = htf._topology_proposal.new_topology.residue_oemol

but it takes several minutes to extract these, so we will probably want to stick to the heuristic for now.

jchodera commented 2 years ago

I've figured out what is going wrong with the snapshot extraction, but it will take another day to fix. Apologies for the delay!

The main causes are:

  1. There is a bug in core22's <xtcAtoms v="solute"/> feature, which I neglected to fix, where atoms can be returned out of order.
  2. perses does not provide sufficient information about which atoms are omitted in the {new|old}_{ligand|protein}.pdb and hybrid_atom_mappings.npz files it writes.

It's straightforward for me to implement a workaround within fah-xchem, but tedious, because the behavior of both the core and perses needs to be reproduced within fah-xchem until we can solve one or both of the above issues for the next sprint.
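One piece of such a workaround might look like the sketch below: if the atoms in a frame come back in a different order than the topology expects, a permutation restores ascending atom-index order. This is only an illustration in plain Python. It assumes the out-of-order atom indices are known, the function name and sample data are hypothetical, and it does not reproduce the actual core22 or perses behavior.

```python
from typing import List, Sequence, Tuple

Coordinate = Tuple[float, float, float]


def restore_atom_order(
    returned_indices: Sequence[int],
    frame: Sequence[Coordinate],
) -> List[Coordinate]:
    """Reorder one frame's per-atom coordinates into ascending atom-index order.

    `returned_indices[i]` is assumed to be the topology index of the atom
    stored at position `i` of `frame`.
    """
    if len(returned_indices) != len(frame):
        raise ValueError("index list and frame must have the same length")
    # Positions sorted by the topology index of the atom stored there.
    order = sorted(range(len(returned_indices)), key=returned_indices.__getitem__)
    return [frame[position] for position in order]


# Hypothetical frame in which atoms 2, 0, 1 were written in that order.
returned_indices = [2, 0, 1]
frame = [(2.0, 2.0, 2.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
reordered = restore_atom_order(returned_indices, frame)
```

The same permutation can be computed once per run and reused for every frame, since the atom order does not change within a trajectory under this assumption.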

jchodera commented 2 years ago

There's still a lot more left to be fixed, but we at least have one Sprint 11 dashboard running stably now: https://fah-public-data-covid19-moonshot-sprints.s3.us-east-2.amazonaws.com/dashboards/sprint-11/sprint-11-2021-12-26-P1800_0A-dimer-neutral-restrained/index.html

This can be merged and we can carry on from here if it is appropriate to do so. Alternatively, I can keep refining things in this branch.

jchodera commented 2 years ago

@dotsdl : Changes for Sprint 11 are complete! We should get this merged and carry on as appropriate.

Sprint 12 is starting on Monday.

dotsdl commented 2 years ago

Sounds good! Reviewing today. We'll pull this in before #155.

dotsdl commented 2 years ago

I'll fix the tests; the failures are all due to website changes.

jchodera commented 2 years ago

@dotsdl : Apologies that I haven't yet made the final changes needed to the data model for experimental data. I had not entirely decided how to handle this, but I believe we need to capture whether the measurement reflects one of several possible states:

stereochemistry = { achiral | racemate | absolute enantiomer | relative enantiomer of {compound_id} }
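A minimal sketch of how these four states might be captured, using only the Python standard library. The class and field names are hypothetical, the compound identifier is a placeholder, and the actual fah-xchem data model may end up looking quite different.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stereochemistry(Enum):
    ACHIRAL = "achiral"
    RACEMATE = "racemate"
    ABSOLUTE_ENANTIOMER = "absolute enantiomer"
    RELATIVE_ENANTIOMER = "relative enantiomer"


@dataclass(frozen=True)
class MeasurementStereochemistry:
    """Which stereochemical state an experimental measurement reflects."""
    state: Stereochemistry
    # compound_id of the partner compound; only meaningful for RELATIVE_ENANTIOMER
    relative_to: Optional[str] = None

    def __post_init__(self) -> None:
        is_relative = self.state is Stereochemistry.RELATIVE_ENANTIOMER
        if is_relative and self.relative_to is None:
            raise ValueError("a relative enantiomer needs a reference compound_id")
        if not is_relative and self.relative_to is not None:
            raise ValueError("relative_to is only valid for relative enantiomers")


racemic = MeasurementStereochemistry(Stereochemistry.RACEMATE)
relative = MeasurementStereochemistry(
    Stereochemistry.RELATIVE_ENANTIOMER, relative_to="EXAMPLE-0001"
)
```

Validating at construction time keeps the "relative enantiomer of {compound_id}" case from being recorded without its reference compound.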

I hope to be able to complete this on Monday!

dotsdl commented 2 years ago

No worries @jchodera! Happy to discuss tomorrow during our call as well. Just finished improving the tests for checking website elements.