Closed jchodera closed 2 years ago
There's a major flaw in the snapshot extraction system that will require a significant rewrite.
In brief, our current scheme does the following:
[...] Trajectory for the hybrid system
The problem is that atoms that change identity during the transformation are left in their initial state in the hybrid topology used to create the Trajectory. We therefore either need to do this a completely different way, or we need some way to construct a "new hybrid topology" to be used before step 4 or to substitute the appropriate new topology.
We also likely need to substitute out the flaky PDB -> SDF conversion heuristic.
The simplest way forward for now is likely to load old_complex.pdb and new_complex.pdb via MDTraj and use these to build old_hybrid_topology and new_hybrid_topology objects which can then be sliced appropriately. This will avoid the need to build hybrid_topology.pdb, which takes time anyway.
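A minimal sketch of the loading and slicing step, assuming both PDB files are in the working directory. The helper names and the "not water" selection are illustrative assumptions rather than existing fah-xchem code, and the sketch does not address how old and new atom identities get merged into a hybrid topology:

```python
def load_endpoint_topologies(old_pdb="old_complex.pdb", new_pdb="new_complex.pdb"):
    """Return the MDTraj topologies of the old and new complexes."""
    import mdtraj as md  # imported lazily so slice_topology stays dependency-free

    # load_topology reads only the topology, not the coordinates
    return md.load_topology(old_pdb), md.load_topology(new_pdb)


def slice_topology(topology, selection="not water"):
    """Return (atom_indices, sliced_topology) for an MDTraj selection string."""
    indices = topology.select(selection)  # indices of the matching atoms
    return indices, topology.subset(indices)  # new topology holding only those atoms
```

The same indices can be passed to Trajectory.atom_slice() so that coordinates and topology stay consistent.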
The OpenEye old and new molecules are available from the htf.npz file via
```python
import numpy as np
import mdtraj as md
import openmm  # openmm 7.6
from openeye import oechem

# 'arr_0' holds a pickled Python object, hence allow_pickle=True and .tolist()
htf = np.load('htf.npz', allow_pickle=True)['arr_0'].tolist()
old_oemol = htf._topology_proposal.old_topology.residue_oemol
new_oemol = htf._topology_proposal.new_topology.residue_oemol
```
but it takes several minutes to extract these, so we will probably want to stick to the heuristic for now.
I've figured out what is going wrong with the snapshot extraction, but it will take another day to fix. Apologies for the delay!
The main causes are:
- [...] <xtcAtoms v="solute"/> feature where atoms can be returned out of order that I neglected to fix
- [...] {new|old}_{ligand|protein}.pdb and hybrid_atom_mappings.npz files it writes.
It's straightforward for me to implement a workaround within fah-xchem, but tedious because the behavior of both the core and perses needs to be reproduced within fah-xchem until we can solve one or both of the above issues for the next sprint.
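For the atom-ordering part, a workaround could look roughly like the sketch below. It assumes the order in which the core wrote the atoms is already known as a list of original atom indices; how that order is obtained is not covered here, and the function name is hypothetical:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def restore_atom_order(frame: Sequence[T], written_order: Sequence[int]) -> List[T]:
    """Reorder one frame so atoms appear in ascending original-index order.

    written_order[k] is the original atom index of the entry stored at
    position k of the frame (an assumed input, see the text above).
    """
    if len(frame) != len(written_order):
        raise ValueError("frame and written_order must have the same length")
    if len(set(written_order)) != len(written_order):
        raise ValueError("written_order must not contain duplicate atom indices")
    # Positions sorted by the original atom index they hold
    positions = sorted(range(len(written_order)), key=lambda k: written_order[k])
    return [frame[k] for k in positions]
```

Each frame entry can be anything per-atom, for example an (x, y, z) tuple; the same permutation applies to every frame of a trajectory.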
There's still a lot more left to be fixed, but we at least have one Sprint 11 dashboard running stably now: https://fah-public-data-covid19-moonshot-sprints.s3.us-east-2.amazonaws.com/dashboards/sprint-11/sprint-11-2021-12-26-P1800_0A-dimer-neutral-restrained/index.html
This can be merged and we can carry on from here if it is appropriate to do so. Alternatively, I can keep refining things in this branch.
@dotsdl : Changes for Sprint 11 are complete! We should get this merged and carry on as appropriate.
Sprint 12 is starting on Monday.
Sounds good! Reviewing today. We'll pull this in before #155.
I'll fix the tests; all due to website changes.
@dotsdl : Apologies that I haven't yet made the final changes needed to the data model for experimental data. I had not entirely decided how to handle this, but I believe we need to capture whether the measurement reflects one of several possible states:
stereochemistry = { achiral | racemate | absolute enantiomer | relative enantiomer of {compound_id} }
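One possible shape for this, as a sketch only: the class and field names below are hypothetical and use the standard library rather than whatever the final data model ends up using.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stereochemistry(Enum):
    ACHIRAL = "achiral"
    RACEMATE = "racemate"
    ABSOLUTE_ENANTIOMER = "absolute enantiomer"
    RELATIVE_ENANTIOMER = "relative enantiomer"


@dataclass(frozen=True)
class MeasurementStereochemistry:
    """Which stereochemical state an experimental measurement reflects."""

    state: Stereochemistry
    # Only meaningful for RELATIVE_ENANTIOMER: the compound it is relative to
    relative_to_compound_id: Optional[str] = None

    def __post_init__(self):
        needs_reference = self.state is Stereochemistry.RELATIVE_ENANTIOMER
        if needs_reference and not self.relative_to_compound_id:
            raise ValueError("relative enantiomer requires a compound_id")
        if not needs_reference and self.relative_to_compound_id is not None:
            raise ValueError("compound_id is only valid for a relative enantiomer")
```

Keeping the reference compound as a separate optional field, validated against the state, avoids encoding the compound_id inside a free-form string.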
I hope to be able to complete this on Monday!
No worries @jchodera! Happy to discuss tomorrow during our call as well. Just finished improving the tests for checking website elements.
Description
This is a boatload of patches/fixes/hacks needed to get Sprint 11 running in production on mskcc1.foldingathome.org.
Summary of changes
Useful things to carry forward
- --experimental-data-file <data.json> argument to specify experimental data used to update compound data, along with ExperimentalCompoundData data model
- MBAR.computeOverlap()
- [...] hybrid_complex.pdb on the fly if needed, since recent versions of perses eliminate these files. This code could be further optimized if needed; see notes in code.
Horrible bodges
- load_fragment
Todos
Things left to fix:
Status