jchodera commented 2 years ago

Description

This is a boatload of patches/fixes/hacks needed to get Sprint 11 running in production on mskcc1.foldingathome.org.

Summary of changes

Useful things to carry forward

Added support for optional --experimental-data-file <data.json> argument to specify experimental data used to update compound data, along with ExperimentalCompoundData data model
Removed DEBUG code that set stddev of works to 5 kT arbitrarily
Updated overlap matrix code for pymbar 3.0.3, but it will need tweaks to be compatible with both the 3.0.3 and 3.0.5 APIs for MBAR.computeOverlap()
Added tracebacks in some locations to facilitate debugging
Regenerated hybrid_complex.pdb on the fly if needed, since recent versions of perses no longer write these files. This code could be further optimized if needed; see notes in code.
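
One way to handle the MBAR.computeOverlap() difference is a small shim that accepts either return style. This is only a sketch: it assumes one pymbar 3.0.x release returns a (scalar, eigenvalues, matrix) tuple and another returns a dict with 'scalar', 'eigenvalues', and 'matrix' keys, which should be checked against the installed versions. The helper name overlap_matrix_from is hypothetical and is not part of pymbar or fah-xchem.

```python
def overlap_matrix_from(result):
    """Return the overlap matrix from the value returned by MBAR.computeOverlap().

    Assumption: the result is either a dict with a 'matrix' key or a
    (scalar, eigenvalues, matrix) tuple, depending on the pymbar version.
    """
    if isinstance(result, dict):
        return result["matrix"]
    # Tuple-style return: the matrix is assumed to be the last element.
    _scalar, _eigenvalues, matrix = result
    return matrix
```

Calling code would then use overlap_matrix_from(mbar.computeOverlap()) and never branch on the pymbar version directly.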

Horrible bodges

Hacked PDB file path in load_fragment

Todos

Things left to fix:

[x] Mpro is being imaged into different boxes (likely an issue with simulation setup, not fah-xchem)
[x] Compound pIC50s on front page appear to be much more potent than expected
[x] Add experimental 95% CIs to plots and reported statistics
[x] Add bootstrapped convergence vs number of samples plots to Transformations page
[x] Sort out stereochemistry nightmares
[x] Improve handling of racemic compounds

Status

[x] Ready to go

codecov-commenter commented 2 years ago

Codecov Report

Merging #164 (b91c8a6) into master (9930c06) will decrease coverage by 3.41%. The diff coverage is 10.46%.

jchodera commented 2 years ago

There's a major flaw in the snapshot extraction system that will require a significant rewrite.

In brief, our current scheme does the following:

1. constructs an MDTraj Trajectory for the hybrid system
2. aligns it to the appropriate reference frame
3. slices out the appropriate snapshot frame
4. slices out atom subsets for {old|new} {complex|protein|ligand}
5. writes these out in PDB and SDF formats (using a flaky PDB -> SDF conversion heuristic)

The problem is that atoms that change identity during the transformation are left in their initial state in the hybrid topology used to create the Trajectory. We therefore need either to do this in a completely different way, or to find some way to construct a "new hybrid topology" that can be used before step 4 or substituted for the appropriate new topology.
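
To make the failure mode concrete, here is a minimal pure-Python illustration (not MDTraj code). It assumes the new-state identities can be supplied as a mapping from hybrid atom index to (name, element); the Atom class, new_state_atoms, and the example atoms are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Atom:
    index: int    # index in the hybrid topology
    name: str
    element: str


def new_state_atoms(hybrid_atoms, new_identities):
    """Return a copy of the hybrid atoms with changed atoms relabeled.

    new_identities maps hybrid atom index -> (name, element) in the new state.
    Atoms absent from the mapping keep their old identity.
    """
    relabeled = []
    for atom in hybrid_atoms:
        if atom.index in new_identities:
            name, element = new_identities[atom.index]
            atom = replace(atom, name=name, element=element)
        relabeled.append(atom)
    return relabeled


# The hybrid topology carries old-state identities for every atom.
hybrid_atoms = [Atom(0, "C1", "C"), Atom(1, "H7", "H"), Atom(2, "N2", "N")]
# Hypothetical transformation: atom 1 is a hydrogen in the old ligand
# but a fluorine in the new one.
new_atoms = new_state_atoms(hybrid_atoms, {1: ("F1", "F")})
print([a.element for a in new_atoms])
```

Slicing the "new" atom subsets from hybrid_atoms directly would write atom 1 as a hydrogen; slicing from new_atoms gives the intended fluorine.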

We also likely need to substitute out the flaky PDB -> SDF conversion heuristic.

jchodera commented 2 years ago

The simplest way forward for now is likely to load old_complex.pdb and new_complex.pdb via MDTraj and use these to build old_hybrid_topology and new_hybrid_topology objects which can then be sliced appropriately. This will avoid the need to build hybrid_topology.pdb, which takes time anyway.

The OpenEye old and new molecules can be extracted from the htf.npz file with

import numpy as np
import mdtraj as md
import openmm  # openmm 7.6
from openeye import oechem

# htf.npz stores a single pickled object; .tolist() unwraps it from the 0-d object array
htf = np.load('htf.npz', allow_pickle=True)['arr_0'].tolist()
# OpenEye molecules for the old and new states
old_oemol = htf._topology_proposal.old_topology.residue_oemol
new_oemol = htf._topology_proposal.new_topology.residue_oemol

but it takes several minutes to extract these, so we will probably want to stick to the heuristic for now.

jchodera commented 2 years ago

I've figured out what is going wrong with the snapshot extraction, but it will take another day to fix. Apologies for the delay!

The main causes are:

There is a bug, which I neglected to fix, in core22's <xtcAtoms v="solute"/> feature where atoms can be returned out of order.
perses does not provide sufficient information about which atoms are omitted in the {new|old}_{ligand|protein}.pdb and hybrid_atom_mappings.npz files it writes.

It's straightforward for me to implement a workaround within fah-xchem, but tedious, because the behavior of both the core and perses needs to be reproduced within fah-xchem until we can solve one or both of the above issues for the next sprint.
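
As a rough sketch of the reordering half of such a workaround: assuming the order in which the core returned the atoms can be reconstructed as a list of atom indices, per-atom data can be permuted back into ascending index order. The function name restore_atom_order and its inputs are hypothetical, and this does not reproduce the actual core22 or perses logic.

```python
def restore_atom_order(returned_indices, coordinates):
    """Reorder per-atom coordinates into ascending atom-index order.

    returned_indices[i] is the atom index of the i-th returned atom
    (assumed to be known or reconstructible); coordinates[i] is its data.
    """
    if len(returned_indices) != len(coordinates):
        raise ValueError("need exactly one coordinate entry per returned atom")
    # Positions in the returned arrays, sorted by the atom index they hold.
    order = sorted(range(len(returned_indices)), key=returned_indices.__getitem__)
    return [coordinates[i] for i in order]
```

For example, restore_atom_order([2, 0, 1], ["c", "a", "b"]) gives ["a", "b", "c"].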

jchodera commented 2 years ago

There's still a lot more left to be fixed, but we at least have one Sprint 11 dashboard running stably now: https://fah-public-data-covid19-moonshot-sprints.s3.us-east-2.amazonaws.com/dashboards/sprint-11/sprint-11-2021-12-26-P1800_0A-dimer-neutral-restrained/index.html

This can be merged and we can carry on from here if it is appropriate to do so. Alternatively, I can keep refining things in this branch.

jchodera commented 2 years ago

@dotsdl : Changes for Sprint 11 are complete! We should get this merged and carry on as appropriate.

Sprint 12 is starting on Monday.

dotsdl commented 2 years ago

Sounds good! Reviewing today. We'll pull this in before #155.

dotsdl commented 2 years ago

I'll fix the tests; the failures are all due to website changes.

jchodera commented 2 years ago

@dotsdl : Apologies that I haven't yet made the final changes needed to the data model for experimental data. I had not entirely decided how to handle this, but I believe we need to capture whether the measurement reflects one of several possible states:

stereochemistry = { achiral | racemate | absolute enantiomer | relative enantiomer of {compound_id} }
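
A possible shape for this, sketched with standard-library dataclasses rather than the project's actual data model classes. The names Stereochemistry, StereochemistryAnnotation, and relative_to are hypothetical, as is the "COMPOUND-A" identifier in the usage note.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stereochemistry(Enum):
    ACHIRAL = "achiral"
    RACEMATE = "racemate"
    ABSOLUTE_ENANTIOMER = "absolute enantiomer"
    RELATIVE_ENANTIOMER = "relative enantiomer"


@dataclass(frozen=True)
class StereochemistryAnnotation:
    state: Stereochemistry
    # compound_id of the reference compound; only used for relative enantiomers
    relative_to: Optional[str] = None

    def __post_init__(self):
        is_relative = self.state is Stereochemistry.RELATIVE_ENANTIOMER
        if is_relative and self.relative_to is None:
            raise ValueError("a relative enantiomer needs the compound_id it is relative to")
        if not is_relative and self.relative_to is not None:
            raise ValueError("relative_to only applies to relative enantiomers")
```

With this, StereochemistryAnnotation(Stereochemistry.RACEMATE) is valid, while a relative enantiomer has to name its partner, e.g. StereochemistryAnnotation(Stereochemistry.RELATIVE_ENANTIOMER, relative_to="COMPOUND-A").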

I hope to be able to complete this on Monday!

dotsdl commented 2 years ago

No worries @jchodera! Happy to discuss tomorrow during our call as well. Just finished improving the tests for checking website elements.

choderalab / fah-xchem

Boatload of patches required for Sprint 11 #164
