Closed jchodera closed 1 year ago
Currently the serialized `.pdb` files are for the `old` system. We could overhaul the serialization as you mention but I think we could have clashes with the solvent when serializing a solvated version of the `new` systems (for both `complex` and `solvent` phases). Should we re-solvate the `new` systems? I don't know if that defeats the purpose of serializing these objects.
Maybe we only need the solute versions for the `new` systems? And keep both the solvated and solute for the `old` ones.
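To make the clash concern concrete, here is a minimal sketch of how one might check whether new-system solute atoms overlap with existing solvent. It assumes coordinates are plain NumPy arrays in nanometers; the function names and the 0.15 nm cutoff are hypothetical choices for illustration, not anything perses provides, and periodic boundary conditions are ignored.

```python
import numpy as np


def min_solute_solvent_distance(solute_xyz, solvent_xyz):
    """Return the smallest solute-solvent interatomic distance.

    Both inputs are (n_atoms, 3) arrays in the same units.
    Periodic images are NOT considered in this sketch.
    """
    # Broadcast to an (n_solute, n_solvent, 3) array of displacement vectors.
    deltas = solute_xyz[:, None, :] - solvent_xyz[None, :, :]
    distances = np.sqrt((deltas ** 2).sum(axis=-1))
    return float(distances.min())


def has_clash(solute_xyz, solvent_xyz, cutoff=0.15):
    """True if any solute atom lies closer than `cutoff` to a solvent atom.

    The default cutoff (0.15 nm) is an arbitrary illustrative value.
    """
    return min_solute_solvent_distance(solute_xyz, solvent_xyz) < cutoff


# Toy coordinates: two solute atoms and one water oxygen.
solute = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
far_water = np.array([[1.0, 0.0, 0.0]])
near_water = np.array([[0.2, 0.0, 0.0]])

clash_far = has_clash(solute, far_water)
clash_near = has_clash(solute, near_water)
```

For real systems the all-pairs array gets large, so a neighbor search would be a better fit; this only shows the shape of the check.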
This should be solved via #1210
Currently, perses generates the following topology information on setup:
- `out-complex.pdb`: a single system (old? new?) in complex
- `out-solvent.pdb`: a single system (old? new?) in solvent
- `out-hybrid_factory.npy.npz`: a really slow to deserialize set of way more information than we need to write a trajectory

Additionally, we have two other issues:
- `HybridTopologyFactory` contains a `htf._hybrid_topology` object that does not contain all bonds for the new atoms
- `analysis_particle_indices` used to slice out only non-water atoms to write to the NetCDF file has atom indices out of order because it stacks them as `| environment, core, and unique old atoms | unique new atoms | counterions |` while `hybrid_topology` originally has them in the order `| environment, core, and unique old atoms | water | counterions | unique new atoms |`
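The ordering mismatch can be reproduced with a toy example. The sketch below is an illustration only: the group sizes and variable names are made up, and it uses plain NumPy index arrays rather than the real perses objects. It lays out atoms in the hybrid-topology order described above, stacks the non-water groups the way `analysis_particle_indices` is described as doing, and shows that sorting the stacked indices restores the topology order.

```python
import numpy as np

# Hypothetical group sizes, laid out in hybrid-topology order:
# | environment, core, unique old | water | counterions | unique new |
n_env_core_old, n_water, n_ions, n_new = 5, 6, 2, 3
offsets = np.cumsum([0, n_env_core_old, n_water, n_ions, n_new])

env_core_old = np.arange(offsets[0], offsets[1])  # indices 0..4
water = np.arange(offsets[1], offsets[2])         # indices 5..10
counterions = np.arange(offsets[2], offsets[3])   # indices 11..12
unique_new = np.arange(offsets[3], offsets[4])    # indices 13..15

# Stacking as | env/core/old | unique new | counterions | puts the
# unique new atoms (13..15) before the counterions (11..12).
stacked = np.concatenate([env_core_old, unique_new, counterions])

# Sorting the non-water indices recovers the original topology order.
sorted_indices = np.sort(stacked)


def is_topology_ordered(indices):
    """True if the indices are strictly increasing (topology order)."""
    return bool(np.all(np.diff(indices) > 0))
```

With these sizes, `stacked` ends in `13, 14, 15, 11, 12`, whereas `sorted_indices` ends in `11, 12, 13, 14, 15`. A topology sliced with the sorted indices would then line up atom-for-atom with coordinates sliced the same way.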
I propose we restructure this so we have:
and ensure the atoms in the NetCDF trajectories (checkpoint, standard) are written in the same order as the atoms in the PDB files. Ideally, we could later write replica trajectories as XTC files directly instead of using the NetCDF file, though extracting coordinates doesn't take a huge amount of time.
We can do this for the new `Protocol` version, where we hopefully have a way to package these files in a more sane way.