Acellera / htmd

HTMD: Programming Environment for Molecular Discovery
https://software.acellera.com/docs/latest/htmd/index.html
Other
254 stars 58 forks source link

Provide simlist with single .psf file for all trajectories #1083

Closed AdriaPerezCulubret closed 6 months ago

AdriaPerezCulubret commented 6 months ago

Currently, if I try to provide only a single .psf file to load all trajectories in a simlist, later it will cause errors in metric projection, which does not happen if I provide a .pdb (but then bond information might be missing).

A solution for this is to provide the input folders, but doesn't work when used with filtered trajectories from simfilter, since only a single .psf and .pdb file are created. Passing both of them as a list doesn't work, as the function expects a directory/molfile for each trajectory. I could manually create them with the filtered molfiles, but would be much better if I could just provide a single .psf file.

Error reproduction: test_simlist_issue.tar.gz

In [1]: from htmd.ui import *
In [2]: sims = simlist(glob('e1s*'), 'filtered.psf')
Creating simlist: 100%|████████████████████████████████████████| 3/3 [00:00<00:00, 614.19it/s]
In [3]: metr = Metric(sims)
In [4]: metr.set(MetricDistance('resname MOL', 'protein and name CA', periodic='selections'))
In [5]: data = metr.project()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[5], line 1
----> 1 data = metr.project()

File /scratch/users/adria/software/mambaforge/envs/htmd/lib/python3.10/site-packages/htmd/projections/metric.py:159, in Metric.project(self, njobs)
    157     for proj in self.projectionlist:
    158         if isinstance(proj, Projection):
--> 159             proj._setCache(uqMol)
    160 else:
    161     logger.warning(
    162         "Cannot calculate description of dimensions due to different topology files for each trajectory."
    163     )

File /scratch/users/adria/software/mambaforge/envs/htmd/lib/python3.10/site-packages/moleculekit/projections/projection.py:31, in Projection._setCache(self, mol)
     30 def _setCache(self, mol):
---> 31     resdict = self._calculateMolProp(mol)
     32     self._cache.update(resdict)

File /scratch/users/adria/software/mambaforge/envs/htmd/lib/python3.10/site-packages/moleculekit/projections/metricdistance.py:181, in MetricDistance._calculateMolProp(self, mol, props)
    179 res = {}
    180 if "sel1" in props:
--> 181     res["sel1"] = self._processSelection(mol, self.sel1, self.groupsel1)
    182 if "sel2" in props:
    183     res["sel2"] = self._processSelection(mol, self.sel2, self.groupsel2)

File /scratch/users/adria/software/mambaforge/envs/htmd/lib/python3.10/site-packages/moleculekit/projections/metricdistance.py:194, in MetricDistance._processSelection(self, mol, sel, groupsel)
    188 if isinstance(sel, str) or (
    189     isinstance(sel, np.ndarray)
    190     and sel.ndim == 1
    191     and (np.issubdtype(sel.dtype, np.integer) or sel.dtype == bool)
    192 ):
    193     if groupsel is None:
--> 194         sel = mol.atomselect(sel)
    195     elif groupsel == "all":
    196         sel = self._processMultiSelections(mol, [sel])

File /scratch/users/adria/software/mambaforge/envs/htmd/lib/python3.10/site-packages/moleculekit/molecule.py:951, in Molecule.atomselect(self, sel, indexes, strict, fileBonds, guessBonds, _debug)
    946     s = np.ones(self.numAtoms, dtype=bool)
    947 elif isinstance(sel, str):
    948     s = atomselect(
    949         self,
    950         sel,
--> 951         bonds=self._getBonds(fileBonds, guessBonds),
    952         _return_ast=_debug,
    953         _debug=_debug,
    954     )
    955     if _debug:
    956         s, ast = s

File /scratch/users/adria/software/mambaforge/envs/htmd/lib/python3.10/site-packages/moleculekit/molecule.py:904, in Molecule._getBonds(self, fileBonds, guessBonds)
    902     bonds = np.vstack((bonds, self.bonds))
    903 if guessBonds:
--> 904     bonds = np.vstack((bonds, self._guessBonds()))
    905 return bonds.astype(np.uint32)

File /scratch/users/adria/software/mambaforge/envs/htmd/lib/python3.10/site-packages/moleculekit/molecule.py:1133, in Molecule._guessBonds(self, rdkit)
   1131     return guess_bonds_rdkit(self)
   1132 else:
-> 1133     return guess_bonds(self)

File /scratch/users/adria/software/mambaforge/envs/htmd/lib/python3.10/site-packages/moleculekit/bondguesser.py:130, in guess_bonds(mol)
    127 def guess_bonds(mol):
    128     defaults = {"H": 1.0, "C": 1.5, "N": 1.4, "O": 1.3, "F": 1.2, "S": 1.9}
--> 130     coords = mol.coords[:, :, mol.frame].copy()
    131     radii = []
    132     for i, el in enumerate(mol.element):

File /scratch/users/adria/software/mambaforge/envs/htmd/lib/python3.10/site-packages/moleculekit/molecule.py:464, in Molecule.frame(self)
    462 """The currently active frame of the Molecule on which methods will be applied"""
    463 if self._frame < 0 or self._frame >= self.numFrames:
--> 464     raise RuntimeError("frame out of range")
    465 return self._frame

RuntimeError: frame out of range
AdriaPerezCulubret commented 6 months ago

This can be fixed by providing a single directory with topology files on it.

In [1]: from htmd.ui import *
In [2]: sims = simlist(glob('e1s*'), 'folder_with_topology_files/')