Separating out objects from EnGens for better reusability

FelixMQuintana commented 1 year ago

My branch provides examples of how to split your existing EnGens class into more independent objects for better reusability. The work is not complete; it is just something to work off of.

AnjaConev commented 1 year ago

Really nice, this will be a great enhancement.

I can take a look at your commits, and I can also start adding implementations.

Here is the UML diagram I made:

And a link for you to check it out and edit if you'd like: https://lucid.app/lucidchart/aae34ca7-6dc9-4fb6-988b-0524ddf26756/edit?viewport_loc=-171%2C-67%2C889%2C1267%2C0_0&invitationId=inv_e19dfa8b-6bb7-4ce9-9268-dc31fab7783e#

I will think about which steps I can take on and post them in the next comment.

AnjaConev commented 1 year ago

OK, for now we are cleaning up the file handling. This will be useful because the file handling code is currently spread throughout the EnGens class and is very messy.

Two key things are part of the file handling:

- Loading
- Aligning

Loading means taking the path and instantiating either a PyEMMA trajectory (with pyemma.coordinates.source) or the PDB files (with mdtraj.load). The code for PDBs is here: https://github.com/KavrakiLab/EnGens-private/blob/9d418ffa569294163e005b97f9862816d33780e0/engens_code/engens/core/EnGens.py#L78-L87 The code for trajectories is a bit spread out and looks like this: https://github.com/KavrakiLab/EnGens-private/blob/9d418ffa569294163e005b97f9862816d33780e0/engens_code/engens/core/EnGens.py#L215

Things to think about:

- Should we add an abstract method load() to SimulationFile? This method could perform the loading from paths to the actual structures.
- Another tricky thing: there is an option of "residue selection" while loading a file. This means the user gives a list of atoms as a substructure that they want to load from the input path. In this case we load only the selection and not the full structure. Maybe we can make an abstract FileLoader and then have a different function for loading a selection vs. loading the full file?
- When loading trajectories we actually need to give two paths: one to the trajectory file (".xtc" extension and similar) and one to the topology file (".pdb" extension). We might have to modify TrajectoryFile to have another path (or to contain a PDBFile with the path to the topology).
- Trajectories can be very long, so we need to take care of memory during loading. We have to make sure that we only load them with pyemma.coordinates.source (this is a memory-safe call, unlike pyemma.coordinates.load, which reads everything into memory).
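
To make the points above concrete, here is a minimal sketch of how the loading side could look. It is only one possible layout, not the code from the branch: the class names SimulationFile, PDBFile and TrajectoryFile come from this thread, while the `atom_indices` and `topology` fields are assumptions made for illustration. The heavy libraries are imported inside load() so the classes can be defined and tested without them.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence


@dataclass
class SimulationFile(ABC):
    """Base class for an input file; subclasses decide how to load it."""

    path: Path
    # Hypothetical field: when set, only these atoms are loaded.
    atom_indices: Optional[Sequence[int]] = None

    @abstractmethod
    def load(self):
        """Turn the stored path(s) into an in-memory or streaming object."""


@dataclass
class PDBFile(SimulationFile):
    def load(self):
        import mdtraj  # lazy import: only needed when actually loading

        # mdtraj.load forwards atom_indices to the PDB reader;
        # None means "load the full structure".
        return mdtraj.load(str(self.path), atom_indices=self.atom_indices)


@dataclass
class TrajectoryFile(SimulationFile):
    # Hypothetical field: a trajectory needs a topology next to its own path.
    topology: Optional[PDBFile] = None

    def load(self):
        if self.topology is None:
            raise ValueError("a trajectory needs a topology file to be loaded")

        import pyemma  # lazy import: only needed when actually loading

        top = str(self.topology.path)
        if self.atom_indices is None:
            # source() streams the file in chunks instead of reading it all.
            return pyemma.coordinates.source([str(self.path)], top=top)

        # With a selection, restrict the reader to the chosen atoms.
        feat = pyemma.coordinates.featurizer(top)
        feat.add_selection(list(self.atom_indices))
        return pyemma.coordinates.source([str(self.path)], features=feat)
```

In this sketch the selection is handled inside each load() rather than by a separate FileLoader hierarchy. That keeps the example short, but a dedicated loader object would work just as well if the selection logic grows.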

Aligning means aligning the structures within the trajectory or aligning the structures within the list of PDB files.

The relevant functions are: https://github.com/KavrakiLab/EnGens-private/blob/9d418ffa569294163e005b97f9862816d33780e0/engens_code/engens/core/EnGens.py#L149 and https://github.com/KavrakiLab/EnGens-private/blob/9d418ffa569294163e005b97f9862816d33780e0/engens_code/engens/core/EnGens.py#L263

Things to think about:

In our use cases, PDB files are usually handled as a list of PDBFiles (e.g. alignment can only be performed on a list of PDB files, not on a single file). Maybe we can have another object, PDBFilesList, that is Alignable, instead of PDBFile being Alignable.
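
A rough sketch of that idea is below. Alignable and PDBFilesList are the names proposed in this thread; the constructor arguments and the align() signature are assumptions for illustration, and the alignment uses mdtraj's superpose, which may differ from what the current EnGens code does. It also assumes all PDB files share the same topology, so that mdtraj can load them as frames of one trajectory.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Optional, Sequence, Union


class Alignable(ABC):
    """Anything whose structures can be superposed onto a reference."""

    @abstractmethod
    def align(self, reference_frame: int = 0,
              atom_indices: Optional[Sequence[int]] = None):
        """Return the structures aligned to the chosen reference frame."""


class PDBFilesList(Alignable):
    """A collection of PDB files that is aligned as a whole."""

    def __init__(self, paths: Sequence[Union[str, Path]]):
        if len(paths) < 2:
            # Aligning a single structure to itself is meaningless.
            raise ValueError("alignment needs at least two PDB files")
        self.paths: List[Path] = [Path(p) for p in paths]

    def __len__(self) -> int:
        return len(self.paths)

    def align(self, reference_frame: int = 0,
              atom_indices: Optional[Sequence[int]] = None):
        import mdtraj  # lazy import: only needed when actually aligning

        # mdtraj.load accepts a list of files and joins them into one
        # Trajectory, with one frame per file.
        structures = mdtraj.load([str(p) for p in self.paths])
        # superpose() modifies the trajectory in place; atom_indices=None
        # means all atoms are used for the fit.
        structures.superpose(structures, frame=reference_frame,
                             atom_indices=atom_indices)
        return structures
```

With this layout a single PDBFile stays a plain loadable file, and only the list type carries the align() method.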

AnjaConev commented 1 year ago

I think action items can be as follows:

Figure out how to implement the file loading
- [ ] Pass both topology and trajectory file in order to load TrajectoryFile
- [ ] Load regular files
- [ ] Load files with given selection

Always keep in mind the possible memory problems.

Figure out how to implement alignment
- [ ] Alignment of trajectories (should be straightforward, mostly copying the existing code)
- [ ] Alignment of pdb files (implement a list of PDBFiles that is Alignable)
