Closed ohsOllila closed 6 months ago
Hello,
I can help with implementing OpenMM and possibly for Amber data.
I think in the long run it would be very helpful for us to create a separate Python library that contains all the functions we are using, both for building and analyzing the data bank, and rewriting the AddData.py to use these functions. For instance, the analysis available in AddData.py around the lines 800 and afterwards won't be possible with OpenMM and Amber files as there is no tpr file from them.
I was looking at the AddData.py code with this issue in mind and below are my thoughts:
In conclusion, making the AddData.py to work with other programs should not be very complicated.
@batukav do you have OpenMM trajectories in Zenodo which we could start to work with?
@ohsOllila I have one pure POPE trajectory uploaded and ready for further analysis at 10.5281/zenodo.4665773.
2- There's already an implementation in MDAnalysis https://github.com/MDAnalysis/MDAnalysisCookbook/blob/master/examples/membrane-leaflets.py
Maybe this could be an easier for our purposes.
3- I can make the necessary implementation for the OpenMM.
5- To make molecules whole, we can try using MDAnalysis again. Then we can get rid of gmx tools for this point.
2- I do not think that we need modifications in the part which calculates the number of lipids in leaflets. Only thing that needs to be done is to have the structure file in line 845. Where this comes from (creation or input) will be different depending on the program, but this should not be difficult.
5- If you have a good idea how to do it, it would be great.
I think that the first step would be to try to add the data you mentioned.
Only parts which do not seem completely straightforward are to decide which files are compulsory for openMM (i.e. creating the dictionary), and how molecules will be made whole (we can also request people to have molecules in whole in other than Gromacs trajectories if needed).
Do you want to try to modify the code and add the data?
5- I'll work on it.
Yes, in the following days I'll be analyzing the POPE data for the NMRLipids VI project. In the meantime I can try to figure out the compulsory files from the OpenMM data
I created "check_bond.py" that takes the trajectory, checks the bond lengths, and if a bond length is larger than 0.2 nm performs a simple wrapping using the MDAnalysis.
Centering the bilayer without actually visualizing the trajectory is a bit tricky (at least for me). However, for the OP calculation, if I remember correctly, all we need is a wrapped trajectory. We can keep improving the centering options, but as far as I can see it now, the wrapping issue is solved.
Please let me know if there are problems with the script.
Basic analysis codes (order parameters, area per lipid, form factor and thickness) are now implemented for Gromacs and openMM.
For gromacs, the analysis codes tries to use tpr, but if this does not work due to version dependecies, a gro file is created with Gromacs commands and the created gro is used with MDAnalysis.
For openMM, a pdb file is given instead of tpr and pdb is used with MDAnalysis. However, this is not yet updated in the instructions.
For example, see are per lipid calculation in calcALP.py.
I think that similar idea should work also for other programs. However, I am not aware of a code that could be used to check that molecules are whole and that there no inconvenient jumps over pbc in z-direction for other than Gromacs simulations. Therefore, the requirements for other than Gromacs trajectories is more strict currently (this is also not yet in the instructions).
Because this discussion is now overlapping with another issue "Preferred topology field in the input files?" issue, I will close the issue https://github.com/NMRLipids/Databank/issues/10 and we can continue discussion in here.
Have you considered using MDAnalysis transformations with topology provided? At least for me it works fine:
Before:
After:
For unwrap
to work the only import thing is that proper topology with all bonds defined is provided
Thanks, this looks useful. However, I am not sure which file we should use as topology with openMM trajectories? Currently we used pdb, but bonds are probably not sufficiently defined there.
From OpenMM
documentation it seems that OpenMM
should work with one of xml
, psf
, or tpr
formats. All of them contain bond information and are supported by MDAnalysis
. This should be tested of course, but now it seems that asking the user to provide topology in one of these formats can be a solution
I remember trying the MDAnalysis
transformations for wrapping the trajectories but did not get consistent results. But I don't remember about the details of my experiments, unfortunately. On the other hand, it would be highly useful, at some point, to be able to provide these transformations in the analysis scripts and not rely on "good" user input anymore. Yet, I think this is still a bit low priority to implement.
By simply adding a topology
key to the sim
dictionary in AddData.py (and other relevant parts) and requiring the user to fill this topology
key in the yaml file could help us in the long run. This topology key can be entered to be the same as the already existing tpr
and pdb
keys, for instance, yet we just use the topology
key whenever running any analysis that requires the bond information. If no objections agains it, I would suggest adding the topology
key already into the documentation and the manuscript, although we still don't update the "backend" of the databank to use this.
Actually, I now think that we can even overcome providing the topology. If we know what is the lipid name and what the force field is we actually know which atoms are connected (this information can also be stored in the mapping file for instance). Then we can upload use add_TopologyAttr
method (check this) to add bonds and then unwrap
should work normally for lipids.
Wouldn't this approach have problems if there are custom made/non-standard residues and/or different types of lipids (like glycolipids etc.)? Asking for a topology from the user wouldn't be too much to request, I suppose.
Have you considered using MDAnalysis transformations with topology provided? At least for me it works fine:
Before: After:
For
unwrap
to work the only import thing is that proper topology with all bonds defined is provided
For a future public reference; what should be the command for centering the bilayere center of mass at the origin and wrapping the remaining atoms?
Actually, I now think that we can even overcome providing the topology. If we know what is the lipid name and what the force field is we actually know which atoms are connected (this information can also be stored in the mapping file for instance). Then we can upload use add_TopologyAttr method (check this) to add bonds and then unwrap should work normally for lipids.
Wouldn't this approach have problems if there are custom made/non-standard residues and/or different types of lipids (like glycolipids etc.)? Asking for a topology from the user wouldn't be too much to request, I suppose.
Yes, I think that we should definitely rather ask the required topology files from the data contributor rather than try to reproduce the information. Question is that which files do we exactly need, for example, for OpenMM? For example, in the recent data uploaded by @batukav (https://doi.org/10.5281/zenodo.7415922, https://doi.org/10.5281/zenodo.7418619), there are some xml files but not psf files. Are these xml files enough? Are both of those needed?
If making molecules whole in a reasonable and robust way in these simulations is not feasible, we can also accept only OpenMM simulations with molecules already whole (which is the case in all current OpenMM simulations in the databank).
For other analyses I have done from the databank (standard + the ones in the manuscript), pdb for OpenMM has been sufficient assuming that the molecules are whole. This would probably apply also to other programs, such as Amber from which there is also data in Zenodo. For Amber we have the same questions: Do we ask some topology file, and if yes, which one? Or do we just ask pdb and require that molecules are whole?
Yes, I think that we should definitely rather ask the required topology files from the data contributor rather than try to reproduce the information. Question is that which files do we exactly need, for example, for OpenMM? For example, in the recent data uploaded by @batukav (https://doi.org/10.5281/zenodo.7415922, https://doi.org/10.5281/zenodo.7418619), there are some xml files but not psf files. Are these xml files enough? Are both of those needed?
With OpenMM one can generate the topology information within the Python scripts that are used to run the simulation. Therefore sometimes one does not need to create a separate topology file and read it into the simulation. This was the approach taken by the AMOEBA script I have used. The xml files are the force field files. It should be indeed possible for us to parse the force field file to get the bond information and use this to check from the pdb file + trajectory if the molecules are whole. I don't know how fast we can come up with an implementation. Also, this is a rather edge case that should not be a high priority, in my opinion. That being said, I will check if I can somehow save the topology information from OpenMM so that I can update the Zenodo files.
Current order parameter script calculates the bond lengths and raises a warning if this is beyond a threshold. We can urge the users to upload a topology file and this should not be a problem in a large majority of the cases. For the uploads without a topology information, we can raise a warning in the scripts saying that we cannot make the molecules whole therefore this and that functionality of the databank won't work.
For other analyses I have done from the databank (standard + the ones in the manuscript), pdb for OpenMM has been sufficient assuming that the molecules are whole. This would probably apply also to other programs, such as Amber from which there is also data in Zenodo. For Amber we have the same questions: Do we ask some topology file, and if yes, which one? Or do we just ask pdb and require that molecules are whole?
Simulations from the AMBER MD engine requires prmtop
files to run. These files can be read into MDAnalysis
and subsequently be used. So, we should ask for the prmtop
files, otherwise raise a warning.
If making molecules whole in a reasonable and robust way in these simulations is not feasible, we can also accept only
OpenMM simulations with molecules already whole (which is the case in all current OpenMM simulations in the databank). Except the POPC with 1000 mM CaCl2 (10.5281/zenodo.5507316), all OpenMM trajectories from the Drude simulations have the molecules whole. I'll fix the issues with this trajectory and re-upload it.
I have now enabled addition of simulations ran with NAMD program in exactly the same way as openMM simulations. This seems to work. This might be the way to go also for data with other programs. We can try this when the first contributions with other programs will come.
I will close this. If there will be data from new programs, we can open new issues specifically for each program.
AddData.py already contains the dictionaries also for other than Gromacs simulations. However, the code probably does not fully work with those. This should be fixed at least for OpenMM and Amber data that we already have in Zenodo.