The PDB file looks strange: all hydrogen atoms are piled at the end of each residue. Does the mdanalysis v1.0.0 PDB writer do the same?
Here is one written by v1.0.0: system_stage0001_task0000.txt
It looks like v1.0.0 has the right placement of the H atoms.
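In case it helps with triage, here is a quick way (not from the original report) to eyeball the per-residue atom ordering of a written PDB with MDAnalysis; the file name is a placeholder for whichever output you want to inspect.
import MDAnalysis as mda

# Print the atom names of the first few residues so that any "hydrogens piled
# at the end of each residue" pattern is easy to spot by eye.
u = mda.Universe("system_stage0001_task0000.pdb")
for res in u.residues[:5]:
    print(res.resid, res.resname, list(res.atoms.names))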
Also, John-Luke tested mdanalysis v1.0.0 and it seems to produce the same issue. We are trying openmm v7.4.0 now instead of v7.5.1, since I've tested v7.4.0 and know that it works. We are both using the same version of parmed, v3.0.
You can also try outputting the PDB this way:
import MDAnalysis

# Load the topology and trajectory, then jump to the requested frame.
u = MDAnalysis.Universe(
    str(input_pdb_file), str(traj_file), in_memory=in_memory
)
u.trajectory[frame]

# Instead of constructing a Writer explicitly ...
# PDB = MDAnalysis.Writer(str(output_pdb_file))
# PDB.write(u.atoms)
# ... write the AtomGroup directly:
u.atoms.write(str(output_pdb_file))
That would be a good update, though I feel like it shouldn't be causing the issue. @jlnav, can you give it a try?
Thank you very much for your help! But I'm sorry to report that I'm still getting the same error with mdanalysis v1.0.0, openmm v7.4.0, and the suggested change above to data/api.py. I can also still reproduce the issue with test_pdb.py above.
My new .pdb file, just in case: task0004__ensemble_2021-07-28_sim17_worker3.pdb.txt
Please let me know if there's anything else I can try, or anything else I can do to help.
Hi,
Just stumbling on this issue and wanted to weigh in from the MDA side of things. I'm not aware of anything introduced in PDBWriter that would cause this type of behaviour (at least not intentionally). If you have a test input pdb/traj file that reproduces the issue when passed through https://github.com/DeepDriveMD/DeepDriveMD-pipeline/blob/69fcd7292b93cf22b1bd31d2817b0be1caa84e3e/deepdrivemd/data/api.py#L371, we'd be more than happy to look at it from our end.
Note: the only thing that would come to mind would be https://github.com/MDAnalysis/mdanalysis/issues/3364, but that function doesn't seem to be calling anything that would trigger MDAnalysis.AtomGroup.unique.
Hi @IAlibay, thanks for your input! It seems that @jlnav has tested different versions of mdanalysis and openmm, which leads me to believe it's not an issue with the packages, but rather an issue with usage -- further, this logic currently works correctly in the main workflow we have developed. @jlnav can you please post a link to the parts of your source code where these files are written? Maybe there is something slightly off that we can fix. Also, can you post the json file output from the Agent which contains the paths and frame of the DCD file?
@jlnav if the DCD trajectory isn't too large, can you zip it up with the input PDB file and the generated PDB and attach it here so @IAlibay can try to reproduce the error? Thanks in advance!
Hello all,
Thank you for the continuing help! Since my use-case involves using libEnsemble as a workflow manager to launch parts of DeepDriveMD (including run_openmm.py), I don't think I've written any code that interferes with how these files are written (although I'm prepared to be wrong!).
Some possibly relevant information:
run_openmm.py is being launched via (simplified for length):
mpirun -hosts gpu3 -np 1 --ppn 1 python .../DeepDriveMD-pipeline/deepdrivemd/sim/openmm/run_openmm.py -c .../ensemble_2021-07-28_14:12:52/sim17_worker3/molecular_dynamics.yaml
Failed task's molecular_dynamics.yaml: molecular_dynamics.yaml.txt
(The pdb_file field is intentionally null, since this is a subsequent MD task following the agent, correct?)
Agent's .json: stage0000_task0000.json.txt
Reference .pdb file is found here: https://raw.githubusercontent.com/DeepDriveMD/DeepDriveMD-pipeline/main/data/bba/1FME-folded.pdb
Here's the generated .pdb again: task0004__ensemble_2021-07-28_sim17_worker3.pdb.txt
Perhaps notably, the failing task did not produce a .dcd file, although all the other stage0001 MD tasks that succeeded after the first agent run did.
Please let me know if there's anything else you need!
@jlnav apologies if I'm misunderstanding the order of the files (or, possibly more critically, the issue being discussed). The "reference" pdb file has the same atom ordering as the "generated" pdb file. My understanding from https://github.com/DeepDriveMD/DeepDriveMD-pipeline/issues/41#issuecomment-888482329 was that the input order was different from the output (and hence the cause of the issue). Is re-ordering happening somewhere else along the pipeline prior to the MDAnalysis call?
Hi @IAlibay, I apologize for the confusion, especially since I can't speak to the expected contents or ordering of the .pdb files. I'm just experimenting with workflows using DeepDriveMD as a use-case. If you're saying that the two .pdb files shouldn't have the same atom ordering...
@braceal, is my molecular_dynamics.yaml using the correct .pdb file in the reference_pdb_file field? Should it be the "generated" file (not 1FME-folded.pdb) instead?
@jlnav sorry for the delay in my response. You have it correct: the reference_pdb_file should always be set to 1FME-folded.pdb. And yes, the pdb_file for the second and subsequent iterations should be null, since the later MD iterations are responsible for writing their own PDB file (that is the part that is failing, though). If this failure occurs, then the DCD file would not be produced, since the simulation would not have started yet.
Is this the only PDB file creation that is failing in the second iteration? Do the rest of the stage0001/task* runs finish successfully?
@IAlibay there should not be any reordering from the pipeline. Essentially what the pipeline does (in a simple case with one simulation) is output a DCD file, then pick points inside the DCD file to restart a new simulation. So there is no reordering. Once the pipeline has selected which point to restart from, it calls this function to output a new PDB file to initialize a new OpenMM simulation with. My understanding is that the atom ordering should be the same as the input_pdb_file. The input_pdb_file starts off the same for all simulations and then becomes the newly written PDB files from the above function. Since the initial PDBs are the same, and they are used to generate the rest, the atom ordering should be preserved.
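If it is useful to verify that claim, a minimal sketch for comparing the atom ordering of two PDBs with MDAnalysis could look like the following; the generated file name is a placeholder.
import MDAnalysis as mda
import numpy as np

ref = mda.Universe("1FME-folded.pdb")
gen = mda.Universe("restart_from_dcd.pdb")  # placeholder for a pipeline-written PDB

# Same atom count, names, and residue ids in the same order means no reordering.
same_count = ref.atoms.n_atoms == gen.atoms.n_atoms
same_names = same_count and np.array_equal(ref.atoms.names, gen.atoms.names)
same_resids = same_count and np.array_equal(ref.atoms.resids, gen.atoms.resids)
print(same_count, same_names, same_resids)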
@braceal I apologize myself for my delayed response. I may be slow to respond over the next two weeks.
Over the second iteration, several of the other MD tasks succeed. Looking at the output again, it appears the failing task has parsed out a different "system name" than the other tasks for its .pdb, e.g.:
(these are each from stage0001; sim17 is the failing task)
$ ls sim* | grep __
...
sim15_worker3:
...
sim2_worker2__ensemble_2021-07-28_14:12:52_sim15_worker3.pdb
sim16_worker2:
...
sim0_worker2__ensemble_2021-07-28_14:12:52_sim16_worker2.pdb
sim17_worker3:
...
task0004__ensemble_2021-07-28_14:12:52_sim17_worker3.pdb
Based on my understanding of DeepDriveMD_API.get_system_name() and the MD/Agent's output, it appears the failing task attempts to use 1FME-folded.pdb as a structure file, whereas the other "succeeding" tasks do not. Could this produce my problem?
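For illustration only, here is a hypothetical sketch of the kind of name parsing I mean; the real DeepDriveMD_API.get_system_name() may work differently, and the split-on-"__" rule is just a guess based on the file names in the listing above.
from pathlib import Path

def guess_system_name(pdb_path: str) -> str:
    # Guess: the system name is whatever precedes the double underscore.
    return Path(pdb_path).name.split("__")[0]

print(guess_system_name("sim2_worker2__ensemble_2021-07-28_14:12:52_sim15_worker3.pdb"))  # sim2_worker2
print(guess_system_name("task0004__ensemble_2021-07-28_14:12:52_sim17_worker3.pdb"))      # task0004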
I'll keep on experimenting in the meantime. Thank you again!
Hey @jlnav, No worries, we are both busy :)
The reference PDB file is a big clue. It may be that the reference PDB file is in the wrong place. Can you post a link to your source code here so I can take a look?
The reference PDB should be a static path; it is always the same, and there is no need to copy/move it unless you are staging it to a node-local storage device.
@braceal ,
As requested, my source code can be found here: https://github.com/Libensemble/libensemble/blob/develop/libensemble/tests/scaling_tests/ddmd/run_libe_ddmd.py
For some explanation (which I believe has also helped me figure things out):
- 1FME-unfolded.pdb and 1FME-folded.pdb are downloaded to the starting directory.
- Each launch of run_openmm.py by libEnsemble occurs in a separate, task-specific "sim directory".
- Symlinks to 1FME-unfolded.pdb and 1FME-folded.pdb are created inside each sim directory.
Looks like, for now, I probably don't need to create a symlink to 1FME-folded.pdb at all, and should just populate reference_pdb_file with a path to the initial copy. Let's give this a try...
@jlnav thanks for sending the code. Yes, I believe this is indeed the issue. It may be isolated to this portion of the code: https://github.com/Libensemble/libensemble/blob/develop/libensemble/tests/scaling_tests/ddmd/openmm_md_simf.py#L21
The reference_pdb_file can point to any path; it need not be in the initial_pdb_dir. In this case, I believe DDMD may glob the reference PDB from inside the task run directory and try to use it to generate the next set of restart PDBs. This particular reference PDB may have some inconsistency in it which causes the error. I'd have to trace through the code to know for sure, but see if just placing the reference PDB outside the initial_pdb_dir fixes the issue. There is no need to copy the reference PDB to each worker's directory; it is just a static file.
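A minimal sketch of that change, assuming reference_pdb_file is a top-level field in molecular_dynamics.yaml (only the field name comes from this thread; the paths and layout are placeholders):
from pathlib import Path
import yaml  # PyYAML

# Point reference_pdb_file at the original, static copy instead of a per-task symlink.
cfg_path = Path("sim17_worker3/molecular_dynamics.yaml")
cfg = yaml.safe_load(cfg_path.read_text())
cfg["reference_pdb_file"] = str(Path("/path/to/static/1FME-folded.pdb").resolve())
cfg_path.write_text(yaml.safe_dump(cfg))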
@braceal ,
I'm both happy and disappointed to report that the forcefield-creation issue that produced the initial ValueError was seemingly resolved by fixing the reference_pdb_file's path. My bad! This issue can probably be closed.
Thank you very much!
Glad to know it is working again. I'll close the issue, thanks!
@IAlibay, it was not an issue with MDAnalysis.
Describe the bug
PDB files written with this function using mdanalysis v1.1.1 produce an error in openmm forcefield creation here. This error does not occur with mdanalysis v1.0.0. The error:
Note: this is a BBA PDB file. The parmed version is 3.0. The offending PDB is attached as a txt file: task0005__ensemble_2021-07-13_15_13_26_sim20_worker3.txt
To Reproduce
On Lassen:
test_pdb.py looks like:
Expected behavior
The function should return an OpenMM Simulation object like it does for mdanalysis v1.0.0.
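For context, a minimal sketch of the OpenMM step where this surfaces, assuming an implicit-solvent setup; the force field files and PDB name are placeholders rather than the exact DeepDriveMD configuration.
# createSystem() is typically where a ValueError such as "No template found
# for residue ..." appears if the restart PDB is malformed.
from simtk.openmm.app import PDBFile, ForceField, NoCutoff  # import path for OpenMM <= 7.5

pdb = PDBFile("restart_from_dcd.pdb")  # placeholder for the pipeline-written PDB
forcefield = ForceField("amber14-all.xml", "implicit/gbn2.xml")
system = forcefield.createSystem(pdb.topology, nonbondedMethod=NoCutoff)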