Closed xiki-tempula closed 1 year ago
It looks like it's simply not updating the record dictionary before returning. Should be an easy fix. Will sort this tomorrow.
This is actually working for me. Can you check to see if your process returns an error? I assume that you are running this vacuum simulation with PMEMD, in which case the default config options in your sandpit might not be valid for vacuum. (I've mentioned this previously on Slack.)
I have tried using the native BSS with sander as well:

```python
import BioSimSpace as BSS

mol = BSS.Parameters.openff_unconstrained_2_0_0('C').getMolecule()
protocol = BSS.Protocol.Production(report_interval=1)
process = BSS.Process.Amber(mol.toSystem(), protocol)
process.start()
process.wait()
records = process.getRecords()
len(records['RESTRAINT'])
```
This gives me 2501 records. If I change the report_interval to 10, I would expect to get 10 times fewer samples:
```python
import BioSimSpace as BSS

mol = BSS.Parameters.openff_unconstrained_2_0_0('C').getMolecule()
protocol = BSS.Protocol.Production(report_interval=10)
process = BSS.Process.Amber(mol.toSystem(), protocol)
process.start()
process.wait()
records = process.getRecords()
len(records['RESTRAINT'])
```
But I got 2499 records in this case.
I tried again:

```python
import BioSimSpace as BSS

mol = BSS.Parameters.openff_unconstrained_2_0_0('C').getMolecule()
protocol = BSS.Protocol.Production(report_interval=10)
process = BSS.Process.Amber(mol.toSystem(), protocol)
process.start()
process.wait()
records = process.getRecords()
len(records['RESTRAINT'])
```

This time it gives me 2501 results.
The watchdog used to handle the automatic file reading can be a little unreliable when the frequency of writes is very high. (It monitors changes to a file and updates the dictionary automatically.) However, changing the restart interval should change the number of records that you get. I'll investigate this.
Should this be report_interval in this case? For GROMACS, I pin the frequency of energy output to report_interval.
This is because I've incorrectly updated the frequency of writing energies when adapting the configuration generator from your sandpit. Previously this was pinned to the report_interval for anything other than a Minimisation protocol; now it is set to 200 regardless.
Will fix this now.
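As a sketch of the fix (the function name and the Minimisation placeholder value are illustrative, not the actual BioSimSpace configuration generator), the energy-write frequency goes back to being derived from the protocol's report_interval for non-minimisation protocols:

```python
# Illustrative sketch only, not the real BioSimSpace config generator.
# The regression hard-coded the energy-write frequency to 200; the fix pins
# it back to the protocol's report_interval for anything other than a
# Minimisation protocol.
def energy_write_frequency(protocol_name, report_interval):
    if protocol_name == "Minimisation":
        return 200  # placeholder fixed value; minimisation is a special case
    return report_interval
```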
This is the case in your sandpit too.
But this still doesn't explain why the number of records I get is non-deterministic? If I run the same command several times, I get a different result each time.
> This is the case in your sandpit too.

If you look at my example, if I change to `import BioSimSpace as BSS`, the problem still persists?
> The watchdog used to handle the automatic file reading can be a little unreliable when the frequency of writes is very large.

I have changed the line to `protocol = BSS.Protocol.Production(report_interval=2000)`; in this case, I should get 250 samples:

```python
>>> protocol.getRunTime()/protocol.getTimeStep()/protocol.getReportInterval()
250.0
```

but I still got 2500 samples? I'm using `import BioSimSpace as BSS`.
This is because I haven't fixed it yet. I'm doing that now :man_shrugging:
> This is because I've incorrectly updated the frequency of writing energies when adapting the configuration generator from your sandpit. Previously this was pinned to the report_interval for anything other than a Minimisation protocol. Now it is set to 200 regardless. Will fix this now.
With my fix I see:

```python
>>> protocol.getRunTime()/protocol.getTimeStep()/protocol.getReportInterval()
250.0
>>> records = process.getRecords()
>>> len(records["ETOT"])
251
```
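For what it's worth, the one extra record (251 rather than 250) comes from the step-0 entry. A quick sanity check on the arithmetic, with illustrative numbers:

```python
# AMBER also writes a record at step 0, hence the "+ 1".
def expected_records(total_steps, report_interval):
    return total_steps // report_interval + 1

# e.g. 500000 steps reported every 2000 steps gives 251 records,
# and 2500 steps reported every step gives 2501.
```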
Ok, I thought the problem with report_interval was only in the sandpit.
This is now merged. I've not updated the sandpit, since there might be a reason why you update at a fixed interval. Feel free to fix this in your own fork if needed.
Great, thanks.
Is using watchdog to handle this too risky? Would it be better to just parse the AMBER output file? I don't know how reproducible this is, but I can get a different number of values for different keys:
```python
import BioSimSpace.Sandpit.Exscientia as BSS

mol = BSS.Parameters.gaff('C').getMolecule()
protocol = BSS.Protocol.Production()
process = BSS.Process.Amber(mol.toSystem(), protocol)
process.start()
process.wait()
data = process.getRecords()
```
```
In [20]: [(key, len(value)) for key, value in data.items()]
Out[20]:
[('NSTEP', 533),
 ('TIME(PS)', 533),
 ('TEMP(K)', 533),
 ('PRESS', 533),
 ('ETOT', 531),
 ('EKTOT', 531),
 ('EPTOT', 531),
 ('BOND', 531),
 ('ANGLE', 531),
 ('DIHED', 531),
 ('1-4 NB', 531),
 ('1-4 EEL', 531),
 ('VDWAALS', 531),
 ('EELEC', 531),
 ('EHBOND', 531),
 ('RESTRAINT', 531)]
```
Note NSTEP doesn't have the same length as the others.
Yes, I think it should be possible to get the same information from the log file. (I think the same records are always present.) The reliability of the background watchdog has varied over time and, since most people have used this functionality interactively, I've not been too concerned. When originally writing the code (a long time ago) I experimented with different approaches for parsing the output files to see what was possible. The nice thing with this approach is that it's fully automated, although it relies on correct file event signals from the OS, which can be problematic when the file is being updated rapidly.
Internally, things that use this output, e.g. plotting tools, just truncate to the shortest record, i.e. if you plot a time series of some data and there is a mismatch between x and y, it will just use the shortest, like `zip` does. Another option could be to truncate the records in a similar way within `getRecords`, but it would probably be best to ensure that they are all present. (Things will still be missing/mismatched if a user calls `getRecords` interactively, though, since not all records may have been written to the file for the current time step.)
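The zip-style truncation described above could look something like this (a minimal sketch, not the actual BioSimSpace implementation):

```python
def truncate_records(records):
    """Trim every record list to the shortest length so the keys stay
    aligned, the same way zip() stops at the shortest input."""
    if not records:
        return records
    n = min(len(v) for v in records.values())
    return {k: v[:n] for k, v in records.items()}
```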
I could just try using the `pygtail` approach used in some of the other process classes, i.e. only reading new records written to the log file.
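The pygtail idea is just to remember how far into the log file the previous read got and only parse what has been appended since. A stdlib-only sketch of that behaviour (not the actual pygtail API, nor the BioSimSpace code):

```python
import os

class LogTail:
    """Incrementally read a growing log file, pygtail-style."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # file position after the previous read

    def new_lines(self):
        """Return only the lines appended since the last call."""
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            f.seek(self.offset)
            lines = f.readlines()
            self.offset = f.tell()
        return lines
```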
Thanks.
> I could just try using the pygtail approach used in some of the other process classes, i.e. only reading new records written to the log file.

I think this would be best.
I still don't understand why there could be a mismatch. In this case I call it after `wait` to ensure that everything has finished?
> I still don't understand why there could be a mismatch? In this case, I use it after wait to ensure that everything finishes?
Because the dictionary is being updated automatically in the background by watchdog while the simulation is running. This uses system file event triggers to know when to parse the `mdinfo` file and update the dictionary. The issue is that these triggers can be inconsistent across operating systems (Windows in particular), so sometimes it might read twice, or not at all. At present it is set to trigger on any modification, not just a close-write, since close-write isn't supported on Windows. I'll try making this conditional on the platform to see if it's more reliable. Here's the code.
For reference, I've run your example 10 times in a row and get exactly the same number of outputs for all records each time, so it's clearly doing the correct job on Linux for me. I'll tell you what to try modifying once I've tested.
It looks like `watchdog` now supports `IN_CLOSE_WRITE`. (It didn't used to, since it wasn't available on Windows.) As another workaround, I can just see if `NSTEP` has changed since the last read and skip the update if it hasn't.
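The NSTEP-based workaround amounts to a small guard in the event handler; a sketch, with an illustrative class name:

```python
class DuplicateEventGuard:
    """Skip mdinfo parses triggered by duplicate file events."""

    def __init__(self):
        self._last_nstep = None

    def should_update(self, nstep):
        # A repeated NSTEP means the file event fired without new data.
        if nstep == self._last_nstep:
            return False
        self._last_nstep = nstep
        return True
```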
So I'm testing it on an OSX system. It is a bit annoying that you cannot reproduce it.
I can reproduce on an old MacBook. Will try to figure out why some results from the header line are duplicated; we already check for repeated `NSTEP` entries. Also, `IN_CLOSE_WRITE` is still not supported. (The PR thread was misleading.) This won't help anyway, since AMBER opens and closes the file only once, flushing in between for new entries.
I also get a different number of entries on macOS, irrespective of the difference in the totals for `NSTEP`, `TIME(PS)`, and `TEMP(K)`. For the others, there are 1051 records (different to yours, so presumably you are using your internal sandpit), which is less than the 2501 on Linux.
I've checked the values for `NSTEP` and they are unique, but on macOS the spacing between successive records is inconsistent, so it is missing data. I'm not yet sure why. Will see if it's possible to fix; otherwise it probably makes sense to switch to parsing the log file instead. (The `watchdog` repo does have lots of cross-platform checks, so I'm surprised that a simple event trigger is failing like this.)
@lohedges Thanks. Being able to read from the AMBER log will help me greatly: I could then run the simulation on one AWS instance and analyse the data on another.
This is now implemented here. It should be merged later today or tomorrow, or feel free to check out the branch locally and test it yourself.
Cheers.
This is really fast! Thank you. Will test it once it is merged.
I've just found that the record dictionary is wrong when position restraints are active, since `sander` (not sure about `pmemd`) injects an `EAMBER (non-restraint)` record, but not for every step that is logged. This is just the potential energy minus the restraint energy, which can easily be reconstructed from the other information. For now I'll just ignore this record. It is fixed on my local feature branch.
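Since `EAMBER (non-restraint)` is just the potential energy minus the restraint energy, it can be reconstructed from the other records rather than parsed; a minimal sketch:

```python
def eamber_non_restraint(eptot, restraint):
    """Reconstruct the EAMBER (non-restraint) series from the EPTOT and
    RESTRAINT records: potential energy minus restraint energy."""
    return [e - r for e, r in zip(eptot, restraint)]
```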
Describe the bug
So when I run a simulation with `BSS.Protocol.Production(report_interval=1)`, I think I should get at least runtime/timestep/report_interval records with `process.getRecords()`.

To Reproduce

Gives

```
{}
```

Expected behavior
I should get a dictionary in the given format, where there are runtime/timestep/report_interval records for each entry.