Closed jtkrogel closed 1 month ago
NB: I will fix the failing unit test (complex estimators) in 30 min.
Hold on merge for naming discussion for now. Will commit a naming change if the decision is made today.
There is a great deal to cover here. I'm just starting on the review. Please introduce one piece at a time from the most detailed up. I'd start with trace buffer and its unit tests. It needs a review on its own.
Renaming applied. Ready to proceed.
That would be a big help to me as well. I have suggestions/changes needed for the lower level classes which would be better to address and test at that level. I.e. use of lower level hdf5 functions when there is already a way to easily pass containers.
Other issues relate the integration with existing higher level classes which are largely separate.
See #5035
I have merged in develop and fixed cmake issues. Please pull before making further changes.
I've addressed Peter's comment from #5035 requesting simplification of WalkerLogState.
I've also tried Peter's suggestion to use f.writeEntry(buffer, "data")
in WalkerLogBuffer::writeHDF
. Making that replacement resulted in NaN's being written to the HDF files (picked up by integration test). Therefore I've kept the original implementation.
Comments addressed. Ready to proceed.
I've also tried Peter's suggestion to use
f.writeEntry(buffer, "data")
inWalkerLogBuffer::writeHDF
. Making that replacement resulted in NaN's being written to the HDF files (picked up by integration test). Therefore I've kept the original implementation.
You need h5d_append and we don't have a high-level API in hdf_archive right now. added issue #5038. WalkerLogBuffer::appendHDF
can be a better name to avoid confusion.
Test this please
Hi guys. I had thought this was going in given Ye's approval.
If there really is a short list of "must have" items still remaining for this PR after all these passes, please make a short, final list below and I will address them.
I need to get past this point and get back to other work, some of which depends on this code being in place.
Reverting the whitespace OperatorBase.cpp change to reduce the number files is the only must have.
Test this please
Proposed changes
This PR introduces a simplified version of the legacy TraceManager that focuses solely on writing per-walker data to HDF5 throughout the run (no streaming/listener functionality). The new implementation, WalkerLogManager, is compatible with and implemented in both the legacy and batched drivers.
This implementation extends the original to include the ability to write full data for the highest, lowest, and median energy walkers on each rank to disk at every step. The intention of writing this abbreviated data is to enable inexpensive access to data from problematic walkers (e.g. population explosions) in production runs.
Capabilities
Below is an example generated from a short DMC run of diamond on 6 MPI ranks showing the energies of all walkers (blue), highest energy walkers (green), lowest energy walkers (red), median energy walkers (black), vs the mean energy from dmc.dat (magenta line):
Design
Tests
What type(s) of changes does this code introduce?
Does this introduce a breaking change?
What systems has this change been tested on?
Laptop
Checklist
Path out of WIP
Tasks for review round 3
Elements for reviewers to review