Closed NikoOinonen closed 5 months ago
Attention: Patch coverage is 64.83051%, with 83 lines in your changes missing coverage. Please review.
Project coverage is 50.59%. Comparing base (9492233) to head (6a11956). Report is 2 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
ppafm/ml/Generator.py | 67.66% | 43 Missing :warning: |
ppafm/ocl/field.py | 37.83% | 23 Missing :warning: |
ppafm/ocl/AFMulator.py | 52.77% | 17 Missing :warning: |
@ProkopHapala
When you talk about "performance upgrades", did you measure the improvement? Is it noticeable?
It is quite noticeable. I tested on a bunch of small molecules and got roughly a 20-40% reduction in force-field generation time.
I'll mention this here as well. While measuring the timings for the generation process, I found that on the first simulation for each sample, copying the potential and density to the device takes almost as long as the whole force-field calculation (not an issue when doing multiple rotations of the same molecule). To counter this, I tried to build a system that asynchronously loads the next sample into device memory one step ahead, so that it is already there when the next simulation starts.
This turned out to be problematic, because the enqueue operation in OpenCL is apparently not guaranteed to be fully asynchronous. Specifically, on Nvidia hardware the enqueue seems to take almost as long as the copy operation itself, even when explicitly set as non-blocking, whereas on AMD the enqueue seems to return immediately.
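One possible workaround, sketched here only as an illustration and not as what this PR implements, is to move the upload of the next sample into a background thread, so that a slow enqueue call stalls the worker rather than the main simulation loop. The sketch uses only the Python standard library; `prefetch`, `upload`, and `fake_upload` are hypothetical names, and `upload` stands in for whatever call copies the potential and density to the device. Whether this actually hides the copy time with a real OpenCL driver depends on the binding releasing the GIL during the enqueue, which is an assumption and has not been verified here.

```python
import queue
import threading
import time


def prefetch(samples, upload, depth=1):
    """Yield upload(sample) for each sample, with uploads running ahead in a worker thread.

    `upload` is a hypothetical stand-in for the host-to-device copy.
    The queue holds up to `depth` finished uploads; the worker may hold one more
    while it waits for a free slot, so up to depth + 1 samples can be resident.
    """
    buf = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def worker():
        try:
            for sample in samples:
                buf.put((upload(sample), None))
        except Exception as exc:  # forward errors to the consumer
            buf.put((None, exc))
        finally:
            buf.put((done, None))

    # Daemon thread: if the consumer stops early, the worker does not block exit.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        item, exc = buf.get()
        if exc is not None:
            raise exc
        if item is done:
            return
        yield item


if __name__ == "__main__":
    def fake_upload(sample):
        time.sleep(0.05)  # pretend this is the device copy
        return sample

    start = time.perf_counter()
    for sample in prefetch(range(4), fake_upload):
        time.sleep(0.05)  # pretend this is the force-field calculation
    print(f"elapsed: {time.perf_counter() - start:.2f} s")
```

In the toy run at the bottom, each fake upload overlaps with the previous fake simulation, so the total time should come out below the sum of all sleeps. With real device buffers, the two resident samples would also need separate buffers so that the upload does not overwrite data that the running simulation still reads.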
Update to the machine learning batch generator.