Closed MightyBOBcnc closed 1 year ago
Started work on this. Updated `erode_terrain6` to use a tuple swap as it was the most recent (I have not touched the other erosion functions yet). To my surprise this offered no performance improvement whatsoever. I tested on the same seed for each test, with `-d 2000` (~40 million vertices) and `-d 3000` (~90 million vertices), 100 passes per test. (Using big meshes because any performance gains in testing should be more apparent than on a mesh with only 1 or 2 million vertices.)
Before implementing the tuple swap the d=2000 mesh took about 85 seconds per test (14 tests). After changing the code to the tuple swap, performance on the d=2000 mesh was the same 85 seconds (6 tests). My expectation was that a tuple swap would be at least a few seconds faster (every little gain helps) since it doesn't need to do `heights[x] = write_buffer[x]` for every element, which is a lot when you're talking about 40 to 90 million verts.
That said, I don't know how numpy/numba optimizes such a thing, so it's possible it was already doing some highly optimized vectorization or something.
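One plausible explanation: in vectorized NumPy, the copy-back is a single memcpy-like operation that is already memory-bandwidth-bound, so swapping names saves almost nothing per pass. A minimal sketch of the two approaches (buffer names mirror the discussion; the "erosion pass" is a stand-in, not Nixis's actual math):

```python
import numpy as np

# Illustrative buffers; sizes are arbitrary.
heights = np.random.rand(1_000_000)      # read buffer
write_buffer = np.empty_like(heights)    # write buffer

# One simulated erosion pass fills the write buffer from the read buffer.
write_buffer[:] = heights * 0.99

# Old approach: copy every element back into the read buffer. As a
# sliced assignment this is essentially one big memcpy, already limited
# by memory bandwidth, so a swap has little room to win.
heights[:] = write_buffer

# New approach: exchange the names instead -- O(1), no data moves.
heights, write_buffer = write_buffer, heights
```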
One notable change: the `erode_terrain6` function was previously decorated with `@njit`, but the decorator was removed to support snapshotting each step of erosion, because numba does not play nice with the image export functions. Numba does have a fallback 'object' mode, `@jit(nopython=False)`, but this is going to be fully removed from numba in 0.59.0. Nevertheless I tried it, and in conjunction with the tuple swap it brought the run time down to about 79.8 seconds (6 more tests).
For the d=3000 mesh things went sideways. Before implementing the tuple swap this took about 190 seconds per test (4 tests, but there was an outlier 5th test that took 244, which might have been numba's first-run compile penalty). After implementing the tuple swap it took more like 195 to 205 seconds per test (11 tests) but would seemingly randomly take 620+ seconds (8 tests; NOTE: I only let 2 of these run to completion, the other 6 I aborted early in the terminal, see below).
Watching my CPU I observed that during the 'normal' runs the erosion would consistently have >90% utilization on all cores (the expected behavior while erosion is running), but during the >600 second runs the CPU would spike to 100% for each pass of the for loop, drop to 0% at the end of the pass, and then repeat. This CPU thrashing made each pass take longer, which is why the whole run took >600 seconds. After this happened twice I started killing runs with Ctrl+C in the terminal whenever I saw the spike pattern begin.
I also tested `@jit(nopython=False)` on the d=3000 mesh for 7 additional tests; they all came in around 188 seconds per test, with no instances of the strange >600 second glitch.
I have no idea what causes it. It can't be CPU thermal throttling, because all the other areas of Nixis that use the full CPU don't have this problem. Something else running on the system? (Loading up the python interpreter might get different allocations or something.) Length of time since the last run? (Things getting deallocated.) The time to complete one loop being slightly too long, so the next loop has to restart or reallocate something in numba's compiled functions (kernels?)? Python randomly choking on the print statement for each erosion pass?
Dunno. But I'm going to press forward with adding tuple swaps to the other erosion functions for now because it feels 'more correct' as a way to swap the read and write buffers. Some of the other functions also work a little differently (they do their swapping inside the erosion instead of the outer iteration that calls it) so I want to see if they experience the same weirdness or if they get any performance improvement.
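The two placements described above can be sketched like this (function and variable names are illustrative, and the "erosion pass" is a stand-in for the real math):

```python
import numpy as np

def erode_pass(read_buf, write_buf):
    """Stand-in for one erosion pass: reads read_buf, writes write_buf."""
    write_buf[:] = read_buf * 0.99

def run_outer_swap(heights, passes):
    """Swap in the outer iteration that calls the erosion function,
    as done for erode_terrain6."""
    write_buffer = np.empty_like(heights)
    for _ in range(passes):
        erode_pass(heights, write_buffer)
        heights, write_buffer = write_buffer, heights
    return heights

def erode_inner_swap(heights, passes):
    """Variant where the erosion function owns the loop and swaps
    inside, as some of the other functions do."""
    write_buffer = np.empty_like(heights)
    for _ in range(passes):
        write_buffer[:] = heights * 0.99
        heights, write_buffer = write_buffer, heights
    return heights
```

Both variants produce identical results; the difference is only which level of the code owns the buffers and does the name exchange.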
`erode_terrain5` refuses to be conquered by tuple swapping. Its iteration operates in a different way from `erode_terrain6` due to the way it references the input heights. On the plus side, while I've been poking around in erosion.py I found a fix (or a hack) for the spikes forming in `erode_terrain5`'s iteration. See: b3ab247. All tests done on a d=320 mesh, 100 passes each.
With the fix applied we can even run it with Numba's `prange` without exploding the elevations, although it should be noted that, due to the way it is structured, this will cause collisions when writing to the write buffer. That causes some minor variation in the output (meaning the result is no longer purely deterministic).
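The collision hazard here is the classic unsynchronized scatter-write: two parallel iterations landing on the same write-buffer cell, with one update lost. Plain NumPy shows the same hazard without needing numba (array names are illustrative):

```python
import numpy as np

write_buffer = np.zeros(3)
targets = np.array([0, 1, 1, 1])            # three writes hit index 1
deposits = np.array([1.0, 1.0, 1.0, 1.0])

# Fancy-index accumulation collides on repeated targets: only one of
# the duplicate writes survives, analogous to prange iterations racing
# on the same write-buffer cell.
write_buffer[targets] += deposits
# write_buffer[1] is 1.0, not 3.0 -- two deposits were lost.

safe = np.zeros(3)
np.add.at(safe, targets, deposits)          # unbuffered scatter-add
# safe[1] is 3.0 -- every deposit counted, deterministic result.
```

In the threaded case the losses depend on scheduling, which is why the parallel output varies slightly from run to run.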
I have some ideas for how to run it in parallel without these collisions but that's a separate task.
`erode_terrain1` gets a ~32% speed-up from using the tuple swap. 100 passes with d=2000 drops from 17.0s to 11.7s, and d=3000 drops from about 37.5s to 25.6s on my machine.
`erode_terrain2` gets a ~13-17% speed-up from using the tuple swap. 100 passes with d=2000 drops from 48.0s to 41.8s on my machine. With d=3000 it went from 109.2s to 91.7s.
`erode_terrain3` and `erode_terrain4` are basically unsalvageable disasters. Considering this issue finished.
In erosion.py most of the functions operate with a read buffer and write buffer, and then when a loop is complete the values get moved from one to the other. This should be done with a tuple swap instead of the current method (which I wrote as a hack because I suck).
https://www.30secondsofcode.org/articles/s/python-swap-variables
https://stackoverflow.com/questions/14836228/is-there-a-standardized-method-to-swap-two-variables-in-python/14836456#14836456
A naive reassignment obviously doesn't work because of the way python assigns names to objects. So I wrote a hack where we manually replace everything in the read buffer (`heights` is acting as the read buffer, the new info is calculated in the temporary `write_buffer`, and at the end it's put back into `heights`). The proper way (per the links above) would just be a tuple swap.
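The inline snippets appear to have been lost in extraction; a sketch of the pattern being described (reconstructed from the surrounding text, not copied from Nixis):

```python
import numpy as np

heights = np.random.rand(1000)        # read buffer
write_buffer = np.empty_like(heights)
write_buffer[:] = heights * 0.5       # stand-in for one erosion pass

# Naive reassignment -- doesn't work: both names would now point at the
# SAME array, so the next pass would read its own half-written output.
# heights = write_buffer

# The hack: manually copy every element back into the read buffer.
for x in range(len(heights)):
    heights[x] = write_buffer[x]

# The proper way (per the links above): swap the two names in O(1).
heights, write_buffer = write_buffer, heights
```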
This issue is for tracking, as the change needs to be applied to several different functions, and also to the water and sediment buffers, and each function might be structured slightly differently.
Things to watch out for: after a swap the names refer to different underlying objects (which can be verified with `id()`), and we might want to force iterations to only be an even number to guarantee we always switch back to the original object at the end. (This requires testing, of course, to determine if such a problem even exists.)
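A minimal sketch of that concern and one possible fix-up, assuming the caller holds a reference to the original array (names are illustrative):

```python
import numpy as np

heights = np.zeros(16)
write_buffer = np.empty_like(heights)
original = heights                     # reference the caller might hold

passes = 101                           # odd on purpose
for _ in range(passes):
    write_buffer[:] = heights + 1.0    # stand-in erosion pass
    heights, write_buffer = write_buffer, heights

# After an odd number of swaps, `heights` names the OTHER allocation,
# so anything still holding `original` sees stale data.
assert id(heights) != id(original)

# Fix-up for an odd pass count: one final copy, then swap back, so the
# caller's object ends up holding the latest data.
if heights is not original:
    write_buffer[:] = heights
    heights, write_buffer = write_buffer, heights

assert heights is original and heights[0] == passes
```

Forcing an even pass count avoids the copy entirely; the fix-up above only pays it when the count was odd.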