MightyBOBcnc / nixis

A python program for procedurally generating planet-scale maps for Earth-like, spherical worlds.
MIT License

Erosion read/write buffer swapping #7

Closed MightyBOBcnc closed 1 year ago

MightyBOBcnc commented 1 year ago

In erosion.py most of the functions operate with a read buffer and write buffer, and then when a loop is complete the values get moved from one to the other. This should be done with a tuple swap instead of the current method (which I wrote as a hack because I suck).

https://www.30secondsofcode.org/articles/s/python-swap-variables
https://stackoverflow.com/questions/14836228/is-there-a-standardized-method-to-swap-two-variables-in-python/14836456#14836456

This obviously doesn't work because of the way python assigns names to objects (after the first line both names refer to the same object, so the second line does nothing):

# Switch the read and write buffers
read_buffer = write_buffer
write_buffer = read_buffer
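To make the failure concrete, here is a minimal sketch that uses plain Python lists as stand-ins for the arrays in erosion.py (the values and the `original_read` name are made up for illustration). The same name-binding behavior applies to any Python object, arrays included.

```python
# Plain lists stand in for the real height arrays; values are illustrative only.
read_buffer = [1.0, 2.0, 3.0]
write_buffer = [10.0, 20.0, 30.0]

original_read = read_buffer      # keep a handle so we can inspect it afterwards

# The naive "swap": two sequential assignments.
read_buffer = write_buffer       # read_buffer now names the write buffer's object
write_buffer = read_buffer       # ...so this rebinds write_buffer to itself

print(read_buffer is write_buffer)    # True: both names share one object
print(read_buffer is original_read)   # False: the old read buffer was dropped
```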

So I wrote this hack where we manually replace everything in the read buffer (heights is acting as the read buffer, then new info was calculated in the temporary write_buffer, and here it's being put back into heights at the end):

# Switch the read and write buffers
for x in prange(len(write_buffer)):
    heights[x] = write_buffer[x]

The proper way (per the links above) would just be:

# Switch the read and write buffers
read_buffer, write_buffer = write_buffer, read_buffer
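As a rough sketch of how the tuple swap fits into a multi-pass loop, the example below double-buffers a toy smoothing step. `relax_pass` and `run_passes` are hypothetical names, not functions from erosion.py, and plain lists stand in for the arrays. The point to notice is that the swap only rebinds local names, so the buffer holding the final result is not guaranteed to be the one the caller passed in.

```python
def relax_pass(read_buffer, write_buffer):
    """Hypothetical stand-in for one erosion pass: average each cell with its ring neighbours."""
    n = len(read_buffer)
    for i in range(n):
        left = read_buffer[i - 1]          # index -1 wraps to the last cell
        right = read_buffer[(i + 1) % n]
        write_buffer[i] = (left + read_buffer[i] + right) / 3.0


def run_passes(heights, num_passes):
    """Run num_passes passes, swapping buffers by rebinding names instead of copying."""
    read_buffer = heights
    write_buffer = [0.0] * len(heights)    # scratch buffer, allocated once
    for _ in range(num_passes):
        relax_pass(read_buffer, write_buffer)
        # O(1) swap: no per-element copy back into the read buffer.
        read_buffer, write_buffer = write_buffer, read_buffer
    # After the final swap, read_buffer holds the newest values. It may or may
    # not be the same object as `heights`, so return it explicitly.
    return read_buffer


heights = [0.0, 0.0, 9.0, 0.0, 0.0, 0.0]
result = run_passes(heights, 3)
```

If calling code relies on `heights` being updated in place, one option is a single copy after the loop (for example `heights[:] = result`), which costs one copy in total rather than one per pass.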

This issue is for tracking, as the change needs to be applied to several different functions, and also for water and sediment buffers, and each function might be structured slightly differently.

Things to watch out for:

MightyBOBcnc commented 1 year ago

Started work on this. Updated erode_terrain6 to use a tuple swap as it was the most recent (have not touched the other erosion functions yet). To my surprise this offered no performance improvement whatsoever. I tested on the same seed for each test, with -d 2000 (~40 million vertices) and -d 3000 (~90 million vertices), 100 passes per test. (Using big meshes because any performance gains in testing should be more apparent than a mesh with only 1 or 2 million vertices.)

Before implementing the tuple swap the d=2000 mesh took about 85 seconds per test (14 tests). After changing the code to the tuple swap the performance on a d=2000 mesh was the same 85 seconds (6 tests). My expectation was that a tuple swap would be at least a few seconds faster (every little gain helps) as it doesn't need to do heights[x] = write_buffer[x] for every iteration, which is a lot when you're talking about 40 to 90 million verts.

With that being said, I don't know how numpy/numba optimize such a thing so it's possible it was already doing some highly optimal vectorization or something.
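One way to check how much the copy-back costs on its own is to time it in isolation against the swap. The sketch below does that in pure Python with an illustrative buffer size; the `buffers` dict and both function names are hypothetical, and the absolute numbers will not match a Numba-compiled loop, so treat it as a measurement recipe rather than a result.

```python
import timeit

N = 200_000   # illustrative size, far smaller than the 40-million-vertex mesh
buffers = {
    "read": [float(i) for i in range(N)],
    "write": [0.0] * N,
}


def copy_back(b):
    """The original approach: copy every element of the write buffer into the read buffer."""
    read, write = b["read"], b["write"]
    for x in range(len(write)):
        read[x] = write[x]


def tuple_swap(b):
    """Exchange the two buffers by rebinding; no element is touched."""
    b["read"], b["write"] = b["write"], b["read"]


copy_time = timeit.timeit(lambda: copy_back(buffers), number=5)
swap_time = timeit.timeit(lambda: tuple_swap(buffers), number=5)
print(f"copy-back:  {copy_time:.4f}s for 5 runs")
print(f"tuple swap: {swap_time:.6f}s for 5 runs")
```

If the copy turns out to be a small fraction of one full erosion pass, removing it would not move the total run time much, which would be consistent with the erode_terrain6 measurements above.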

One thing of note that changed is that the erode_terrain6 function was @njit decorated before but the decorator was removed to add support for snapshotting each step of erosion because numba does not play nice with the image export functions. Numba does have a fallback 'object' mode @jit(nopython=False) but this is going to be fully removed from numba in 0.59.0. Nevertheless I did try it and it brought down the run time to about 79.8 seconds in conjunction with the tuple swap (6 more tests).


For the d=3000 mesh things went sideways. Before implementing the tuple swap this took about 190 seconds per test (4 tests, but there was an outlier 5th test that took 244 seconds, which might have been numba's first-run compile penalty). After implementing the tuple swap it took more like 195 to 205 seconds per test (11 tests) but would seemingly randomly take 620+ seconds (8 tests; NOTE: I only let 2 of these run to completion, the other 6 I aborted early in the terminal, see below).

Watching my CPU I observed that during the 'normal' runs the erosion would consistently have >90% utilization on all cores (this is the expected behavior while erosion is running) but during the >600 second runs the CPU would spike to 100% for each pass of the for loop and then drop to 0% at the end of the pass, and then repeat. This CPU thrashing made each pass take longer, hence why the whole run would take >600 seconds. After this happened twice I started killing the runs with Ctrl+C in the terminal whenever I saw the spike pattern begin.

I also tested the @jit(nopython=False) on the d=3000 mesh for 7 additional tests and they all came in around 188 seconds per test, with no instances of the strange >600 second glitch.

I have no idea what causes it. It can't be CPU thermal throttling because all other areas of Nixis that use the full CPU don't have this problem. Something else running on the system? (Loading up the python interpreter might get different allocations or something.) Length of time since last run? (Things getting unallocated.) Time it takes to complete 1 loop being slightly too long so the next loop has to restart or reallocate something with numba's compiled functions (kernels?)? Python randomly choking on the print statement for each erosion pass?
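A possible way to narrow these guesses down is to record the wall-clock time of every pass, so a slow run can be tied to specific passes instead of being judged from the CPU graph. The sketch below is only an illustration: `timed_passes`, `fake_pass`, and `slow_factor` are hypothetical and not part of Nixis.

```python
import time


def timed_passes(run_one_pass, num_passes, slow_factor=2.0):
    """Time each pass and flag passes slower than slow_factor times the middle value so far."""
    durations = []
    flagged = []
    for pass_index in range(num_passes):
        start = time.perf_counter()
        run_one_pass()
        elapsed = time.perf_counter() - start
        if durations:
            # Middle element of the sorted timings seen so far (a rough median).
            baseline = sorted(durations)[len(durations) // 2]
            if elapsed > slow_factor * baseline:
                flagged.append(pass_index)
        durations.append(elapsed)
    return durations, flagged


def fake_pass():
    """Hypothetical placeholder for one erosion pass."""
    sum(i * i for i in range(10_000))


durations, flagged = timed_passes(fake_pass, num_passes=10)
print(f"slowest pass: {max(durations):.6f}s, flagged passes: {flagged}")
```

Logging the per-pass times to a file instead of printing them would also take the per-pass print statement out of the picture as a suspect.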

Dunno. But I'm going to press forward with adding tuple swaps to the other erosion functions for now because it feels 'more correct' as a way to swap the read and write buffers. Some of the other functions also work a little differently (they do their swapping inside the erosion instead of the outer iteration that calls it) so I want to see if they experience the same weirdness or if they get any performance improvement.

MightyBOBcnc commented 1 year ago

erode_terrain5 refuses to be conquered by tuple swapping. Its iteration operates in a different way from erode_terrain6 due to the way it references the input heights. On the plus side, while I've been poking around in erosion.py I found a fix (or a hack) for the spikes forming in erode_terrain5's iteration. See: b3ab247. All tests done on a d=320 mesh, 100 passes each.

With the fix applied we can even run it with Numba's prange without exploding the elevations, although it should be noted that due to the way it is structured this will cause collisions when writing to the write buffer. That causes some minor variation in the output (meaning the result is no longer purely deterministic).

I have some ideas for how to run it in parallel without these collisions but that's a separate task.
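For context on the collisions: they happen when one loop iteration writes into a slot that another iteration also writes. A common general remedy, which is not necessarily what is planned here and is not taken from erode_terrain5, is to turn the "scatter" update into a "gather", where each iteration writes only its own slot. The toy example below uses a ring of cells and a made-up diffusion-like rule; all names are hypothetical.

```python
def scatter_pass(read, write, k=0.1):
    """Each cell pushes material to lower neighbours, writing into THEIR slots."""
    n = len(read)
    for i in range(n):
        write[i] = read[i]
    for i in range(n):   # run in parallel, iterations would collide on write[j]
        for j in ((i - 1) % n, (i + 1) % n):
            diff = read[i] - read[j]
            if diff > 0.0:
                moved = k * diff
                write[i] -= moved
                write[j] += moved   # write to a slot owned by another cell


def gather_pass(read, write, k=0.1):
    """Each cell computes its own net change and writes only write[i]."""
    n = len(read)
    for i in range(n):   # iterations are independent: one writer per slot
        total = read[i]
        for j in ((i - 1) % n, (i + 1) % n):
            # Positive diff: cell i loses material to j. Negative: it gains from j.
            total -= k * (read[i] - read[j])
        write[i] = total


cells = [5.0, 1.0, 3.0, 8.0, 2.0, 2.0]
scatter_out = [0.0] * len(cells)
gather_out = [0.0] * len(cells)
scatter_pass(cells, scatter_out)
gather_pass(cells, gather_out)
```

Run sequentially, both versions give the same heights up to floating-point rounding; only the gather form keeps that property when the outer loop is parallelized, because no two iterations share a write target.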

MightyBOBcnc commented 1 year ago

erode_terrain1 gets a ~32% speed up from using tuple swap. 100 passes with d=2000 drops from 17.0s to 11.7s, and d=3000 drops from about 37.5s to 25.6s on my machine.

erode_terrain2 gets a ~13%-17% speed up from using tuple swap. 100 passes with d=2000 drops from 48.0s to 41.8s on my machine. With d=3000 it went from 109.2s to 91.7s on my machine.
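For reference, the percentages can be recomputed from the reported times. The small sketch below expresses each result as a reduction in run time relative to the original; `percent_reduction` is a hypothetical helper, and the timings are the ones quoted in this comment.

```python
def percent_reduction(before_s, after_s):
    """Run-time reduction as a percentage of the original run time."""
    return (before_s - after_s) / before_s * 100.0


# (function, mesh size d, seconds before, seconds after), as reported above
timings = [
    ("erode_terrain1", 2000, 17.0, 11.7),
    ("erode_terrain1", 3000, 37.5, 25.6),
    ("erode_terrain2", 2000, 48.0, 41.8),
    ("erode_terrain2", 3000, 109.2, 91.7),
]

for name, d, before_s, after_s in timings:
    pct = percent_reduction(before_s, after_s)
    print(f"{name} d={d}: {pct:.1f}% less run time")
```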

MightyBOBcnc commented 1 year ago

erode_terrain3 and erode_terrain4 are basically unsalvageable disasters. Considering this issue finished.