This PR picks some of the low-hanging fruit for improving CPU performance at larger render distances. Based on profiling I've done on my end, these are likely the easiest performance improvements we can make; larger improvements would likely require more work.
This PR has two changes to make metrics easier to use and two changes that improve performance:
Metrics:
Metrics are sorted before being displayed.
Profiling doesn't begin until the server has sent at least one node to the client.
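To illustrate the two metrics changes, here is a minimal sketch of the idea. It is not the code in this PR: `Recorder`, `mark_world_received`, `record`, and `report` are hypothetical names, and only the maximum is reported to keep it short.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical recorder used only for illustration; not Hypermine's metrics code.
#[derive(Default)]
pub struct Recorder {
    /// Set once the server has sent at least one node.
    started: bool,
    samples: HashMap<String, Vec<Duration>>,
}

impl Recorder {
    /// Call when the first node arrives from the server.
    pub fn mark_world_received(&mut self) {
        self.started = true;
    }

    /// Records a sample, dropping anything measured before the world exists
    /// so that empty startup frames don't skew the percentiles.
    pub fn record(&mut self, key: &str, sample: Duration) {
        if !self.started {
            return;
        }
        self.samples.entry(key.to_owned()).or_default().push(sample);
    }

    /// Returns one line per key, sorted by key so related metrics are adjacent.
    pub fn report(&self) -> Vec<String> {
        let mut keys: Vec<&String> = self.samples.keys().collect();
        keys.sort();
        keys.into_iter()
            .map(|key| {
                let max = self.samples[key].iter().max().copied().unwrap_or_default();
                format!("metric key={key} max={max:?}")
            })
            .collect()
    }
}
```

Sorting by key is what puts `frame.cpu` next to `frame.cpu.voxels.draw` in the output, since a plain hash map iterates in arbitrary order.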
Performance:
The results of traversal::nearby_nodes are reused so that the traversal only has to run once per frame.
The logic to start generating new chunks is skipped if the work queue is full.
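The shape of the two performance changes can be sketched roughly as follows. This is a simplified illustration under assumptions, not the actual diff: `NodeId`, `WorkQueue`, `run_frame`, `draw`, and `schedule_chunk_generation` are hypothetical stand-ins, and this `nearby_nodes` is a toy function that only counts how often it is called (the real `traversal::nearby_nodes` has a different signature).

```rust
/// Hypothetical stand-in for a graph node ID.
pub type NodeId = u32;

/// Toy stand-in for the expensive graph walk; `calls` counts invocations.
pub fn nearby_nodes(origin: NodeId, distance: u32, calls: &mut u32) -> Vec<NodeId> {
    *calls += 1;
    (origin..origin + distance).collect()
}

/// Hypothetical bounded queue of nodes awaiting chunk generation.
pub struct WorkQueue {
    capacity: usize,
    pending: Vec<NodeId>,
}

impl WorkQueue {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, pending: Vec::new() }
    }

    pub fn is_full(&self) -> bool {
        self.pending.len() >= self.capacity
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }
}

/// Pretend to draw; returns how many nodes were visited.
fn draw(nearby: &[NodeId]) -> usize {
    nearby.len()
}

/// Queues nodes for generation and returns how many nodes were examined.
/// Bails out before scanning anything if the queue cannot accept more work.
pub fn schedule_chunk_generation(nearby: &[NodeId], queue: &mut WorkQueue) -> usize {
    let mut examined = 0;
    for &node in nearby {
        if queue.is_full() {
            break;
        }
        examined += 1;
        queue.pending.push(node);
    }
    examined
}

/// One frame: walk the graph once and share the result between both consumers.
pub fn run_frame(origin: NodeId, distance: u32, queue: &mut WorkQueue, calls: &mut u32) -> usize {
    let nearby = nearby_nodes(origin, distance, calls);
    let drawn = draw(&nearby);
    schedule_chunk_generation(&nearby, queue);
    drawn
}
```

Passing the same slice to both consumers is what removes the second traversal, and the `is_full` check is what avoids scanning nodes when no new work could be started anyway.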
To show the effect, here are some very rough metrics for a render distance of 120:
Before:
2024-05-03T01:39:12.739839Z INFO metric key=frame.cpu percentile_25=206.438399ms percentile_50=211.550207ms percentile_75=219.545599ms max=231.604223ms
2024-05-03T01:39:12.740008Z INFO metric key=frame.cpu.voxels.draw percentile_25=14.007µs percentile_50=18.911µs percentile_75=21.903µs max=26.303µs
2024-05-03T01:39:12.740183Z INFO metric key=frame.cpu.voxels.graph_traversal percentile_25=45.449215ms percentile_50=46.465023ms percentile_75=49.905663ms max=55.410687ms
2024-05-03T01:39:12.740343Z INFO metric key=frame.cpu.voxels.node_scan percentile_25=109.248511ms percentile_50=111.083519ms percentile_75=118.685695ms max=123.076607ms
2024-05-03T01:39:12.740492Z INFO metric key=frame.gpu.after_draw percentile_25=74.111µs percentile_50=90.623µs percentile_75=137.215µs max=210.559µs
2024-05-03T01:39:12.740628Z INFO metric key=frame.gpu.draw percentile_25=49.663µs percentile_50=79.999µs percentile_75=86.015µs max=116.287µs
After:
2024-05-03T01:51:08.919659Z INFO metric key=frame.cpu percentile_25=59.998207ms percentile_50=62.259199ms percentile_75=64.716799ms max=96.796671ms
2024-05-03T01:51:08.919821Z INFO metric key=frame.cpu.nearby_nodes percentile_25=44.892159ms percentile_50=46.301183ms percentile_75=48.201727ms max=77.922303ms
2024-05-03T01:51:08.919942Z INFO metric key=frame.cpu.voxels.draw percentile_25=41.919µs percentile_50=49.215µs percentile_75=58.303µs max=309.759µs
2024-05-03T01:51:08.920214Z INFO metric key=frame.cpu.voxels.node_scan percentile_25=12.656639ms percentile_50=15.261695ms percentile_75=17.022975ms max=29.704191ms
2024-05-03T01:51:08.920345Z INFO metric key=frame.gpu.after_draw percentile_25=91.583µs percentile_50=140.287µs percentile_75=252.415µs max=1.035775ms
2024-05-03T01:51:08.920480Z INFO metric key=frame.gpu.draw percentile_25=189.823µs percentile_50=680.447µs percentile_75=749.055µs max=845.823µs
Median frame.cpu time drops from about 212 ms to about 62 ms, which is over a 3x improvement. The game feels laggy but playable, especially if the server tick rate is reduced to 10 (since ensure_nearby is not optimized).
I expect that in the future, we will want to work towards a smarter data structure, one that doesn't require us to walk the graph every frame (which involves quite a few floating point operations and random memory access). With enough work, I'm hopeful that we can get Hypermine to be GPU-bottlenecked.
However, for now, I'm satisfied with these performance improvements.