MovingBlocks / Terasology

Terasology - open source voxel world
http://terasology.org
Apache License 2.0

Analyzing Performance (Documenting Progress, Results, etc.) #5150

Open jdrueckert opened 1 year ago

jdrueckert commented 1 year ago

Assumptions

Next Steps

Collected Insights

Past Insights

Related Issues

Concurrency Providers & Consumers

(current state as created by @skaldarnar for the Reactor effort) [image]

Time-consuming tasks

(as compiled by @DarkWeird) The following take a lot of time:

  1. Generating/loading chunks
  2. Exposure node
  3. Shadow map
  4. NUI

Chunks take up a lot of memory, but we can hardly shrink them. Bytewise operations take a lot of time, and any object structure (like an octree) takes so much memory that the current implementation is optimal. Java modules might (or might not) enable more aggressive optimization if we hide the implementation in a separate module. An octree could also become more memory-efficient once Java implements compact object headers. (Or we hide chunks in Rust.)

Most frequently called methods

(as compiled by @BenjaminAmos via JFR)

I don't know if this is right, but a quick JFR recording seems to show void org.terasology.core.world.generator.facetProviders.DensityNoiseProvider.process(GeneratingRegion, float) being called an awful lot (24% of the time). That doesn't necessarily mean that it's a bottleneck though (sampling does not measure execution time).
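
To make the "share of samples" figure reproducible, a recording can also be read programmatically. The following is a minimal sketch, assuming the percentage is derived from `jdk.ExecutionSample` events and that only the top frame of each sample is counted; the class name `SampleShare` and its helper methods are hypothetical and not part of Terasology. It reports how often a method was on top of the stack when sampled, not how long a call takes.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper: counts how often each method is the top frame of a JFR execution sample.
final class SampleShare {

    // Reads a .jfr file and counts top-of-stack methods across all execution samples.
    static Map<String, Long> topFrameCounts(Path recording) throws IOException {
        Map<String, Long> counts = new HashMap<>();
        for (RecordedEvent event : RecordingFile.readAllEvents(recording)) {
            if (!"jdk.ExecutionSample".equals(event.getEventType().getName())) {
                continue; // ignore GC, allocation, and other event types
            }
            RecordedStackTrace trace = event.getStackTrace();
            if (trace == null || trace.getFrames().isEmpty()) {
                continue; // sample without a usable stack trace
            }
            RecordedFrame top = trace.getFrames().get(0);
            String key = top.getMethod().getType().getName() + "." + top.getMethod().getName();
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    // Percentage of all counted samples in which the given method was the top frame.
    static double sharePercent(Map<String, Long> counts, String method) {
        long total = counts.values().stream().mapToLong(Long::longValue).sum();
        return total == 0 ? 0.0 : 100.0 * counts.getOrDefault(method, 0L) / total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: SampleShare <recording.jfr> <fully.qualified.Class.method>");
            return;
        }
        Map<String, Long> counts = topFrameCounts(Path.of(args[0]));
        System.out.printf("%s: %.2f%% of samples%n", args[1], sharePercent(counts, args[1]));
    }
}
```

A high share here only says the method was frequently running when the sampler looked; per-call durations need instrumentation or a tracing profiler.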

Interestingly, on the slow server recording the most frequently sampled methods were:

The HashIterator method was generally (indirectly) called from:

Actually, it's those systems for both frequent methods. Inside of those methods, the stack generally goes:

DensityNoiseProvider is not as big of an issue on that machine though. It's only 1.52% of samples. Could it possibly be related to https://github.com/MovingBlocks/Terasology/blob/f907533ede16322d2b6916947f27917a311b61b0/engine/src/main/java/org/terasology/engine/entitySystem/entity/internal/PojoEntityManager.java#L253-L260 or https://github.com/MovingBlocks/Terasology/blob/f907533ede16322d2b6916947f27917a311b61b0/engine/src/main/java/org/terasology/engine/entitySystem/entity/internal/PojoEntityPool.java#L287-L300? This code does use Java streams.

Multi-Threading

@BenjaminAmos found the following list of threads indicated by JFR (threads marked with '*' are assumed to be "ours"):

C1
C2
*Chunk-Processing-0
*Chunk-Processing-Reactor
*Chunk-Unloader-0
*Chunk-Unloader-1
*Chunk-Unloader-2
*Chunk-Unloader-3
Common-Cleaner
FileSystemWatchService
FileSystemWatchService
Finalizer
G1
Java2D
JFR
JFR
JFR
JFR:
Logging-Cleaner
*main
nioEventLoopGroup-2-1
nioEventLoopGroup-3-1
nioEventLoopGroup-3-2
nioEventLoopGroup-3-3
Reference
*Saving-0
Service
Signal
SIGTERM
StreamCloser
Sweeper
*Thread
*Thread-1
*Thread-2
VM
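
A list like the one above can be cross-checked from inside the running JVM. The following is a minimal sketch, assuming it is enough to group live threads by name with a trailing numeric suffix removed; `ThreadInventory` and its methods are hypothetical names, and JVM-internal threads (such as the compiler and GC threads JFR reports) will mostly not show up via this API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups live Java threads by name prefix to spot pool sizes.
final class ThreadInventory {

    // Strips one trailing "-<digits>" so "Chunk-Unloader-3" groups under "Chunk-Unloader".
    static String groupName(String threadName) {
        return threadName.replaceAll("-\\d+$", "");
    }

    // Counts thread names per group, sorted alphabetically by group name.
    static Map<String, Integer> countByGroup(Iterable<String> threadNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : threadNames) {
            counts.merge(groupName(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() returns one map entry per live Java thread.
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            names.add(thread.getName());
        }
        countByGroup(names).forEach((group, count) -> System.out.println(count + " x " + group));
    }
}
```

Only the last numeric suffix is removed, so names such as nioEventLoopGroup-3-2 group under nioEventLoopGroup-3.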

Code Areas with Longest Per-Call Durations

Based on the statistical info in https://benjaminamos.github.io/TerasologyPerformanceTracyView/tracy-profiler.html

TODO: Refactor the individual code areas to improve their performance and reduce their per-call run time.

References

Reactor Effort:

Potentially Helpful Tooling

Information Sources

Performance-related issues:

Tooling-related issues:

Follow-Up Actions

BenjaminAmos commented 1 year ago

These methods appear to be consuming significant quantities of time (these are the highest execution times recorded). Results may be skewed by threading or other interruptions (such as garbage collections).

| Method | Max Duration (ms) |
| --- | --- |
| AbstractStorageManager::loadCompressedChunk | 595 |
| LocalChunkProvider::processReadyChunk (for all chunks) | 379 |
| rendering/CoreRendering:worldReflectionNode | 291 |
| rendering/CoreRendering:opaqueObjectsNode (MeshRenderer) | 165 |
| LocalChunkProvider::loadChunkStore | 106 |
| GameThread::processWaitingProcesses | 42 |

BenjaminAmos commented 8 months ago

Reducing the number of chunk threads appears to help with AbstractStorageManager::loadCompressedChunk.

LocalChunkProvider::processReadyChunk (per-chunk) behaves somewhat unpredictably in terms of performance. At times, a single call can take up to 150ms. At other times, the runtime is negligible.
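
When a method is usually cheap but occasionally spikes, tracking the call count, mean, and maximum per label makes the spread visible. The following is a minimal sketch, assuming wall-clock timing via `System.nanoTime()` is acceptable; `CallTimer` and its methods are hypothetical and do not reflect the engine's existing monitoring code. As noted earlier in the thread, wall-clock maxima can still be skewed by GC pauses and thread scheduling.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: records call count, total, and maximum duration per label.
final class CallTimer {

    static final class Stats {
        long calls;
        long totalNanos;
        long maxNanos;

        long meanNanos() {
            return calls == 0 ? 0 : totalNanos / calls;
        }
    }

    private final Map<String, Stats> stats = new ConcurrentHashMap<>();

    // Adds one measurement; compute() serializes updates for the same label.
    void record(String label, long nanos) {
        stats.compute(label, (key, s) -> {
            if (s == null) {
                s = new Stats();
            }
            s.calls++;
            s.totalNanos += nanos;
            s.maxNanos = Math.max(s.maxNanos, nanos);
            return s;
        });
    }

    // Runs the body and records its wall-clock duration, even if it throws.
    void time(String label, Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            record(label, System.nanoTime() - start);
        }
    }

    // Returns the stats for a label, or null if nothing was recorded.
    // Reads are not synchronized with writers, so values may be slightly stale.
    Stats get(String label) {
        return stats.get(label);
    }
}
```

Wrapping a call as `timer.time("processReadyChunk", () -> ...)` and comparing `maxNanos` against `meanNanos()` shows whether a 150ms call is an outlier or typical.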

BenjaminAmos commented 7 months ago

VoxelWorldSystem::onNewChunk is also quite expensive to run, it would appear.

jdrueckert commented 7 months ago

@BenjaminAmos Added the 10 longest per-call duration findings (with more than 1000 calls) from the trace to the issue description.

jdrueckert commented 5 months ago

During today's playtest, we identified a few areas to look into in more detail: