jdrueckert opened 1 year ago
These methods appear to consume significant amounts of time (the table below lists the highest recorded execution times). Results may be skewed by threading or other interruptions (such as GC pauses).
Method | Max Duration (ms)
---|---
AbstractStorageManager::loadCompressedChunk | 595
LocalChunkProvider::processReadyChunk (for all chunks) | 379
rendering/CoreRendering:worldReflectionNode | 291
rendering/CoreRendering:opaqueObjectsNode (MeshRenderer) | 165
LocalChunkProvider::loadChunkStore | 106
GameThread::processWaitingProcesses | 42
Reducing the number of chunk threads appears to help with AbstractStorageManager::loadCompressedChunk.

LocalChunkProvider::processReadyChunk (per-chunk) behaves somewhat unpredictably in terms of performance: at times a single call can take up to 150ms, at other times the runtime is negligible.

VoxelWorldSystem::onNewChunk also appears to be quite expensive to run.
@BenjaminAmos Added the 10 longest per-call duration findings (with more than 1000 calls) from the trace to the issue description.
During today's playtest, we identified a few areas to look into in more detail:

- ServerCharacterPredictionSystem - claiming to have "Received too much input"
- CharacterSystem - claiming that "Player xyz seems to have cheated: It stated that it performed an action from ... but the predicted position is ..."
- LocalChunkProvider and friends - stalling until flushing something, claiming "Chunk processing took too long this tick"
- ChunkEventErrorLogger - claiming "Multiple loads of chunk ..." in client logs
Assumptions
Next Steps
Collected Insights
Past Insights
Related Issues
Concurrency Providers & Consumers
(current state as created by @skaldarnar for Reactor effort)
Time-consuming tasks
(as compiled by @DarkWeird): What takes a lot of time:

Chunks take up a lot of memory, but we can hardly shrink them. Byte-wise operations take a lot of time, and any object-based structure (like an octree) would take so much memory that the current implementation is close to optimal. Java modules could enable aggressive optimization if we hide the chunk implementation in a separate module (or maybe they cannot). An octree could also become more memory-efficient once Java implements compact object headers. (Or we could reimplement chunks in Rust.)
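As a rough back-of-the-envelope illustration of that memory argument (the chunk dimensions and per-object overhead figures below are assumptions, not the engine's actual chunk format): a flat primitive array stores one block id per voxel with essentially no overhead beyond the payload, while any per-voxel object structure pays an object header and a reference on top of each byte of payload.

```java
// Illustrative sketch only; CHUNK_VOLUME and the overhead constants are assumptions,
// not the engine's actual chunk format or sizes.
class ChunkMemorySketch {
    static final int CHUNK_VOLUME = 32 * 64 * 32; // hypothetical chunk dimensions

    public static void main(String[] args) {
        // Flat, byte-wise storage: one byte of payload per voxel plus a single array header.
        byte[] flatBlocks = new byte[CHUNK_VOLUME];

        // Per-voxel objects: assume ~16 bytes of object header (before compact object
        // headers land in the JVM) plus an 8-byte reference in the containing structure,
        // on top of the 1-byte payload.
        long perVoxelObjectBytes = (long) CHUNK_VOLUME * (16 + 8 + 1);

        System.out.printf("flat byte array:   ~%,d bytes%n", (long) flatBlocks.length);
        System.out.printf("per-voxel objects: ~%,d bytes (rough estimate)%n", perVoxelObjectBytes);
    }
}
```

The gap is roughly 25x in this toy estimate, which is why the byte-array layout is hard to beat on memory even if it makes some per-block operations slower.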
Most frequently called methods
(as compiled by @BenjaminAmos via JFR)
I don't know if this is right, but a quick JFR recording seems to indicate
void org.terasology.core.world.generator.facetProviders.DensityNoiseProvider.process(GeneratingRegion, float)
as being called an awful lot (24% of the time). That doesn't necessarily mean that it's a bottleneck, though (sampling does not measure execution time).

Interestingly, on the slow server recording the most frequently sampled methods were:
com.google.common.collect.MapMakerInternalMap$HashIterator.nextInTable()
java.lang.invoke.VarHandleObjects$Array.getVolatile(VarHandleObjects$Array, Object, int)
The HashIterator method was generally (indirectly) called from:
void org.terasology.engine.rendering.logic.LightFadeSystem.update(float)
void org.terasology.engine.logic.behavior.BehaviorSystem.update(float)
void org.terasology.engine.logic.characters.CharacterSystem.update(float)
void org.terasology.engine.logic.common.lifespan.LifespanSystem.update(float)
void org.terasology.engine.logic.behavior.CollectiveBehaviorSystem.update(float)
Actually, it's those systems for both frequent methods. Inside of those methods, the stack generally goes:
boolean com.google.common.collect.Iterators$ConcatenatedIterator.hasNext()
boolean java.util.Spliterators$1Adapter.hasNext()
boolean java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(Consumer)
Which implicates java.util.stream again (a known big issue in general).

DensityNoiseProvider is not as big of an issue on that machine, though; it's only 1.52% of samples. Could it possibly be related to https://github.com/MovingBlocks/Terasology/blob/f907533ede16322d2b6916947f27917a311b61b0/engine/src/main/java/org/terasology/engine/entitySystem/entity/internal/PojoEntityManager.java#L253-L260 or https://github.com/MovingBlocks/Terasology/blob/f907533ede16322d2b6916947f27917a311b61b0/engine/src/main/java/org/terasology/engine/entitySystem/entity/internal/PojoEntityPool.java#L287-L300? This does use Java streams.
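To make the overhead shape concrete, here is a minimal sketch (the names and filtering logic are hypothetical stand-ins, not the engine's actual API) of an entity lookup that filters with a Java stream and hands the result back as an Iterable. Iterating such an Iterable routes every hasNext()/next() call through the stream's Spliterator-to-Iterator adapter, which matches the kind of per-element indirection showing up in the sampled stacks above:

```java
// Hypothetical stand-ins for a stream-backed entity lookup; not Terasology's actual code.
import java.util.List;
import java.util.stream.Stream;

class StreamIterableSketch {
    // Filters with a stream and exposes the result as an Iterable (Stream::iterator),
    // so the caller's for-each loop iterates through a Spliterator-to-Iterator adapter.
    static Iterable<String> entitiesWith(List<String> store, String component) {
        Stream<String> filtered = store.stream().filter(e -> e.endsWith(component));
        return filtered::iterator; // note: the returned Iterable can only be iterated once
    }

    public static void main(String[] args) {
        List<String> store = List.of("player:Health", "zombie:Health", "item:Lifespan");

        // Every hasNext()/next() here goes through the adapter and the stream pipeline,
        // a cost paid per entity, per frame, when a system does this in update().
        for (String entity : entitiesWith(store, "Health")) {
            System.out.println(entity);
        }

        // A plain loop over the backing collection avoids those extra layers; whether
        // that is worth the change here is something a before/after profile should confirm.
        for (String entity : store) {
            if (entity.endsWith("Health")) {
                System.out.println(entity);
            }
        }
    }
}
```

Either way, a first step would probably be confirming with JFR how much of the frame time these per-frame entity lookups actually account for before rewriting them.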
Multi-Threading
@BenjaminAmos found the following list of threads indicated by JFR (threads marked with '*' are assumed to be "ours"):
Code Areas with Longest Per-Call Durations
Based on the statistical info in https://benjaminamos.github.io/TerasologyPerformanceTracyView/tracy-profiler.html
TODO: Refactor the individual code areas to improve their performance and reduce their per-call run time.
- LwjglGraphics.java:91 (4.4ms per call)
- WorldRendererImpl.java:343 (3.83ms per call)
- VoxelWorldSystem.java:91 (2.71ms per call)
- StateIngame.java:247 (2.14ms per call)
- LocalChunkProvider.java:179 (1.08ms per call)
- GameThread.java:78 (0.85ms per call)
- TerasologyEngine.java:506 (0.76ms per call)
- TerasologyEngine.java:521 (0.60ms per call)
- LwjglGraphics.java:88 (0.54ms per call)
- ShadowMapNode.java:156 (0.46ms per call)

References
Reactor Effort:
Potentially Helpful Tooling

- In-game performance monitor (F3 to open, F4 to cycle through individual tools)
- Monitoring option in Settings -> Autoconfig -> System Settings to show this information in a separate window (requires restarting the game). The Performance tab will only function when the in-game performance monitor is open (F3) and F4 has been pressed once.

Information Sources
Performance-related issues:

- #4877
- #4393
- #4154
- #3996
- #2461
- #1929

Tooling-related issues:

- #692
- #3572
- #4637
Follow-Up Actions