WernerMairl opened 3 months ago
I did a lot of investigation on the same question back in 2022, and it looks like the memory-related conclusions hold.
I have modified the example here to use only 10 nodes in the data (instead of 4000 previously; real-world files contain about 8000), and I see the following result:
Interpretation: between concurrency==1 and concurrency==4 we see near-perfect CPU scaling: the overall duration drops from 7174 ms to 2174 ms.
With a concurrency of 8 we still get a better result than with 4, but the scaling ratio degrades.
On the other hand, putting 8000 nodes inside a single deserialization, as OSM data does in reality, shows the following:
we get the best (most efficient) result with a concurrency of 2. Higher concurrency gives the same wall-clock time but costs a lot of overhead.
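For reference, a concurrency sweep like the one described above can be sketched as follows. This is a hedged reconstruction, not the actual repro from the repo: `SampleNode`/`SampleBlock` are hypothetical stand-ins for the generated OSM types, while the `ProtoBuf.Serializer.Serialize`/`Deserialize` calls are the real protobuf-net API.

```csharp
// Sketch: measure deserialization throughput at several concurrency levels.
// SampleNode/SampleBlock are illustrative stand-ins for the OSM types.
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using ProtoBuf;

[ProtoContract]
public class SampleNode
{
    [ProtoMember(1)] public long Id { get; set; }
    [ProtoMember(2)] public double Latitude { get; set; }
    [ProtoMember(3)] public double Longitude { get; set; }
}

[ProtoContract]
public class SampleBlock
{
    [ProtoMember(1)] public SampleNode[] Nodes { get; set; }
}

public static class ConcurrencySweep
{
    public static void Main()
    {
        // Build one serialized payload that every worker deserializes repeatedly.
        var block = new SampleBlock
        {
            Nodes = Enumerable.Range(0, 8000)
                .Select(i => new SampleNode { Id = i, Latitude = i * 0.001, Longitude = -i * 0.001 })
                .ToArray()
        };
        byte[] payload;
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, block);
            payload = ms.ToArray();
        }

        const int iterationsPerWorker = 200;
        foreach (var concurrency in new[] { 1, 2, 4, 8 })
        {
            var sw = Stopwatch.StartNew();
            var tasks = Enumerable.Range(0, concurrency).Select(_ => Task.Run(() =>
            {
                for (var i = 0; i < iterationsPerWorker; i++)
                {
                    // Each iteration deserializes the same payload from a fresh stream.
                    using var ms = new MemoryStream(payload);
                    Serializer.Deserialize<SampleBlock>(ms);
                }
            })).ToArray();
            Task.WaitAll(tasks);
            sw.Stop();
            var total = concurrency * iterationsPerWorker;
            Console.WriteLine($"concurrency={concurrency}: {sw.ElapsedMilliseconds} ms for {total} deserializations");
        }
    }
}
```

If deserialization scaled perfectly, each doubling of concurrency would leave the elapsed time roughly constant while doubling the total work done; the numbers above suggest it stops scaling somewhere between 2 and 4 workers for large messages.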
Issue and Symptoms
When running a large number of protobuf deserializations, I'm not able to improve performance by using more tasks/threads!
I have written a minimal sample that should help isolate the problem. It is a console program, so it can be executed directly and analyzed with a profiling tool. The output on my machine looks like this:
The green rows show that adding more tasks/threads reduces the per-thread deserialization rate, so in the end there is basically no win.
In numbers: 4.9 seconds with 1 thread, 3.3 seconds with 8 threads. My expectation for 8 threads in this scenario: worst case less than 2 seconds, best case less than 1 second (ignoring milliseconds at this point).
Task Manager also shows that the available CPU power is not being used.
What am I doing wrong?
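One cheap experiment, given that the memory-related findings mentioned above point toward allocation pressure rather than CPU work: switch the console app to the server GC, which maintains per-core heaps and often behaves better in allocation-heavy parallel workloads. This is a hedged suggestion to rule GC contention in or out, not a confirmed fix for this issue.

```xml
<!-- In the console project's .csproj: enable the server GC as an experiment.
     ServerGarbageCollection/ConcurrentGarbageCollection are standard .NET
     runtime configuration properties; they do not change program behavior,
     only how the GC partitions and collects the heap. -->
<PropertyGroup>
  <ServerGarbageCollection>true</ServerGarbageCollection>
  <ConcurrentGarbageCollection>true</ConcurrentGarbageCollection>
</PropertyGroup>
```

At runtime, `System.Runtime.GCSettings.IsServerGC` reports whether the setting took effect, which makes it easy to compare the two modes in the same benchmark.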
About the example
A .proto file and the generated C# types are part of this repo (MIT licensed). All of them come from the OsmSharp OpenStreetMap file format (also MIT licensed).
Any idea or help is welcome ;-)
BR Werner