WernerMairl / protobuf-net-concurrency

Some investigations about protobuf perf with higher degree of parallelism
MIT License

Missing performance in high concurrency scenarios (protobuf-net deserialization) #2

Open WernerMairl opened 3 months ago

WernerMairl commented 3 months ago

Issue and Symptoms

When running a large number of protobuf deserializations, I am not able to improve throughput by using more tasks/threads!

I have written a minimal sample that should help to isolate the problem. It is a console program, so it can be run directly and also examined with a profiling tool. The output on my machine looks like the following:

image

We see on the green rows: adding more tasks/threads reduces the deserialization rate inside every thread. In the end there is basically no gain.

In numbers: 4.9 seconds with 1 thread, 3.3 seconds with 8 threads. My expectation for 8 threads in this scenario: worst case less than 2 seconds, best case less than 1 second (not talking about milliseconds at this point).
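To put these numbers in terms of speedup and parallel efficiency (a rough calculation, writing $T_n$ for the wall-clock time with $n$ threads, $S_n$ for the speedup and $E_n$ for the efficiency; the symbols are introduced here only for illustration):

$$
S_8 = \frac{T_1}{T_8} = \frac{4.9\,\text{s}}{3.3\,\text{s}} \approx 1.48,
\qquad
E_8 = \frac{S_8}{8} \approx 0.19
$$

The expected worst case of less than 2 seconds corresponds to $S_8 > 4.9 / 2 = 2.45$, and the expected best case of less than 1 second corresponds to $S_8 > 4.9$.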

Also, Task Manager shows that the available CPU capacity is not being used.

What am I doing wrong?

image

About the example

A proto file and the generated C# types are part of this repo (MIT licensed). All of them come from the OsmSharp OpenStreetMap file format (MIT licensed).

Any idea, any help is welcome ;-)

BR Werner

WernerMairl commented 3 months ago

I did a lot of investigation into this same problem in 2022, and it looks like the memory conclusions are true.

I have modified the example here to use only 10 nodes inside the data (4000 previously, and 8000 in reality), and I can see the following result:

image

Interpretation: between concurrency==1 and concurrency==4 we see nearly perfect usage of the CPU: overall duration decreases from 7174 ms to 2174 ms, a speedup of about 3.3x.

With a concurrency of 8 we also get a better result than with 4, but the gain per added thread is smaller.

On the other hand, using 8000 nodes inside a single deserialization, as OSM does in reality, shows the following:

image

We get the best (most efficient) result with a concurrency of 2. Using higher concurrency gives the same results, but costs a lot of overhead.
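One way to check whether allocation and garbage collection limit the scaling is to record the gen0 collection count next to the elapsed time for each concurrency level. This reading of the memory conclusions as GC pressure is an assumption. The following is only a minimal sketch: `ConcurrencyProbe`, `Measure` and the placeholder workload are hypothetical names and not part of this repo, and the real test would pass the protobuf-net deserialization call as `work`.

```csharp
using System;
using System.Diagnostics;
using System.Runtime;
using System.Threading.Tasks;

public static class ConcurrencyProbe
{
    // Runs `work` totalIterations times with at most `concurrency` parallel workers.
    // Returns elapsed wall-clock milliseconds and the number of gen0 collections
    // that happened in the process during the run.
    public static (long ElapsedMs, int Gen0Collections) Measure(
        Action work, int totalIterations, int concurrency)
    {
        // MaxDegreeOfParallelism is an upper bound, not a guaranteed thread count.
        var options = new ParallelOptions { MaxDegreeOfParallelism = concurrency };
        int gen0Before = GC.CollectionCount(0);
        var sw = Stopwatch.StartNew();
        Parallel.For(0, totalIterations, options, _ => work());
        sw.Stop();
        return (sw.ElapsedMilliseconds, GC.CollectionCount(0) - gen0Before);
    }

    public static void Main()
    {
        Console.WriteLine(
            $"Server GC: {GCSettings.IsServerGC}, cores: {Environment.ProcessorCount}");

        // Placeholder workload (assumption): allocates a buffer per call to
        // stand in for the allocations of a real deserialization.
        Action work = () => { var buffer = new byte[64 * 1024]; buffer[0] = 1; };

        foreach (int concurrency in new[] { 1, 2, 4, 8 })
        {
            var (ms, gen0) = Measure(work, 20_000, concurrency);
            Console.WriteLine($"concurrency={concurrency} elapsed={ms}ms gen0={gen0}");
        }
    }
}
```

If the gen0 count grows with the amount of data while the elapsed time stops improving, that would be consistent with the allocation-bound explanation. Comparing a run with workstation GC against one with server GC (reported by `GCSettings.IsServerGC`) may also help, although this sketch cannot show whether it changes the result here.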