Linaro / OpenCSD

CoreSight trace stream decoder developed openly
https://github.com/Linaro/opencsd/wiki

Multithreaded Decoding #54

Closed albertschulz closed 1 year ago

albertschulz commented 1 year ago

Hi there, is the library in general usable for splitting the decoding across multiple threads? Traces are often huge. Using the full CPU could reduce the decoding time by a factor of up to the number of truly parallel threads. What would the usage pattern be when using multiple threads?

mikel-armbb commented 1 year ago

Hi,

We haven't tried to use the library components in a multi-threaded decode configuration. However, I am not aware of any issues that would prevent this, provided the client creates and uses the individual components directly rather than going through the current decode tree model.

Trace decode is by its nature a serial operation. See the documentation for the three decode phases that we have in the library: frame demux feeds into packet stream processing, which feeds into trace decode.

However, given that we frequently have multiple sources in the same buffer, one approach may be to run one thread per source at the packet processing stage, assuming that demux is much faster than the subsequent stages.
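As a rough illustration of the thread-per-source idea, here is a minimal C++ sketch. It does not use any OpenCSD API: `SourceResult`, `process_source` and `process_all_sources` are hypothetical names, and the demuxed data is assumed to be already available as one byte buffer per trace ID. A real client would replace `process_source` with its own packet processor and decoder components.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <thread>
#include <vector>

// Hypothetical result type: one entry per trace source.
struct SourceResult {
    std::uint8_t trace_id;
    std::size_t bytes_processed;
};

// Hypothetical stand-in for per-source packet processing.
// Work inside a single source stays strictly serial.
SourceResult process_source(std::uint8_t trace_id,
                            const std::vector<std::uint8_t>& data) {
    std::size_t n = 0;
    for (std::uint8_t byte : data) {
        (void)byte;  // a real processor would parse packets here
        ++n;
    }
    return SourceResult{trace_id, n};
}

// Runs one thread per demuxed source. Each thread writes only to its
// own result slot, so no locking is needed.
std::vector<SourceResult> process_all_sources(
    const std::map<std::uint8_t, std::vector<std::uint8_t>>& demuxed) {
    std::vector<SourceResult> results(demuxed.size());
    std::vector<std::thread> workers;
    workers.reserve(demuxed.size());

    std::size_t slot = 0;
    for (const auto& entry : demuxed) {
        const auto* source = &entry;  // map elements outlive the threads
        workers.emplace_back([&results, source, slot] {
            results[slot] = process_source(source->first, source->second);
        });
        ++slot;
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return results;  // ordered by trace ID, following the map order
}
```

Calling `process_all_sources` with a map holding two trace IDs starts two threads and returns two results in trace ID order. Whether this pays off depends on the assumption above, namely that demux is cheap compared with the later stages.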

For a very large buffer of trace packets from a single source, it may be further possible to start multiple decode threads at different synchronisation points. The results would have to be carefully resynchronised on the back end, and there may be state that crosses sync points to be aware of. After any synchronisation point, trace must be decoded serially, because compression means the state from the previous packet is needed to determine the state in the next packet.
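The sync-point split could look roughly like the following C++ sketch. It uses a toy format rather than real trace packets: every byte is a delta on the previous value, and the state is assumed to reset to zero at each sync offset. `decode_chunk` and `decode_parallel` are hypothetical names, the sync offsets are assumed to have been found already, and any state that crosses sync points is ignored here.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Toy serial decoder: each byte is a delta added to the running state,
// which starts at zero at a sync point. Each output depends on the
// previous one, so a chunk cannot be split any further.
std::vector<std::uint32_t> decode_chunk(const std::vector<std::uint8_t>& data,
                                        std::size_t begin, std::size_t end) {
    std::vector<std::uint32_t> out;
    out.reserve(end - begin);
    std::uint32_t state = 0;
    for (std::size_t i = begin; i < end; ++i) {
        state += data[i];
        out.push_back(state);
    }
    return out;
}

// Decodes one chunk per sync point in parallel and merges the results
// in stream order. sync_offsets must be sorted ascending and lie inside
// data. Bytes before the first sync point are skipped, because they
// cannot be decoded without earlier state.
std::vector<std::uint32_t> decode_parallel(
    const std::vector<std::uint8_t>& data,
    const std::vector<std::size_t>& sync_offsets) {
    const std::size_t n = sync_offsets.size();
    std::vector<std::vector<std::uint32_t>> partial(n);
    std::vector<std::thread> workers;
    workers.reserve(n);

    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t begin = sync_offsets[i];
        const std::size_t end = (i + 1 < n) ? sync_offsets[i + 1] : data.size();
        workers.emplace_back([&data, &partial, i, begin, end] {
            partial[i] = decode_chunk(data, begin, end);
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }

    // Back-end resynchronisation: concatenate in original stream order.
    std::vector<std::uint32_t> merged;
    for (const auto& chunk : partial) {
        merged.insert(merged.end(), chunk.begin(), chunk.end());
    }
    return merged;
}
```

In this toy format the merge is a plain concatenation. With real trace the back end would also have to reconcile timestamps, context and any other state carried across sync points, which is the hard part described above.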

For this to work, as mentioned above, the client would have to initialise and control individual decode components outside of the decode_tree framework, or the library would need to be enhanced to implement some of this functionality. There are no current plans for any such enhancement.

Regards

Mike

Sent by email in reply to Albert Schulz, Thu, 9 Feb 2023 at 10:49.

-- Mike Leach Principal Engineer, ARM Ltd. Manchester Design Centre. UK

albertschulz commented 1 year ago

Thank you very much for this detailed discussion. I'll see how I can use these approaches to improve performance. If I pursue this, I would consider contributing the changes back to the mainline of this repository. Best, Albert