COVESA / dlt-daemon

Diagnostic Log and Trace.
https://covesa.github.io/dlt-daemon/
Mozilla Public License 2.0

Trace budgeting mechanism #643

Open svlad-90 opened 1 week ago

svlad-90 commented 1 week ago

I'm interested in mitigating trace spam in my current project, where one domain within a complex automotive system can 'eat up' much of dlt-daemon's logging bandwidth.

I first looked into whether the number of messages that dlt-daemon can process per second could be increased. In my current environment the limit is ~5000 messages/second; beyond that, dlt-daemon drops messages and causes significant CPU load. My investigation showed that improving this significantly is not possible. I'm also aware of the best practices: dlt-daemon is not intended for heavy tracing of low-level data.

The other option is a per-application and/or per-context ID trace budgeting mechanism that suppresses processes/contexts that spam traces.
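To make the idea concrete, here is a minimal sketch of such a budget as a token bucket kept per (application ID, context ID) pair. It is an illustration only and not dlt-daemon code: the names `TokenBucket`, `TraceBudget`, `set_limit`, and `accept`, as well as the rate and burst parameters, are hypothetical, and a real implementation would live in the daemon's C code and use its own clock.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Budget for one (application ID, context ID) pair (hypothetical type)."""
    rate: float      # messages refilled per second
    capacity: float  # maximum burst size in messages
    tokens: float    # messages currently available
    last: float      # timestamp of the previous refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TraceBudget:
    """Tracks one token bucket per (apid, ctid) and counts dropped messages."""

    def __init__(self, default_rate: float, default_burst: float):
        self.default_rate = default_rate
        self.default_burst = default_burst
        self.limits = {}   # (apid, ctid) -> (rate, burst) overrides
        self.buckets = {}  # (apid, ctid) -> TokenBucket
        self.dropped = {}  # (apid, ctid) -> number of suppressed messages

    def set_limit(self, apid: str, ctid: str, rate: float, burst: float) -> None:
        # Override the default budget for one pair; drop any stale bucket.
        self.limits[(apid, ctid)] = (rate, burst)
        self.buckets.pop((apid, ctid), None)

    def accept(self, apid: str, ctid: str, now: float) -> bool:
        key = (apid, ctid)
        bucket = self.buckets.get(key)
        if bucket is None:
            rate, burst = self.limits.get(
                key, (self.default_rate, self.default_burst))
            # A new bucket starts full, so a short initial burst is allowed.
            bucket = TokenBucket(rate=rate, capacity=burst,
                                 tokens=burst, last=now)
            self.buckets[key] = bucket
        if bucket.allow(now):
            return True
        self.dropped[key] = self.dropped.get(key, 0) + 1
        return False
```

With a default budget of 10 messages/second and a burst of 5, a context that sends 8 messages at the same instant gets 5 through and 3 suppressed, while other applications are unaffected because each pair has its own bucket. The per-pair drop counter could be used to emit a single "N messages suppressed" notice instead of silently discarding them.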

I've seen the following unmerged PR: https://github.com/COVESA/dlt-daemon/pull/134

So, I'm not the only one who has wanted such a feature. But it was not merged; thus, before starting development, I want to cross-check the following points with the maintainers:

I am looking forward to getting your feedback! ))

minminlittleshrimp commented 1 week ago

Hello @svlad-90, thank you for raising this and for your interest in DLT.

Regarding your proposal: IMHO, I am okay with the feature. The only thing we need to make sure of is that the implementation does not affect the current mechanisms or APIs, does not violate the AUTOSAR standard/specification, and does not break any unit tests for existing features. I can do the validation, testing, and checking of your implementation later, in the review phase. You can go ahead with the diagrams, mechanisms, and PRs, and do not worry, we will support you. The last PR was closed because the author's account became inactive, and we cannot proceed when a contributor drops out that way. As for dlt-trace-load.conf, honestly I have no idea what this file is or what it is for 😀 Maybe you are right that it comes from a commercial version from one of the partners in the alliance.

About this point:

If you approve of implementing it, would it be OK if I create some architectural diagrams and post them to this thread to align with maintainers on the possible implementation? I want to implement it properly right away, not to spend my and your time on endless reviews.

I also do not work much with DLT tracing, only with logging, so I am fine with getting involved in this topic. I have no objection, let's work together. Looking forward to your response!

svlad-90 commented 1 week ago

Hi @minminlittleshrimp,

Thank you very much for your feedback and for being ready to collaborate!

As I'm working on a customer project at my company, I'll need to plan these activities properly with my management. So, for your information, it might take 1-4 weeks until this task is part of a sprint and I'm back with the diagrams.

But this feature seems crucial for our customer, who has chosen DLT as part of its technology stack, so there is little chance that we will abandon it. ))