Closed · edongashi closed this issue 3 years ago
Maybe take only up to X debug messages per second? Under high load, most will be useless anyway...
Also, the UI shows messages from a few minutes ago until it catches up. If the logs are constant, it will never catch up and the lag will just keep growing...
Maybe it should just collect the messages and then batch-save them once every 500 ms or so? Batch inserts are way more efficient.
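The batching idea above can be sketched roughly like this. This is an illustrative Python sketch, not the actual service (which is an Elixir GenServer); `insert_many` is a hypothetical stand-in for a single multi-row database insert:

```python
import threading
import time

class BatchLogWriter:
    """Collects log messages and flushes them as one batch per interval."""

    def __init__(self, insert_many, interval=0.5):
        self.insert_many = insert_many  # hypothetical batch-insert callback
        self.interval = interval        # flush period in seconds (e.g. 500 ms)
        self.buffer = []
        self.lock = threading.Lock()

    def log(self, message):
        # Cheap: just append to the in-memory buffer.
        with self.lock:
            self.buffer.append(message)

    def flush(self):
        # Swap the buffer out under the lock, then insert outside it.
        with self.lock:
            batch, self.buffer = self.buffer, []
        if batch:
            self.insert_many(batch)  # one round-trip instead of len(batch)

    def run_forever(self):
        while True:
            time.sleep(self.interval)
            self.flush()
```

The key point is that the cost per flush is one database round-trip regardless of how many messages accumulated during the interval.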
> Batch inserts are way more efficient.
Could be. I'll try to tweak the code to see if we can batch them up and then I'll run the measurements again.
I don't think batching is a complete solution to this issue. It will alleviate the problem a bit, but the system will still put a high load on the database for a non-critical feature. I think the problem is that debug logging is very noisy, so we should ignore debug logs (or assume they will only be activated sporadically) or drop them on high load (which we need to detect somehow).
I agree with @cristianberneanu that batching will not be the fix. It will improve performance but only linearly.
> or drop them on high load (which we need to detect somehow).
I suggest sampling the first X debug messages for each time slot.
We can store a tuple `{min_time, count}` in GenServer state. Once `count` reaches a threshold, we increment `min_time` by a second or similar and reset `count` to 0. We reject any messages < `min_time`.
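A minimal Python sketch of that `{min_time, count}` sampling scheme (the threshold and slot size here are assumed values, not from the discussion):

```python
THRESHOLD = 100  # max debug messages accepted per slot (assumed value)
SLOT = 1.0       # seconds to advance min_time once a slot fills (assumed)

class DebugSampler:
    """Accepts at most THRESHOLD debug messages per time slot.

    Mirrors the {min_time, count} state described above: once count
    reaches the threshold, min_time advances and count resets;
    messages timestamped before min_time are rejected.
    """

    def __init__(self, now=0.0):
        self.min_time = now
        self.count = 0

    def accept(self, timestamp):
        if timestamp < self.min_time:
            return False  # message falls inside a closed slot
        self.count += 1
        if self.count >= THRESHOLD:
            self.min_time = timestamp + SLOT  # close the current slot
            self.count = 0
        return True
```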
This will let some debug messages through and drop others, making the log unreliable. In my opinion, it would be better to drop all debug messages for some period during high load and warn about it instead.
> I agree with @cristianberneanu that batching will not be the fix. It will improve performance but only linearly.
It is not a panacea, but it might be more than enough? It could be the difference between 100 log messages inserted per second saturating the DB, and 3000 log messages inserted without stressing the DB...
Building overload protection etc seems rather complex.
Ok, I see 2 orthogonal tasks here:

1. Batching the inserts.
2. Overload protection (dropping debug messages under high load).
I'm trying to implement the first one, unless Cristian knows any shortcuts.
> I'm trying to implement the first one, unless Cristian knows any shortcuts.
You mean a shortcut to do batch updates? Nothing else comes to mind besides gathering logs in a queue and periodically flushing that into the database.
You could do both at the same time if you only flush a limited amount of entries per interval: if the queue grows too big, go into the "high load mode" for x intervals and ignore debug messages.
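The combined approach suggested here can be sketched as follows. Again this is an illustrative Python sketch of the idea, not the Elixir implementation; all constants and the `insert_many` callback are assumptions:

```python
from collections import deque

MAX_FLUSH = 500         # entries flushed per interval (assumed)
HIGH_WATER = 5000       # queue size that triggers high-load mode (assumed)
PENALTY_INTERVALS = 10  # intervals to keep ignoring debug messages (assumed)

class LogQueue:
    """Flushes a bounded batch per interval and sheds debug messages
    while a backlog persists."""

    def __init__(self, insert_many):
        self.insert_many = insert_many  # hypothetical batch-insert callback
        self.queue = deque()
        self.high_load_left = 0

    def log(self, level, message):
        if self.high_load_left > 0 and level == "debug":
            return False  # shedding debug messages under load
        self.queue.append((level, message))
        return True

    def tick(self):
        """Called once per flush interval (e.g. every 500 ms)."""
        batch = [self.queue.popleft()
                 for _ in range(min(MAX_FLUSH, len(self.queue)))]
        if batch:
            self.insert_many(batch)
        if len(self.queue) > HIGH_WATER:
            self.high_load_left = PENALTY_INTERVALS
            # Also discard debug entries already queued, as discussed below.
            self.queue = deque(e for e in self.queue if e[0] != "debug")
        elif self.high_load_left > 0:
            self.high_load_left -= 1
```

Because the flush is bounded, one slow interval cannot turn into an unbounded database burst; the backlog instead trips the high-load mode, which sheds only the noisy debug level.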
> You could do both at the same time if you only flush a limited amount of entries per interval: if the queue grows too big, go into the "high load mode" for x intervals and ignore debug messages.
Good idea! The `Air.Service.Logs` GenServer can handle both in its state.
> if the queue grows too big, go into the "high load mode"
Should we discard debug messages from this already-filled queue?
I think that makes most sense.
Running a load test for a few minutes results in hundreds of thousands of piled-up messages in the logs service. From my observations, it can only process a few hundred per second. For a long-running system this is unacceptable, because the backlog can grow into the millions.
What should we do?