Ran out of memory running tests.

dan-da commented 4 months ago

I was running out of mem on a machine with 16Gb ram running 200+ tests with lots of calls to trace!().

heaptrack reported over 6.5Gb leaked by a tracing-subscriber MockWriter that just keeps growing a Vec. This led me to the tracing-test docs, which state:

The macro provided in this crate registers a global default subscriber instead. This subscriber contains a writer which logs into a global static in-memory buffer.

I'm curious:

Why is it writing to an unbounded global buffer to begin with?
Is there some way to disable that behavior?
Are there plans to change this in the future?

dbrgn commented 4 months ago

I guess the buffer could be bounded as an option. Right now this is not configurable.

There are no plans from my side to change this, but if the implementation would be simple, I'd be open for pull requests.

dan-da commented 4 months ago

Why is it writing to an unbounded global buffer to begin with?

Answering my own question, I guess it is because cargo test (test executable) buffers test stdout by default. So it would make sense then that if tracing-test is converting log output to stdout, it would need to do the same/similar. Though I don't pretend to understand the details.

Taking this further though, it seems to me that the output of each test should only need to be buffered until the test ends and we learn if it was successful or not. If successful, then the output can be discarded.

Instead, the present behavior is to buffer everything until all tests complete, whether successful or not.

So @dbrgn, do you think it makes sense and is feasible to buffer each test separately and discard buffers for any test that completes successfully?

I guess the buffer could be bounded as an option

yeah, if my above analysis is correct, then bounding the buffer should only be a last resort. Discarding buffer(s) after each successful test would eliminate the out-of-mem problem, at least for my test-suite.

dbrgn commented 4 months ago

@dan-da I agree that your suggestion makes sense. However, the implementation is tricky.

I see two (or three) approaches:

Having a separate log buffer per test

This would be the nicest solution, because we could simply Unfortunately we must set up a global subscriber (to capture logs from different threads) and tests don't run in separate processes (and potentially in parallel), so we cannot separate subscribers per test.

Having a shared buffer but removing all test-logs at the end of the test

Here the difficulty is that the buffer is currently all-bytes, not lines (because we simply use the SubscriberBuilder::with_writer API). Log lines would need to be removed by line prefix though.

In cases where a lot of logs are present (which is the case when you have high memory pressure) this might be inefficient. On the other hand, current log lookups also convert the buffer to lines and scan through all lines, so the difference might be small in practice. So it might be worth an attempt.

My suggestion:

Create a function similar to logs_with_scope_contain called flush_logs_with_scope
Call this function at the end of every test in the traced_test macro

Custom subscriber

Maybe we could write a custom subscriber (in internal::get_subscriber()) that logs to different buffers depending on the test span. I don't yet know whether that would work, and how easy it would be. It would be the most elegant solution from an API view, but the code would need to be sufficiently maintainable.

For me, this doesn't currently have high priority, but a PR would be welcome.

dan-da commented 4 months ago

thx for the writeup.

I was able to workaround the mem-leak in our codebase by disabling some super-verbose trace logging that isn't really needed. So that solves our immediate issue, and at least now we have an understanding of what's happening.

I don't really have time to make a PR myself at present, but hopefully someone will eventually if we leave this open.

dbrgn / tracing-test