Open alindima opened 2 years ago
Related to https://github.com/firecracker-microvm/firecracker/issues/1759. In order to remove the lazy_static
usage we'd need to use per-component metrics. One option for the design is the one linked in the issue.
+1 on each instance of each component having its own metrics (so one could see how many packets a particular net device 2 has sent, for example).
Unfortunately, on the signal handler topic, this is not a fix, and will actually make things even worse (more constrained).
Currently, there is an issue of a potential deadlock on the global metrics mutex: it can be re-acquired in a signal handler even if it is already held by the same thread in normal (non-signal-handler) context. Moving the metrics from a single global object to multiple individual components ultimately means more individual mutexes, which can also deadlock if there is a path to them from a signal handler.
E.g. doing something like `vmm.flush_all_metrics()` involves iteratively locking each device to flush its metrics. Calling that in a signal handler means a high chance of the signal coming in while some emulation code is running under a device's held lock, resulting in a guaranteed deadlock.
Safest thing to do is just not flush/print metrics in signal handlers :stuck_out_tongue:
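A minimal sketch of why this bites (assuming a global `std::sync::Mutex`-style lock; `METRICS_LOCK` and the helper are hypothetical names, not Firecracker's actual types): a standard mutex is not reentrant, so a handler re-locking on the same thread would block forever. `try_lock` makes the hazard visible:

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the global metrics lock.
static METRICS_LOCK: Mutex<u64> = Mutex::new(0);

// What a signal handler would observe if it tried to take the lock.
fn handler_could_take_lock() -> bool {
    METRICS_LOCK.try_lock().is_ok()
}

fn main() {
    // Normal context: emulation code holds the metrics lock.
    let guard = METRICS_LOCK.lock().unwrap();
    // If a signal arrived now, a handler calling `lock()` on this same
    // thread would block forever: std::sync::Mutex is not reentrant.
    assert!(!handler_could_take_lock());
    drop(guard);
    // Once the guard is dropped, the handler could proceed again.
    assert!(handler_could_take_lock());
}
```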
Metrics themselves don't need locks since they're atomics, so a solution would also be to decouple the writer from the metric implementation.
Then you can use different writers for different contexts with no locking required between them. Or use a single global writer wrapped in a reentrant mutex. Or some other custom approach; the point being to move the challenge outside of the metrics implementation.
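A sketch of that decoupling (all names illustrative, not Firecracker's real API): the counters are plain atomics, and the writer only reads them, so no lock is shared between the hot path and serialisation:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative metric struct; not Firecracker's actual types.
#[derive(Default)]
struct NetMetrics {
    tx_bytes: AtomicU64,
    rx_bytes: AtomicU64,
}

impl NetMetrics {
    // Hot path: lock-free increment.
    fn add_tx(&self, n: u64) {
        self.tx_bytes.fetch_add(n, Ordering::Relaxed);
    }
}

// The writer is a separate concern: it only *reads* the atomics, so any
// number of writers (normal context, signal handler, tests) can exist
// without sharing a lock with the metrics themselves.
fn serialize(m: &NetMetrics) -> String {
    format!(
        "{{\"tx_bytes\":{},\"rx_bytes\":{}}}",
        m.tx_bytes.load(Ordering::Relaxed),
        m.rx_bytes.load(Ordering::Relaxed)
    )
}

fn main() {
    let m = NetMetrics::default();
    m.add_tx(1500);
    m.add_tx(42);
    println!("{}", serialize(&m)); // the writer never takes a lock
}
```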
You definitely need to flush metrics, or at least one metric, in a signal handler. It is important not to lose critical events such as seccomp failures. But one can have multiple write/dump functions: one to be called in normal context where everything is fine, and one for signal handlers where the code is in an "emergency" situation.
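One way those two paths could look (hypothetical names; a real signal handler would additionally have to restrict itself to async-signal-safe calls such as `write(2)` on a pre-opened fd, which this std-only sketch glosses over):

```rust
use std::io::Write;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

// Illustrative statics, not Firecracker's real metrics.
static SECCOMP_FAULTS: AtomicU64 = AtomicU64::new(0);
static METRICS_SINK: Mutex<Vec<u8>> = Mutex::new(Vec::new());

// Normal-context flush: takes the sink's lock and writes everything.
fn flush_all() {
    let mut sink = METRICS_SINK.lock().unwrap();
    writeln!(
        sink,
        "seccomp.num_faults={}",
        SECCOMP_FAULTS.load(Ordering::Relaxed)
    )
    .unwrap();
}

// "Emergency" flush: reads only atomics and never touches the shared
// lock, so it cannot deadlock when called from a signal handler.
fn flush_emergency() -> String {
    format!("seccomp.num_faults={}", SECCOMP_FAULTS.load(Ordering::Relaxed))
}

fn main() {
    SECCOMP_FAULTS.fetch_add(1, Ordering::Relaxed);
    flush_all(); // fine in normal context
    // In a handler we would only call the lock-free path:
    println!("{}", flush_emergency());
}
```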
Also, I am a bit worried about the performance impact of atomic metrics. As discussed in the issue mentioned above, atomics for metrics on the hot paths might add a degradation. Depending on architecture specifics, we may need to think about alternatives like per-thread metrics.
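A sketch of the per-thread idea (illustrative names only): the hot path bumps a plain thread-local `Cell`, and each thread folds its count into a shared atomic only at flush points, so there is no atomic RMW or cache-line contention per event:

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative names only.
static GLOBAL_TX_PACKETS: AtomicU64 = AtomicU64::new(0);

thread_local! {
    // Hot-path counter: a plain non-atomic Cell, private to the thread.
    static LOCAL_TX_PACKETS: Cell<u64> = Cell::new(0);
}

fn record_packet() {
    LOCAL_TX_PACKETS.with(|c| c.set(c.get() + 1)); // no atomic RMW here
}

// Called at flush points (or thread exit), not on the hot path.
fn publish_local() {
    LOCAL_TX_PACKETS.with(|c| {
        GLOBAL_TX_PACKETS.fetch_add(c.replace(0), Ordering::Relaxed);
    });
}

fn main() {
    let workers: Vec<_> = (0..4)
        .map(|_| {
            std::thread::spawn(|| {
                for _ in 0..1000 {
                    record_packet();
                }
                publish_local();
            })
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }
    assert_eq!(GLOBAL_TX_PACKETS.load(Ordering::Relaxed), 4000);
}
```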
> Metrics themselves don't need locks since they're atomics, so a solution would also be to decouple the writer from the metric implementation.
This is similar to what I was proposing here (actually an idea from @alsrdn):
Problems 1 and 3 can be fully solved by removing metrics usage from the signal handler. One option here is to have a special file used for logging the exit reason and the latest metrics values, similar to a coredump.
The metrics lock is only for writing to the file. If we have another file that is used from the signal handler, then this problem is solved.
It is then a challenge to make the metrics system available to the signal handler without using a global variable (like with lazy_static). If we're keen on doing any metrics flushing from the signal handler, I think we're stuck with using some globally accessible object, which may be fine if we solve the thread-safety issues.
In a nutshell, there are two big problems to be solved here:
And indeed the per-component metrics add trouble, unless we have a sophisticated design where each component propagates the metric update to its parent whenever it becomes available (instead of triggering it on a specific flush event).
But I don't really see how we can use per-component metrics in the signal handler anyway.
> Also, I am a bit worried about the performance impact of atomic metrics. As discussed in the issue mentioned above, atomics for metrics on the hot paths might add a degradation. Depending on architecture specifics, we may need to think about alternatives like per-thread metrics.
This is indeed a nice approach 👍🏻 It would also remove the potential race condition from the metrics serialisation, since at no point could two threads operate on the same metric value.
We removed the task from the roadmap because we are currently not planning to work on it due to higher priority tasks. However, we split #4709 from this task, which will help to define the first stepping stone toward the broader refactor proposed in this issue.
The metrics system is not fully thread-safe at the moment, due to some issues:

1. `IncMetrics` inner state is mutated on serialisation. This causes race conditions when the `write()` function is called from multiple threads. See: https://github.com/firecracker-microvm/firecracker/pull/2893
2. While `SharedIncMetrics` use atomics, they always use `Relaxed` ordering. While on x86 memory access has Acquire-Release semantics, on Arm this is not the case. Hence, the process of writing metrics to file may use outdated values.
3. `METRICS.write()` is also called from the signal handler, which risks the deadlock discussed above.

Problems 1 and 3 can be fully solved by removing metrics usage from the signal handler. One option here is to have a special file used for logging the exit reason and the latest metrics values, similar to a coredump. We should also enforce that `METRICS.write()` is called from a single thread (therefore removing the `lazy_static` declaration).

Problem 2 could be solved in two ways: by using tighter ordering constraints (this needs some further deep dive and may incur overhead due to CPU reordering constraints and the prevention of certain compiler optimisations), or by redesigning the metrics system to use per-thread values (this would also solve problem 1).
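For the tighter-ordering option, here is a sketch of what pairing `Release` increments with an `Acquire` read on the flusher side buys (counter names are made up for illustration): if the flusher observes N packets, it is guaranteed to also observe at least the bytes recorded for those packets, which pure `Relaxed` does not guarantee on Arm:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Made-up counters for illustration.
static TX_BYTES: AtomicU64 = AtomicU64::new(0);
static TX_PACKETS: AtomicU64 = AtomicU64::new(0);

// Device thread: the final Release increment "publishes" every write
// that happened before it, including the byte-count update.
fn on_packet(bytes: u64) {
    TX_BYTES.fetch_add(bytes, Ordering::Relaxed);
    TX_PACKETS.fetch_add(1, Ordering::Release);
}

// Flusher: the Acquire load pairs with the Release above, so observing
// N packets guarantees seeing at least the bytes for those N packets.
// With Relaxed on both sides, Arm could show a fresher packet count
// than byte count.
fn snapshot() -> (u64, u64) {
    let packets = TX_PACKETS.load(Ordering::Acquire);
    let bytes = TX_BYTES.load(Ordering::Relaxed);
    (packets, bytes)
}

fn main() {
    on_packet(1500);
    on_packet(64);
    let (packets, bytes) = snapshot();
    println!("tx_packets={packets} tx_bytes={bytes}");
}
```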
Another thing to keep in mind is the potential need for device-specific instances of a metric. For example, it may make sense for `METRICS.net.tx_bytes_count` to be reported per device instance instead of being aggregated.
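A sketch of what per-device instances could look like (names illustrative, not Firecracker's actual structure): each device owns its own metrics object, and the aggregated view is computed on demand, so the current aggregate number stays available:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Illustrative per-device metrics; not Firecracker's real types.
#[derive(Default)]
struct NetDeviceMetrics {
    tx_bytes_count: AtomicU64,
}

struct NetDevice {
    id: String,
    metrics: Arc<NetDeviceMetrics>,
}

impl NetDevice {
    fn new(id: &str) -> Self {
        NetDevice {
            id: id.to_string(),
            metrics: Arc::new(NetDeviceMetrics::default()),
        }
    }

    fn send(&self, bytes: u64) {
        self.metrics.tx_bytes_count.fetch_add(bytes, Ordering::Relaxed);
    }
}

// Aggregated view, preserving what METRICS.net.tx_bytes_count reports
// today, computed from the per-device instances.
fn total_tx_bytes(devices: &[NetDevice]) -> u64 {
    devices
        .iter()
        .map(|d| d.metrics.tx_bytes_count.load(Ordering::Relaxed))
        .sum()
}

fn main() {
    let devices = vec![NetDevice::new("net0"), NetDevice::new("net1")];
    devices[0].send(1500);
    devices[1].send(64);
    for d in &devices {
        println!(
            "{}.tx_bytes_count={}",
            d.id,
            d.metrics.tx_bytes_count.load(Ordering::Relaxed)
        );
    }
    assert_eq!(total_tx_bytes(&devices), 1564);
}
```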