apache / nuttx

Apache NuttX is a mature, real-time embedded operating system (RTOS)
https://nuttx.apache.org/
Apache License 2.0
2.75k stars 1.14k forks source link

Syslog thread-safety #3599

Open fjpanag opened 3 years ago

fjpanag commented 3 years ago

This issue is to discuss some problems observed in thread-safety of syslog, and what would be the best way to approach these issues.

The problem
Currently syslog is not thread-safe in all cases. There are cases were multiple threads logging simultaneously may lead to garbled logger output.

The correct / expected behavior is to have correctly printed log messages, in the correct order, in all configurations and operating conditions of the system.

The problem is expressed at least in the following ways:

Its root cause is that nx_vsyslog() does not execute atomically. It makes multiple unprotected calls to lib_sprintf().
So, if two threads are logging simultaneously, their respective calls to lib_sprintf() will interfere with each other.

There are already some discussions on these matters in #3050, #3157.
There is also some discussion here.

The possible solutions
I can think of two possible solutions to this issue:

Each solution has its prons and cons, and this is why I created this issue; to discuss the best solution so I can proceed implementing it.

Semaphore
This seems to me the simplest and most effective solution.
It will solve the above issues, while also makes it very easy to add and remove syslog channels dynamically.

The argument against it (as discussed here), is that locking the system is a bad idea. It harms the real-time behavior of the system.
Thus, this is how buffering was born. To ensure that no lock was held, and logging was taking place quickly with minimum impact on the system.

However, I disagree with this for a couple of reasons:
Firstly, syslog is a common resource. It is a typical practice, and expected by anyone that a common resource will be somehow protected, and will be inaccessible concurrently.
Relying just on the timing of various system parts to ensure thread-safety is bad practice IMO.

Secondly, if a system has hard real-time requirements and nevertheless uses a resource that is shared, or possibly inaccessible, is broken by design.
When such a system is designed, it must be ensured that it will be predictable and that it does not depend on anything outside its control. Especially when talking about syslog which by definition cannot be predictable (e.g. the log channel may or may not be accessible, it may be in use by other threads, it may fail, its execution duration is not possible to be known etc etc).

Personally when I am in need of logging in threads with very hard real-time requirements (or similar use case), I just have the real-time thread keep its state in some internal flags which are read by a lower priority thread which in turn does the logging without introducing any jitter. That way the real-time thread does not rely on anything external, and does not waste time in logging.

Thirdly, semaphore protection is already there. Even if nx_vsyslog() tries not to lock the system, in many cases the underlying channel drivers does lock it.
When two threads log simultaneously, they have to wait each other to access the driver while calling lib_sprintf(). So we don't actually gain anything, we just fool ourselves.

Buffering This is the currently implemented solution regarding thread-safety.
Each thread has its own log buffer where it stores the log message. This buffer is flushed at a later time atomically.

This solution has the potential to solve all the above thread-safety issues.
It has the advantage that if implemented correctly, it allows all threads to log concurrently, quickly and without any kind of locking.
It also:

However I also see some disadvantages:


So, my personal preference is to go for the "semaphore" solution.
Maybe even completely remove the buffering option if it does not serve any purpose after this fix.

What do you think?
Any advantage/disadvantage for every case that I don't see? A third alternative maybe?

xiaoxiang781216 commented 3 years ago

Its root cause is that nx_vsyslog() does not execute atomically. It makes multiple unprotected calls to lib_sprintf(). So, if two threads are logging simultaneously, their respective calls to lib_sprintf() will interfere with each other.

It can be fixed by va_format extension from linux kernel:

struct va_format

    %pV

For printing struct va_format structures. These contain a format string
and va_list as follows:
    struct va_format {
        const char *fmt;
        va_list *va;
    };
Implements a "recursive vsnprintf".

And here is an implementation I made recently: https://github.com/j256/dmalloc/pull/89/commits/6f4d29da098e985ab284b837eb5e45009f1b45e8#diff-e7a46ef823b649b77cffce4a4a8d4cbe508b35c6ce4fc54ebc39a1d1895ad85cR378-R384 https://github.com/j256/dmalloc/pull/89/commits/6f4d29da098e985ab284b837eb5e45009f1b45e8#diff-f2e6c18de8171e389aeff0b8d25b2755253108a59a13baf7167c2ffa0af36a58R529-R537

Semaphore

semaphore isn't enough here, we need the critical section to sync with the interrupt handle. Since the critical section will block all hardware interrupt process, and the serial is very slow, the lock will make a significant impact on the runtime behaviour.

fjpanag commented 3 years ago

semaphore isn't enough here, we need the critical section to sync with the interrupt handle. Since the critical section will block all hardware interrupt process, and the serial is very slow, the lock will make a significant impact on the runtime behaviour.

Indeed a critical section would be a problem.
But I guess the semaphore isn't needed to be held by the interrupts. The interrupts can continue logging in the int buffer.

The interrupts buffer may remain as is, since there is no nesting for interrupts. Its only the threads that need to be protected.

davids5 commented 3 years ago

@fjpanag just to be clear interrupts can not wait on a semaphore see

fjpanag commented 3 years ago

@fjpanag just to be clear interrupts can not wait on a semaphore see

Yes, I am aware of that.

So the "semaphore" approach would be something like:

if (irq_context)
{
    log_to_int_buffer();
}
else
{
    take_semaphore();
    flush_int_buffer();
    log_to_actual_device();
    release_semaphore();
}

In fact, this is very similar to what is already happening.

protobits commented 3 years ago

@fjpanag just to be clear interrupts can not wait on a semaphore see

That is the reason I think the current design is based around printing each character as it can be done without locking when using a circular buffer. But this is really inefficient for things like UART.

I think the way to go would be to always use an interrupt buffer and use a work queue for logging to "slow channels". On the other hand, since logging to RAMLOG is instantaneous and already provides buffer, a semaphore could be used directly. The issue about breaking on buffer boundaries maybe can be done by flushing in separate lines. If the output is indeed a very large line, it most likely should be split in various lines

protobits commented 3 years ago

@fjpanag just to be clear interrupts can not wait on a semaphore see

Yes, I am aware of that.

So the "semaphore" approach would be something like:

if (irq_context)
{
    log_to_int_buffer();
}
else
{
    take_semaphore();
    flush_int_buffer();
    log_to_actual_device();
    release_semaphore();
}

In fact, this is very similar to what is already happening.

From my suggestion, flush_int_buffer() and log_to_actual_device() should schedule the printing if using slow channel. For RAMLOG it can be done immediately.

fjpanag commented 3 years ago

I think the way to go would be to always use an interrupt buffer and use a work queue for logging to "slow channels".

This is indeed one approach, as mentioned in the original post.

But I am opposed to this, for 3 reasons:

  1. I am of the strong opinion that the logger should be synchronous. It is a great debugging tool, so you must be able to see what is happening, the moment it is happening. Neither at a later time, nor at a different order. Exactly as the code generated it.

  2. Having the logger use a worker thread is not a good idea IMO. It makes the system unpredictable, and affects the real-time behavior. It makes it impossible to tell when the actual access to the device will take place, or when this "heavier" processing will utilize the CPU. A channel driver may lock other resources at an undetermined moment in the future, that affect higher priority threads that need the same resource. If syslog() writes directly to the device, then you know that "now the system has a slow job to do", and design it accordingly.

  3. The syslog will always be used in almost all systems. Lightweight applications though, may not need a worker thread. If the logger requires it, then you force everyone to use a new thread that otherwise may not. And this needs resources that a small system may not be able to spare. Effectively it increases the minimum system footprint for NuttX.

protobits commented 3 years ago

We do not have to choose either approach, it can be left to an option. As we are discussing, both blocking on syslog() and non-blocking (via work queue) have different use cases. I can't find anything the POSIX standard on whether syslog should blocking or non-blocking so it makes sense to have that option.

Something to consider, is to have this option per-channel (on the multi-channel approach). For example, we can make the output to console be immediate (block until printed) but the output to file be non-blocking (schedule the write to the file).

protobits commented 3 years ago

Also, regarding 2., note that it can be done in low-priority worker, so high-priority work is not affected.

protobits commented 3 years ago

One final note, we could also make this more fine-grained by using openlog() (https://pubs.opengroup.org/onlinepubs/000095399/functions/openlog.html) options. syslog() does not require openlog() to be called, but if it is, it could allow for a process to specify how it expects syslog() to work. In particular, from my interpretation, LOG_NOWAIT seems to indicate that one expects syslog() not to block. So we can make use of this to decide whether messages printed via syslog() from a specific task should be sent to a work queue or waited on.

gustavonihei commented 3 years ago

We do not have to choose either approach, it can be left to an option.

I agree on this. We'll probably not find a perfect solution for this problem, as @fjpanag already stated, each one has its pros and cons.

I am of the strong opinion that the logger should be synchronous. It is a great debugging tool, so you must be able to see what is happening, the moment it is happening. Neither at a later time, nor at a different order. Exactly as the code generated it.

Regarding logging, I'd say there are at least two stages: the generation/capture and the publication/notification. I agree that the first stage must be synchronous and must contain a precise context, otherwise it may lose effectiveness. But the publication I am not sure if needs to be handled the same way.

fjpanag commented 3 years ago

We do not have to choose either approach, it can be left to an option.

This is true. We can have both. But since both need some work, and both try to achieve the same goal, my instict is to keep a single well-behaving mechanism, instead of two.

Also, regarding 2., note that it can be done in low-priority worker, so high-priority work is not affected.

This is not true. The priority of the worker is not related. If a worker holds a resource, then it holds it for all other threads, regardless of their priority.
In fact a low priority worker is worse. It can lead to priority inversions, deadlocks etc.

Also, AFAIK using a worker means that we cannot have a network logger. Since both are using the workers, I was told in the mailing list that this may cause issues.

Something to consider, is to have this option per-channel (on the multi-channel approach). For example, we can make the output to console be immediate (block until printed) but the output to file be non-blocking (schedule the write to the file).

Should any logger options be per-channel, or per-task? Do standards dictate anything on this?

One final note, we could also make this more fine-grained by using openlog() (https://pubs.opengroup.org/onlinepubs/000095399/functions/openlog.html) options. syslog() does not require openlog() to be called, but if it is, it could allow for a process to specify how it expects syslog() to work. In particular, from my interpretation, LOG_NOWAIT seems to indicate that one expects syslog() not to block. So we can make use of this to decide whether messages printed via syslog() from a specific task should be sent to a work queue or waited on.

This is a good idea, but do we actually need this level of complexity?
I think this will cause the messages to always be buffered (or at least the buffer to be allocated), regardless whether it is actually to be used, which is a bit wasteful.
I would generally prefer to avoid such a "heavyweight" implementation for the system logger.

Regarding logging, I'd say there are at least two stages: the generation/capture and the publication/notification. I agree that the first stage must be synchronous and must contain a precise context, otherwise it may lose effectiveness. But the publication I am not sure if needs to be handled the same way.

Can you please elaborate on this? An example maybe? I like the general idea, but I would like to see how you have it in mind.

fjpanag commented 3 years ago

Some more thoughts...

Regarding buffering, and especially using a worker. I believe that this is never going to work correclty, for practical reasons.

The volume of the log messages between two consecutive flushes to the physical medium is unknown.
There may be just a single small message, or a huge number of logs to be flushed.

This means that we have two options:

Regarding non-blocking logging.
Actually, what is the advantage of this? I believe none. On the systems that NuttX is running, the CPU time needed for flushing the log messages will always be the same. Regardless of whether you do it "now" or "later", in all cases it will leave the exact same CPU available for your thread. So the "speed-up" of your thread is only apparent, for a small chunk of code. The whole thread will always have the same performance.

The only case that this can be an advantage is if you intend to have the flushing take place in lower priority than the thread, which may indeed provide better performance. But then you risk to generate more messages than it is possible to consume, which will again make the system unstable.

Note that currently NuttX does not support non-blocking logging. It is blocking-only. The buffering is there to only aid in thread-safety.

protobits commented 3 years ago

Regarding non-blocking logging. Actually, what is the advantage of this? I believe none. On the systems that NuttX is running, the CPU time needed for flushing the log messages will always be the same. Regardless of whether you do it "now" or "later", in all cases it will leave the exact same CPU available for your thread. So the "speed-up" of your thread is only apparent, for a small chunk of code. The whole thread will always have the same performance.

I don't think that is true. There's a difference in logging right at the syslog() call and doing it when the log flush operation is scheduled (which means it will be done once other high-priority operations are done). It isn't about it taking less time, it is about being able to log without introducing delays in the calling code by doing so. That is the point of non-blocking.

In other words, if I send logging to SD card from a task that has tight real-time constraints, it would possibly ruin them. At the same time, as I mentioned, each medium may have different "urgencies": I may prefer to see the messages printed immediately to the console/ramlog but I do not need that for SD card logging since it is more for post-mortem analysis.

Maybe as you say a work queue may not be the right approach, but let's consider the requirements for the different use cases before discussing a specific implementation.

fjpanag commented 3 years ago

@v01d with all my respect, I think that you misunderstand the term "real-time".

Real-time means deterministic, not fast.
In fact the opposite is true, many real-time systems are slower due to the guarantees that they have to offer.

In our case, real-time is the blocking interface. The way that the log is to be flushed is always predictable and the same, with no surprises. Of course, to guarantee this, it may have to be slower.

The non-blocking case, could harm real-time behavior. Because the actual flushing will take place at a later point in time that is not known or predictable (at least not within the scope of the thread it-self). When the buffer decides to flush, it may have to lock other resources needed by the high priority thread. This means that the high-priority thread has to wait, possibly missing its deadlines (as the programmer is unable to predict when this is going to happen, to account for this).

This is one of the crucial differences between NuttX and Linux.
If it wasn't this, I would just use Linux, Cortex-As are very cheap nowadays.
Both have some work to do:

gustavonihei commented 3 years ago

Regarding logging, I'd say there are at least two stages: the generation/capture and the publication/notification. I agree that the first stage must be synchronous and must contain a precise context, otherwise it may lose effectiveness. But the publication I am not sure if needs to be handled the same way.

Can you please elaborate on this? An example maybe? I like the general idea, but I would like to see how you have it in mind.

Sure. I'll assume we have a thread dedicated for the publishing of the syslog message with a given priority level. The publication should be asynchronous because this action may be as simple as pushing the message into a RAMLOG buffer or even sending it to a Logstash server via a secure TCP socket (assuming that one day the syslog message format becomes compliant with RFC5424).

For the generation of syslog messages, we would just need to associate them with a timestamp (or another reliable value that may represent the passage of time, e.g. clock ticks) and they'd be pushed to a shared buffer.

Either discard any messages that do not fit in the buffer. Which is bad, as information is lost.

Even so, I believe is that may be the safest solution. That shared buffer could be managed as a circular buffer and it's size should be configured according to the intended use case. One approach to mitigate the loss of important information is to separate syslog messages into different buffers by their priority levels.

fjpanag commented 3 years ago

I got it better now, thanks.

Even so, I believe is that may be the safest solution. That shared buffer could be managed as a circular buffer and it's size should be configured according to the intended use case. One approach to mitigate the loss of important information is to separate syslog messages into different buffers by their priority levels.

However, wouldn't that mean several kB allocated for these buffers? Is this acceptable for the constrained systems that NuttX is running on?

Based on your initial description, I imagined something like a single buffer, passed over to the channels, to "consume" it in whatever way they like (i.e. even non-blocking, discard it due to low severity etc). Not a full-blown queue...

gustavonihei commented 3 years ago

Even so, I believe is that may be the safest solution. That shared buffer could be managed as a circular buffer and it's size should be configured according to the intended use case. One approach to mitigate the loss of important information is to separate syslog messages into different buffers by their priority levels.

However, wouldn't that mean several kB allocated for these buffers? Is this acceptable for the constrained systems that NuttX is running on?

This could be made configurable. I believe that the decision of whether this is acceptable or not rests with the developer of the end application, depending on the available RAM of the target platform.

Based on your initial description, I imagined something like a single buffer, passed over to the channels, to "consume" it in whatever way they like (i.e. even non-blocking, discard it due to low severity etc). Not a full-blown queue...

I suggested a queue for what I believe to be the most straightforward way to manage a buffer that will be shared among several threads generating syslog messages. But I confess I didn't put much thought into this.