apache / nuttx

Apache NuttX is a mature, real-time embedded operating system (RTOS)
https://nuttx.apache.org/
Apache License 2.0
2.75k stars 1.14k forks source link

esp32: syslog messages from boot interfere with other syslog messages #3157

Open protobits opened 3 years ago

protobits commented 3 years ago

If I enable RAMLOG, right after boot I see this:

nsh> dmesg
[    0.000000] Idle Task: esp32_rng_initialize: Initializing RNG
[    0.030000] init: I (17) wifi:[    0.030000] init: wifi driver task: 4, prio:253, stack:6656, core=0[    0.030000] init: 
[    0.080000] wifi: I (69) wifi:[    0.080000] wifi: wifi firmware version: b9e8721[    0.080000] wifi: 
[    0.080000] wifi: I (69) wifi:[    0.080000] wifi: wifi certification version: v7.0[    0.080000] wifi: 
[    0.080000] wifi: I (70) wifi:[    0.080000] wifi: config NVS flash: enabled[    0.080000] wifi: 
[    0.080000] wifi: I (70) wifi:[    0.080000] wifi: config nano formating: disabled[    0.080000] wifi: 
[    0.080000] wifi: I (70) wifi:[    0.080000] wifi: Init data frame dynamic rx buffer num: 32[    0.080000] wifi: 
[    0.080000] wifi: I (71) wifi:[    0.080000] wifi: Init management frame dynamic rx buffer num: 32[    0.080000] wifi: 
[    0.080000] wifi: I (71) wifi:[    0.080000] wifi: Init management short buffer num: 32[    0.080000] wifi: 
[    0.080000] wifi: I (71) wifi:[    0.080000] wifi: Init nsh> 

Note I have enabled INTBUFFER.

Ouss4 commented 3 years ago

Is this with no CONFIG_DEBUG* enabled? There were some info messages being printed in the past but now they are all conditioned on DEBUG_WIRELESS_INFO. Is this printed on the serial console or only on a RAMLOG? Another case could be the second stage bootloader. We can build one with debug info completely removed (so you won't even see those messages at startup).

protobits commented 3 years ago

I actually see this after booting as well:

nsh> dmesg
[   15.640000] wapi: esp_wifi_sta_essid: ssid= len=33
[   15.640000] wapi: esp_wifi_sta_essid: WiFi station ssid=v01dalesita len=11
[   44.060000] wapi: wlan_ioctl: flags: 1
[   44.060000] wapi: esp_wifi_sta_essid: ssid=v01dalesita len=11
[   44.060000] wapi: esp_wifi_sta_essid: WiFi station ssid=v01dalesita len=11
[   44.180000] wifi: I (44175) wifi:[   44.180000] wifi: new:<11,0>, old:<1,0>, ap:<255,255>, sta:<11,0>, prof:1[   44.180000] wifi: 
[   44.190000] wifi: I (44177) wifi:[   44.190000] wifi: state: init -> auth (b0)[   44.190000] wifi: 
[   44.190000] wifi: I (44184) wifi:[   44.190000] wifi: state: auth -> assoc (0)[   44.190000] wifi: 
[   44.190000] wifi: I (44186) wifi:[   44.190000] wifi: state: assoc -> init (4a0)[   44.190000] wifi: 
[   44.190000] wifi: I (44187) wifi:[   44.190000] wifi: new:<11,0>, old:<11,0>, ap:<255,255>, sta:<11,0>, prof:1[   44.200000] wifi: 
[   44.200000] wifi: esp_event_post: Event: base=WIFI_EVENT id=5 data=0x3ffe5770 data_size=40 ticks=4294967295
[   44.200000] nsh> dmesg

Yes, I have enabled DEBUG_WIRELESS_INFO. This is on the RAMLOG (you can see I'm invoking dmesg). My guess is that there's unprotected concurrent access to syslog. Could syslog be called from "outside OS" within closed drivers? I couldn't find where this "I" or "W" is in the code.

yamt commented 3 years ago

On Wed, Mar 24, 2021 at 7:14 AM Matias N. @.***> wrote:

I actually see this after booting as well:

nsh> dmesg [ 15.640000] wapi: esp_wifi_sta_essid: ssid= len=33 [ 15.640000] wapi: esp_wifi_sta_essid: WiFi station ssid=v01dalesita len=11 [ 44.060000] wapi: wlan_ioctl: flags: 1 [ 44.060000] wapi: esp_wifi_sta_essid: ssid=v01dalesita len=11 [ 44.060000] wapi: esp_wifi_sta_essid: WiFi station ssid=v01dalesita len=11 [ 44.180000] wifi: I (44175) wifi:[ 44.180000] wifi: new:<11,0>, old:<1,0>, ap:<255,255>, sta:<11,0>, prof:1[ 44.180000] wifi: [ 44.190000] wifi: I (44177) wifi:[ 44.190000] wifi: state: init -> auth (b0)[ 44.190000] wifi: [ 44.190000] wifi: I (44184) wifi:[ 44.190000] wifi: state: auth -> assoc (0)[ 44.190000] wifi: [ 44.190000] wifi: I (44186) wifi:[ 44.190000] wifi: state: assoc -> init (4a0)[ 44.190000] wifi: [ 44.190000] wifi: I (44187) wifi:[ 44.190000] wifi: new:<11,0>, old:<11,0>, ap:<255,255>, sta:<11,0>, prof:1[ 44.200000] wifi: [ 44.200000] wifi: esp_event_post: Event: base=WIFI_EVENT id=5 data=0x3ffe5770 data_size=40 ticks=4294967295 [ 44.200000] nsh> dmesg

Yes, I have enabled DEBUG_WIRELESS_INFO. This is on the RAMLOG (you can see I'm invoking dmesg). My guess is that there's unprotected concurrent access to syslog. Could syslog be called from "outside OS" within closed drivers? I couldn't find where this "I" or "W" is in the code.

these messages are from the binary driver. (arch/xtensa/src/esp32/esp-wireless-drivers-3rdparty) some of them are enabled with CONFIG_DEBUG_WIRELESS_INFO. see esp32_wifi_adapter.c.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

fjpanag commented 3 years ago

@v01d Can you please check if this is related to the discussion here?

protobits commented 3 years ago

@fjpanag it doesn't seem so, I didn't pull those changes yet. But something similar to your example case may be happening as there's clearly interleaved output on syslog.

@yamt I see. Are those calls made from a safe context? Do you think this is an issue for ESP32 or is there something to be considered on syslog side?

fjpanag commented 3 years ago

@v01d The issue is not related to the PR. It was there before.
It just happened that I discovered it during working on this PR.

protobits commented 3 years ago

@v01d The issue is not related to the PR. It was there before. It just happened that I discovered it during working on this PR.

I understand. I also get the impression that the interrupt buffer protects interrupts from application calls to syslog but it seems applications are not protected from each other from interleaving. Do you think that is the case?

fjpanag commented 3 years ago

I think that the interrupt buffer behavior largely depends on your configuration.
It is flushed differently, depending whether syslog_putc() or syslog_write() is used, and it may not be safe in all cases.
I haven't examined it carefully though...

On the other hand, the applications are definitely not protected, and the order that the prints will happen is random.
Again, depending on the configuration the behavior may be slightly different, but the issue the same.

I think that two things may be needed here:

  1. Flush the interrupt buffer only once, and before any (thread) print. This will solve your issue, and generally the interrupts will be OK I think (of course as long as NuttX does not nest its interrupts).
  2. Protect the syslog() itself. So every log message print is executed atomically.
protobits commented 3 years ago

I found a relevant lengthy discussion here: https://nuttx.yahoogroups.narkive.com/1ZmaOGpz/patch-syslog-prevent-syslog-messages-from-parallel-processes-getting-interleaved-1-attachment There is a mention about protecting syslog calls killing realtime performance, and a intermediate buffer being a better solution. Since this is what is being done for interrupts, maybe it can be extended to tasks? I don't know what is really preferable in this case (time vs memory performance).

fjpanag commented 3 years ago

I had a look at this discussion. Thank you.

Some comments:

I agree that syslog should not affect the real-time performance of the system in any way.
That is:

On the other hand, I do not see the problem using a mutex-like object.
Syslog is a resource, that is shared between multiple tasks. Of course it has to be locked to ensure thread-safety.
An of course this means that one thread may lock another thread. But this is expected. This is what happens every time that a common resource is shared between tasks (e.g. I2C bus, used by multiple tasks). It is up to the programmer to ensure that such kind of locking will not harm their application, and that the induced jitter is acceptable.

Nevertheless, it is the expected behavior. Knowing nothing about a new OS, I would directly assume that a using the same resource in multiple places will cause jitter. I wouldn't even bother look up any documentation to confirm this.

Regarding buffering, I would say that it seems like a good idea.
As mentioned in the above thread, it also enhances performance which is great.
But:

fjpanag commented 3 years ago

The buffering approach, though, may have another advantage:

Now that we have multiple syslog channels, we may want to have different settings for each channel (e.g. debug level for the serial and warning level for the file).

I was thinking of how this could be implemented (although it is not to my current plans).

Maybe if we pass the buffer directly to each channel, along with any relevant information (like severity), and have the channel itself to decide whether to print the message and how, ensure that it will happen atomically etc etc...

protobits commented 3 years ago

I agree that some form of locking is required. What I understand from that discussion is that the buffer minimizes the time this lock is held (in contrast to holding the lock while output to serial is being done, which is quite slow).

That said, I don't really know how to help on this either, as I have not looked the implementation of syslog and all its complexities. Also, I think that some outputs may require buffering but others not, as RAMLOG is already an output that already handles buffering itself.

Now that we have multiple syslog channels, we may want to have different settings for each channel (e.g. debug level for the serial and warning level for the file).

That sounds interesting but I think we should focus on solving this before adding more features. In any case, I think that would be outside of the normal interface, since setlogmask for example would applies to all channels.

fjpanag commented 3 years ago

That sounds interesting but I think we should focus on solving this before adding more features. In any case, I think that would be outside of the normal interface, since setlogmask for example would applies to all channels.

I am just mentioning this to help us decide the best approach. Of course, the original issue must be solved first.

One question. As @xiaoxiang781216 commented, a semaphore cannot be used in interrupt contex, while the syslog can.
Thus a semaphore is inappropriate.

Is there any similar facility in NuttX that can perform as mutex, it is safe in both thread and irq contexts, and does not cause jitter dead-locks etc as I mentioned above?

protobits commented 3 years ago

But aren't interrupts already handled correctly w.r.t. logging from tasks? I understand the syslog buffer already takes care of that. Furthermore, if each syslog() call made from tasks uses a separate buffer, we only need to protect the flushing of this buffer which I understand is done outside interrupt context (so it is serialized).

About the other issue about being split due to IOB size, I don't have any clues on that. Again, I haven't looked syslog implementation in detail.

fjpanag commented 3 years ago

Oh, there is another issue. I followed the code, specifically for the case that buffering is enabled.

Regarding Thread A - Thread B
Buffering here is broken. Although each thread is using its own buffer, flushing of the buffer is not guaranteed to happen atomically. If the channel does not support a write function, the logger falls back to putc which again produces bad output. (If it does support write, it is up to the channel driver to do it correctly). I was able to verify this experimentally, having garbled logging even with buffering enabled.

Regarding Thread - IRQ As I see it now, there shouldn't be an issue here.
Can you reproduce your issue with CONFIG_SYSLOG_BUFFER enabled?

fjpanag commented 3 years ago

Aaaand.... Another issue...
I haven't reproduced it, but seems theoretically possible.

With buffering enabled.

  1. Thread writes to buffer.
  2. Interrupt kicks in and writes to int buffer.
  3. Thread resumes. Flushes int buffer first, and then the thread buffer.

This is wrong. In reality the thread was the first to perform the call to syslog(), while the interrupt is the second.
But in the log it appears in reverse order (first the IRQ meessage and then the thread's).
Syslog shall respect the order that it is requested to log messages.

protobits commented 3 years ago

Oh, there is another issue. I followed the code, specifically for the case that buffering is enabled.

Regarding Thread A - Thread B Buffering here is broken. Although each thread is using its own buffer, flushing of the buffer is not guaranteed to happen atomically. If the channel does not support a write function, the logger falls back to putc which again produces bad output. (If it does support write, it is up to the channel driver to do it correctly). I was able to verify this experimentally, having garbled logging even with buffering enabled.

I think putc is supported to print messages during boot, but I'm not sure.

Regarding Thread - IRQ As I see it now, there shouldn't be an issue here. Can you reproduce your issue with CONFIG_SYSLOG_BUFFER enabled?

Ah, I didn't realize there are two options: INTBUFFER and BUFFER. I only had the former. It appears it depends on SYSLOG_WRITE which is selected by other options. Still, I'm thinking that for RAMLOG output we shouldn't need further buffering. Maybe the locking should be inside RAMLOG in this case, and that should be enough. For other syslog channels, syslog buffering should be used.

protobits commented 3 years ago

Ok, after looking at the code I understand that "default syslog" is actually ver low level and choosing console output for syslog enables SYSLOG_WRITE support while at the same time allowing me to enable SYSLOG_BUFFER (which I did). However, output is still garbled:

[    0.000000] Idle Task: esp32_rng_initialize: Initializing RNG
[    0.030000] init: I (17) wifi:[    0.030000] init: wifi driver task: 4, prio:253, stack:6656, core=0[    0.030000] init: 
[    0.130000] wifi: I (125) wifi:[    0.130000] wifi: wifi firmware version: b9e8721[    0.130000] wifi: 
[    0.130000] wifi: I (126) wifi:[    0.130000] wifi: wifi certification version: v7.0[    0.130000] wifi: 
[    0.130000] wifi: I (126) wifi:[    0.130000] wifi: config NVS flash: enabled[    0.130000] wifi: 
[    0.130000] wifi: I (127) wifi:[    0.130000] wifi: config nano formating: disabled[    0.150000] wifi: 
[    0.150000] wifi: I (138) wifi:[    0.150000] wifi: Init data frame dynamic rx buffer num: 32[    0.160000] wifi: 
[    0.160000] wifi: I (150) wifi:[    0.160000] wifi: Init management frame dynamic rx buffer num: 32[    0.160000] wifi: 
[    0.170000] wifi: I (161) wifi:[    0.170000] wifi: Init management short buffer num: 32[    0.170000] wifi: 
[    0.180000] wifi: I (172) wifi:[    0.180000] wifi: Init dynamic tx buffer num: 32[    0.180000] wifi: 
[    0.180000] wifi: I (173) wifi:[    0.190000] wifi: Init static rx buffer size: 1600[    0.190000] wifi: 
[    0.190000] wifi: I (184) wifi:[    0.190000] wifi: Init static rx buffer num: 10[    0.200000] wifi: 
[    0.200000] wifi: I (195) wifi:[    0.200000] wifi: Init dynamic rx buffer num: 32[    0.210000] wifi: 
[    0.220000] init: esp_wifi_adapter_init: INFO: OK to initialize WiFi adapter
[    0.220000] wifi: phy_version: 4500, 0cd6843, Sep 17 2020, 15:37:07, 0, 2
[    0.350000] wifi: wifi_set_intr: cpu_no=0, intr_source=0, intr_num=0, intr_prio=1
[    0.360000] wifi: esp_set_isr: INFO: n=0 f=0x40081f94 arg=0 irq=4
[    0.370000] wifi: esp32_ints_on: INFO mask=00000001 irq=0
[    0.380000] wifi: I (369) wifi:[    0.380000] wifi: mode : sta (a4:cf:12:1b:4c:3c)[    0.380000] wifi: 
[    0.380000] wifi: I (370) wifi:[    0.390000] wifi: enable tsf[    0.390000] wifi: 
[    0.390000] wifi: esp_event_post: Event: base=WIFI_EVENT id=2 data=0 data_size=0 ticks=4294967295
[    0.400000] init: esp_wifi_sta_start: INFO: OK to start WiFi station

NuttShell (NSH) NuttX-10.0.1
nsh> [    0.400000] lpwork: esp_evt_work_cb: INFO: WiFi sta start
[    0.410000] wifi: I (406) wifi:[    0.410000] wifi: Set ps type: 0
[    0.410000] wifi: 

So, I'm not following where the issue is. Maybe it is due to esp32 not supporting DMA for UART and thus transfers are character based?

On the other hand, I'm looking at RAMLOG and it seems it only exposes putc and does not use the write facility. Maybe this should also be changed so that SYSLOG_WRITE is available when RAMLOG is used.

protobits commented 3 years ago

Ok, I resorted to debugging and found out that:

  1. esp_log_writev is being called multiple times to put together a message. this means that for each part of the message the timestamp and other stuff being prefixed by syslog is being repeated.
  2. esp_log_writev does not appear to be ever adding a newline

@Ouss4 @yamt could this be fixed on ESP32 side? syslog should be called only once per message and the newline should be sent as part of that

protobits commented 3 years ago

@Ouss4 sorry to ping again but I was wondering if you know about this. It seems that ESP32 internal libs make two calls to syslog internally, one that prints something like I (182) and another one with the message itself (besides the fact that there's no newline in the second either). There should be only one syslog call with a newline. If possible, the first call should simply be removed since there's no point in printing the letter and unidentified number.

Ouss4 commented 3 years ago

@v01d sorry I missed this one. @cwespressif @donghengqaz do you have any insights on this?

donghengqaz commented 3 years ago

I enable RAMLOG and INTBUFFER, but don't catch the same issue with you, as I know the log “wifi:state: xxx -> xxx” is in one task, so they will not mix each other, following is my log:

NuttShell (NSH) NuttX-10.0.1
esp32_rng_initialize: Initializing RNG
I (21) wifi:wifi driver task: 4, prio:253, stack:6656, core=0
I (99) wifi:wifi firmware version: b9e8721
I (100) wifi:wifi certification version: v7.0
I (100) wifi:config NVS flash: enabled
I (100) wifi:config nano formating: disabled
I (100) wifi:Init data frame dynamic rx buffer num: 32
I (100) wifi:Init management frame dynamic rx buffer num: 32
I (100) wifi:Init management short buffer num: 32
I (100) wifi:Init dynamic tx buffer num: 32
I (101) wifi:Init static rx buffer size: 1600
I (101) wifi:Init static rx buffer num: 10
I (101) wifi:Init dynamic rx buffer num: 32
esp_wifi_adapter_init: OK to initialize WiFi adapter
phy_version: 4500, 0cd6843, Sep 17 2020, 15:37:07, 0, 2
wifi_set_intr: cpu_no=0, intr_source=0, intr_num=0, intr_prio=1
esp_set_isr: n=0 f=0x40081f74 arg=0 irq=4
0x40081f74: wDev_ProcessFiq at ??:?

esp32_ints_on: INFO mask=00000001 irq=0
I (355) wifi:mode : sta (c4:4f:33:24:66:45)
I (355) wifi:enable tsf
esp_event_post: Event: base=WIFI_EVENT id=2 data=0 data_size=0 ticks=4294967295
esp_wifi_sta_start: OK to start WiFi station
esp_evt_work_cb: WiFi sta start
I (357) wifi:Set ps type: 0

esp_wifi_sta_password: WiFi station password=espressif002 len=12
esp_wifi_sta_essid: WiFi station ssid=AFAST_IK len=8
I (6391) wifi:new:<13,2>, old:<1,0>, ap:<255,255>, sta:<13,2>, prof:1
I (6392) wifi:state: init -> auth (b0)
I (6397) wifi:state: auth -> assoc (0)
I (6406) wifi:state: assoc -> run (10)
W (6691) wifi:<ba-add>idx:0 (ifx:0, 8c:ab:8e:bb:82:08), tid:0, ssn:2, winSize:64
I (6693) wifi:connected with AFAST_IK, aid = 1, channel 13, 40D, bssid = 8c:ab:8e:bb:82:08
I (6693) wifi:security: WPA2-PSK, phy: bgn, rssi: -36
I (6723) wifi:pm start, type: 0

esp_event_post: Event: base=WIFI_EVENT id=4 data=0x3ffe5840 data_size=44 ticks=4294967295
I (6724) wifi:AP's beacon interval = 102400 us, DTIM period = 1
esp_evt_work_cb: WiFi sta connect
donghengqaz commented 3 years ago

But in esp-idf, the Wi-Fi adapter function esp_log_writev and esp_log_write should be atomic operation, they have mutex in side. so if syslog is also atomic operation, it is also OK.

protobits commented 3 years ago

You're not seeing the issue since you don't have any syslog prefix being added. Please enable timestamp prefix and you will see it. In any case, the letter + number is meaningless to users so it should be removed anyway.

Mar 31, 2021 08:57:18 dongheng @.***>:

I enable RAMLOG and INTBUFFER, but don't catch the same issue with you, as I know the log “wifi:state: xxx -> xxx” is in one task, so they will not mix each other, following is my log:

NuttShell (NSH) NuttX-10.0.1 esp32_rng_initialize: Initializing RNG I (21) wifi:wifi driver task: 4, prio:253, stack:6656, core=0 I (99) wifi:wifi firmware version: b9e8721 I (100) wifi:wifi certification version: v7.0 I (100) wifi:config NVS flash: enabled I (100) wifi:config nano formating: disabled I (100) wifi:Init data frame dynamic rx buffer num: 32 I (100) wifi:Init management frame dynamic rx buffer num: 32 I (100) wifi:Init management short buffer num: 32 I (100) wifi:Init dynamic tx buffer num: 32 I (101) wifi:Init static rx buffer size: 1600 I (101) wifi:Init static rx buffer num: 10 I (101) wifi:Init dynamic rx buffer num: 32 esp_wifi_adapter_init: OK to initialize WiFi adapter phy_version: 4500, 0cd6843, Sep 17 2020, 15:37:07, 0, 2 wifi_set_intr: cpu_no=0, intr_source=0, intr_num=0, intr_prio=1 esp_set_isr: n=0 f=0x40081f74 arg=0 irq=4 0x40081f74: wDev_ProcessFiq at ??:?

esp32_ints_on: INFO mask=00000001 irq=0 I (355) wifi:mode : sta (c4:4f:33:24:66:45) I (355) wifi:enable tsf esp_event_post: Event: base=WIFI_EVENT id=2 data=0 data_size=0 ticks=4294967295 esp_wifi_sta_start: OK to start WiFi station esp_evt_work_cb: WiFi sta start I (357) wifi:Set ps type: 0

esp_wifi_sta_password: WiFi station password=espressif002 len=12 esp_wifi_sta_essid: WiFi station ssid=AFAST_IK len=8 I (6391) wifi:new:<13,2>, old:<1,0>, ap:<255,255>, sta:<13,2>, prof:1 I (6392) wifi:state: init -> auth (b0) I (6397) wifi:state: auth -> assoc (0) I (6406) wifi:state: assoc -> run (10) W (6691) wifi:idx:0 (ifx:0, 8c:ab:8e:bb:82:08), tid:0, ssn:2, winSize:64 I (6693) wifi:connected with AFAST_IK, aid = 1, channel 13, 40D, bssid = 8c:ab:8e:bb:82:08 I (6693) wifi:security: WPA2-PSK, phy: bgn, rssi: -36 I (6723) wifi:pm start, type: 0

esp_event_post: Event: base=WIFI_EVENT id=4 data=0x3ffe5840 data_size=44 ticks=4294967295 I (6724) wifi:AP's beacon interval = 102400 us, DTIM period = 1 esp_evt_work_cb: WiFi sta connect

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub[https://github.com/apache/incubator-nuttx/issues/3157#issuecomment-811006715], or unsubscribe[https://github.com/notifications/unsubscribe-auth/AABHPKVRMU3BHSRY3U2QWMTTGMEZZANCNFSM4ZVZ24YA]. [###24x24:true###][Imagen de rastreo][https://github.com/notifications/beacon/AABHPKTLXGAFLPDYXFTFVJDTGMEZZA5CNFSM4ZVZ24YKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOGBLPV6Y.gif]

donghengqaz commented 3 years ago

@v01d Thank you for your detailed explanation, I enable timestamp and catch the same log. We add other note to check this issue, the main reason is that: like log I(123) wifi:pm start, type 1, it contains of 3 part:

  1. I(123) wifi:, this log is outoput by _log_write
  2. pm start, type 1, this log is outoput by _log_writev
  3. \n, this log is outoput by _log_write

So when adding timestamp prefix, you see the log became: [ 0.080000] wifi: I (123) wifi:[ 0.080000] pm start, type 1[ 0.080000] wifi:.

Firstly I will report this issue to internal Wi-Fi team and advice them to combine the 3 parts into 1, but it comes a little later. So secondly I will try to remove part 1 & 3 from Wi-Fi adapter driver in NuttX.

donghengqaz commented 3 years ago

@v01d I add PR: https://github.com/apache/incubator-nuttx/pull/3267 to try to fix this issue in NuttX side, PTAL.

protobits commented 3 years ago

As I mentioned on #3267, I don't think that this should be addressed on NuttX driver side, but on the internal WiFi driver. Since this is not critical, we can wait until the internal WiFi team can address it and leave this issue open in the meantime.

yamt commented 3 years ago

i think the fundamental problem is that the expected behavior of these callbacks are not well documented. i guess the only thing we can do without having too many assumptions is to have a global line buffer and only flush it to nuttx syslog on a newline.

protobits commented 3 years ago

If we knew that the "important" message is emitted only in log_writev I would simply ignore the other two and add a newline. @donghengqaz can we assume this? This way we can also drop the unneeded prefix.

Otherwise I think that using an internal line buffer is the only solution if these calls will always be made separately to build the message.

Related to this discussion is the fact that I think syslog() should probably handle the newline adding on its own, as per the opengroup specification suggests (last phrase):

The message body is generated from the message and following arguments in the same manner as if these were arguments to printf(), except that the additional conversion specification %m shall be recognized; it shall convert no arguments, shall cause the output of the error message string associated with the value of errno on entry to syslog(), and may be mixed with argument specifications of the "%n$" form. If a complete conversion specification with the m conversion specifier character is not just %m, the behavior is undefined. A trailing may be added if needed.

Linux has this in its manpage:

The remaining arguments are a format, as in printf(3), and any arguments required by the format, except that the two-character sequence %m will be replaced by the error message string strerror(errno). The format string need not include a terminating newline character.

And this is BSD:

The message is identical to a printf(3) format string, except that `%m' is replaced by the current error message. (As denoted by the global variable errno; see strerror(3).) A trailing newline is added if none is present. Newlines and other non-printable characters embedded in the message string are printed in an alternate format. This prevents someone from using non-printable characters to construct misleading log messages in an output file. Newlines are printed as "\n", tabs are printed as "\t". Other control characters are printed using a caret ("^") representation, for example "^M" for carriage return.

I think we should:

  1. make syslog append newline if not present in message
  2. replace other newlines with a printable character
  3. Remove all use of "\n" inside syslog calls

This way we can avoid misuse of syslog()

patacongo commented 3 years ago

Otherwise I think that using an internal line buffer is the only solution if these calls will always be made separately to build the message.

Is this relevant? It enables use of a IOB to parse syslog lines into, then dumps the buffered line all at once.

 77 config SYSLOG_BUFFER
 78         bool "Use buffered output"
 79         default n
 80         depends on SYSLOG_WRITE
 81         select MM_IOB
 82         ---help---
 83                 Enables an buffering logic that will be used to serialize debug
 84                 output from concurrent tasks. This enables allocation of one buffer
 85                 per thread, each of size CONFIG_IOB_BUFSIZE.
 86
 87                 The use of SYSLOG buffering is optional.  If not enabled, however,
 88                 then the output from multiple tasks that attempt to generate SYSLOG
 89                 output may be interleaved and difficult to read.
protobits commented 3 years ago

That is useful to avoid interleaved prints from concurrent tasks. The point is that a call to syslog corresponds to a complete message so multiple calls should not be used to construct one output line. The standard description and BSD/Linux implementations confirm that one call to syslog implies one output line (no more, no less).

yamt commented 3 years ago

I think we should:

1. make syslog append newline if not present in message

2. replace other newlines with a printable character

3. Remove all use of "\n" inside syslog calls

This way we can avoid misuse of syslog()

i agree it makes sense to make syslog to append a newline when forwarding the message to the console. i'm not sure about the "if not present in message" part though. it's simpler to do it unconditionally.

we need manual investigation anyway because some code seems to assume the current behavior. eg. https://github.com/apache/incubator-nuttx/blob/83513e26cf92492484a369d768154a519256d33e/video/videomode/videomode_dump.c#L96-L108

protobits commented 3 years ago

I think supporting an existing "\n" could add compatibility but I'm not sure if that would be useful in practice. I agree that it is better to ensure one only correct way to use syslog(). BTW, let's continue this part of the discussion in #3291 and leave this to deal with the ESP32 issue.

protobits commented 3 years ago

If we knew that the "important" message is emitted only in log_writev I would simply ignore the other two and add a newline. @donghengqaz can we assume this? This way we can also drop the unneeded prefix.

@donghengqaz do you think we can assume this? I think that we will need to have an internal buffer as previously mentioned, but I would still like to see if we can drop this prefix.