apache / nuttx

Apache NuttX is a mature, real-time embedded operating system (RTOS)
https://nuttx.apache.org/
Apache License 2.0
2.5k stars 1.06k forks source link

`df -h` not rendered correctly when LIBC_LONG_LONG is disabled #5253

Closed normanr closed 1 year ago

normanr commented 2 years ago

I get some very strange output for df -h on my bl602. As dh -h is just a wrapper around cat /proc/fs/usage the issue comes from usage_entry in fs_procfs_mount.c.

Unfortunately I can not reproduce this with the sim board (I don't remember if it uses the host libc for formatting or not).

Sample of invalid output:

nsh> df -h
  Filesystem    Size      Used  Available Mounted on
  romfs      587767712@        0B         0@ (null)
  procfs     587767712        0B         0 (null)
  tmpfs      587767712        0B         0 (null)

If I take the mount_sprintf at line 369 and split it into separate calls then I can get:

nsh> df -h
  Filesystem    Size      Used  Available Mounted on
  romfs         320       66�       320 /etc
  procfs          0       662         0 /proc
  tmpfs         512       66�         0 /tmp

or:

nsh> df -h
[  119.000000] irq_unexpected_isr: ERROR irq: 5
[  119.000000] up_assert: Assertion failed at file:irq/irq_unexpectedisr.c line: 51 task: nsh_main

With only a single variable per call (and 8 calls to mount_sprintf) I get valid output:

nsh> df -h
  Filesystem    Size      Used  Available Mounted on
  romfs         320B      320B         0B /etc
  procfs          0B        0B         0B /proc
  tmpfs         512B        0B       512B /tmp

I suspect that this might be because usage_entry is passing an uint64_t to vsnprintf (because CONFIG_HAVE_LONG_LONG), but lib_vsprintf does not have CONFIG_LIBC_LONG_LONG defined, so it treats llu as a long, and doesn't iterate though the arguments correctly.

If I cast the arguments to unit32_t when they're passed to mount_sprintf then they render correctly (whether or not they're formatted as llu or lu.

I'm not sure if this bug is in fs_procfs_mount (should not be passing long long to vsnprintf if CONFIG_LIBC_LONG_LONG is not defined), or in lib_vsprintf (should compile fail?!? if CONFIG_LIBC_LONG_LONG is not defined and a long long is passed).

I guess what's happening with the sim board is that it's being compiled with the system gcc (11.2.0), and the bl602 target is being compiled with the SiFive gcc (8.3.0). They probably handle "incorrect" sizes of variadic function arguments differently.

normanr commented 2 years ago

I also tried SiFive's gcc version 10.2.0:

nsh> df -h
  Filesystem    Size      Used  Available Mounted on
  romfs      1107365336@        0B      9998@ (null)
  procfs     1107365336        0B      9998 (null)
  tmpfs      1107365336        0B      9998 (null)

so not something that's fixed by upgrading gcc, and weirdly different corruption (hex values are 0x23089fa0 vs 0x42010dd8).

ALTracer commented 2 years ago

I can confirm I see similar behaviour of nsh mount printing weird sizes on at least stm32f072b-disco:nsh (or tweaked) and a new custom armv6-m platform. I certainly did non enable LONG_LONG, because I have really tight ram constraints, and both MCUs are 32-bit, not requiring to survive y2038/time64_t etc. I might check it again with a live SWD debugger, considering nsh uses vsnprintf-family functions.

Is the overhead of multiple calls per mountpoint significant (code, stack, execution time)?

ALTracer commented 2 years ago

I marked the mount_sprintf() private function as printf-like (which is supported on GCC and Clang),

--- a/fs/mount/fs_procfs_mount.c
+++ b/fs/mount/fs_procfs_mount.c
@@ -167,6 +167,8 @@ const struct procfs_operations mount_procfsoperations =
  *
  ****************************************************************************/

+// __attribute__((format(printf, 2, 3)))
+printflike(2, 3)
 static void mount_sprintf(FAR struct mount_info_s *info,
                           FAR const char *fmt, ...)
 {

and my GCC 11.2 (arm-none-eabi) now emits an unrelated diagnostic for blocks_entry() (this issue is about usage_entry()):

altracer@glacier ~/g/a/incubator-nuttx (master)> grep LONG .config
# CONFIG_ARCH_SIZET_LONG is not set
CONFIG_LIBC_LONG_LONG=y
altracer@glacier ~/g/a/incubator-nuttx (master)> /usr/bin/time -v colormake
...
CC:  mount/fs_procfs_mount.c
mount/fs_procfs_mount.c: In function 'blocks_entry':
mount/fs_procfs_mount.c:260:27: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
  260 |   mount_sprintf(info, "%6lu %10" PRIuOFF " %10" PRIuOFF
      |                        ~~~^
      |                           |
      |                           long unsigned int
      |                        %6u
  261 |                 "  %10" PRIuOFF " %s\n",
  262 |                 statbuf->f_bsize, statbuf->f_blocks,
      |                 ~~~~~~~~~~~~~~~~
      |                        |
      |                        size_t {aka unsigned int}

Upon changing the size_t argument format specifier from %lu to %zu the diagnostic disappears.

--- a/fs/mount/fs_procfs_mount.c
+++ b/fs/mount/fs_procfs_mount.c
@@ -256,7 +257,7 @@ static int blocks_entry(FAR const char *mountpoint,

   /* Generate blocks list one line at a time */

-  mount_sprintf(info, "%6lu %10" PRIuOFF " %10" PRIuOFF
+  mount_sprintf(info, "%6zu %10" PRIuOFF " %10" PRIuOFF
                 "  %10" PRIuOFF " %s\n",
                 statbuf->f_bsize, statbuf->f_blocks,
                 statbuf->f_blocks - statbuf->f_bavail, statbuf->f_bavail,

There is no PRIu32/PRIu64 macro for zu size_t in include/inttypes.h .

The libc code in lib_libvsprintf.c is definitely above my level of understanding, but I guess there might be an overflow error in __ultoa_invert() or vsprintf_internal() itself. The error is only visible at run-time, I can't get to highlight it with static analyzers (Qt Creator libclang code model) or compiler diagnostics.

Also, I see a problem with conditionals: #ifdef CONFIG_HAVE_LONG_LONG resolves to 1 for GCC and SDCC, on which the variables' size depends in https://github.com/apache/incubator-nuttx/blob/master/fs/mount/fs_procfs_mount.c#L364 , but that doesn't guarantee the firmware builder has chosen to use CONFIG_LIBC_LONG_LONG in kconfig-menuconfig, which controls vsprintf logic here https://github.com/apache/incubator-nuttx/blob/master/libs/libc/stdio/lib_libvsprintf.c#L187

Note the commit https://github.com/apache/incubator-nuttx/commit/1d8fd9e034aee6366c918c1a639749bc0083df1a which extends sizes in usage_entry() from ul to ull apparently due to overflowing on actual storage, like >4GB microSD cards, and the commit https://github.com/apache/incubator-nuttx/commit/933e9e407b55cec357d3e7daf558a105d2c7507d which introduces support of size_t to NuttX libc vsprintf().

I propose two simple edits:

  1. Annotate the private mount_sprintf() somewhere (and possibly others, like procfs_snprintf() );
  2. Change CONFIG_HAVE_LONG_LONG to CONFIG_LIBC_LONG_LONG in three places inside usage_entry().

I can make a pull request like that after testing on a couple armv6-m / armv7-m boards. Upon further thinking, this function could be refactored to use size_t somehow.

Summoning @mlyszczek for any thoughts.

@normanr Can you verify df -h behaviour on your bl602 board with some large FS partitions?

normanr commented 2 years ago

The bl602 doesn't have large partitions (it's a tiny micro-controller). The issue comes from the way that the arguments are passed as variadic arguments. The calling site passes them as long long, but sprintf only consumes them (with va_arg) as long. As there are multiple arguments being passed to vsnprintf they get "out of alignment" and the address of the labels end up being used for the number values.

If I cast the values to unit32_t before calling mount_sprintf then it works correctly (but I assume any larger than uint32_t value would be truncated and not be rendered correctly). Another solution would be to make the conditionals in usage_entry based on CONFIG_LIBC_LONG_LONG instead of CONFIG_HAVE_LONG_LONG. This makes it hard to detect the overflow though. So storing as long long, and casting to long, and checking that it didn't truncate would be the better.

It would be nice if vsprintf could consume the long long even if CONFIG_LIBC_LONG_LONG wasn't defined. One way would be to (assuming the format specifier was for a long long) consume a long, check that it's zero (debug assert if not), and then consume the remaining long? (all assuming that a long long can be consumed as two long, which entirely depends on how va_arg is implemented by the compiler).

mlyszczek commented 2 years ago

In libvsprintf.c we have

#ifndef CONFIG_HAVE_LONG_LONG
#  undef CONFIG_LIBC_LONG_LONG
#endif

So it's safe to assume if LIBC_LONG_LONG is defined so is HAVE_LONG_LONG. I think it's both safe and logical to wrap sprintf(3) function in LIBC_LONG_LONG. Something like that seems to be fine:

#ifdef CONFIG_LIB_LONG_LONG
  mount_sprintf(info, "  %-10s %6llu%c %8llu%c  %8llu%c %s\n", fstype,
                size, sizelabel, used, usedlabel, free, freelabel,
                mountpoint);
#else
  mount_sprintf(info, 
#  if CONFIG_HAVE_LONG_LONG
  "long long is enabled, but libc does not support it, printed data may be truncated\n"
#  endif
  "  %-10s %6ld%c %8ld%c  %8ld%c %s\n", fstype,
                (uint32_t)size, sizelabel, (uint32_t)used, usedlabel, (uint32_t)free, freelabel,
                mountpoint);
#endif

It still would make 4gb sd cards to not be properly printed, but at least programmer would have warning about it so he could react properly if needed.

I wouldn't really wrap type declaration with LIBC_LONG_LONG - it looks illogical, and that define was not created for that purpose.

As @normanr said, it's not really possible to determine overflow inside of vsprintf(3) via va_arg(3). va_arg(3) requires you to know what type you are taking from the argument stack.

Consuming long long without LIBC_LONG_LONG kinda defies that flag I think, because you would have to have code to parse long long. And it would be even harder to consume long long without HAVE_LONG_LONG - long long must be at least 64bits long, not exactly, so without having a way to probe compile for sizeof(long long) it's impossible to implement it in portable way.

ALTracer commented 2 years ago

@normanr

Another solution would be to make the conditionals in usage_entry based on CONFIG_LIBC_LONG_LONG instead of CONFIG_HAVE_LONG_LONG. This makes it hard to detect the overflow though. So storing as long long, and casting to long, and checking that it didn't truncate would be the better.

usage_entry does not ever need to print long long, because the logic inside converts to megabytes and gigabytes. It shouldn't overflow even if you attach some 2TiB HDDs to an embedded device somehow with disabled both FS_LARGEFILE and LIBC_LONG_LONG, which is certainly a foreseeable situation. usage_entry only ever allocates uint64_t for internal calculations and uses bitshifts instead of division. Even then, I think, blocks_entry overflows first. I added logic for both of that in my PR.

@mlyszczek Thank you for feedback, the problem is solvable in a slightly different way.

ALTracer commented 2 years ago

A note related to code size versus long long division.

If a platform has LL support (via gcc builtins), it may very well be useful just for add/sub and multiplication, and we can mostly avoid LL division by carefully writing OS code. I revisited some of my TIMER drivers and introduced a couple changes to make them work without uint64_t.

The only hard consumers of uldivmod/uidivmod and friends in NuttX turned out to be code related to clock, timestamps, irq, nxsig and files. I analyzed that with Puncover on resulting nuttx.elf with -ggdb3.

I measured code sizes on my Milandr armv6-m (48k SRAM, 128k Flash) configuration (nsh, modbus slave, timer demo, ramspeed) with -Og and OPT_UNUSED_SECTIONS on release/10.3 -based branch while flipping some flags:

  1. #undef CONFIG_HAVE_LONG_LONG and disable LIBC_LONG_LONG, FS_LARGEFILE: -1416 bytes (!);
    text    data     bss     dec     hex filename
    98540     652    2952  102144   18f00 nuttx
  2. #define CONFIG_HAVE_LONG_LONG 1 (stock) and disable LIBC_LONG_LONG, FS_LARGEFILE: baseline (what I intended to go with)
    text    data     bss     dec     hex filename
    99956     652    2952  103560   19488 nuttx
  3. Enable only LIBC_LONG_LONG: +164 bytes;
    text    data     bss     dec     hex filename
    100120     652    2952  103724   1952c nuttx
  4. Enable only FS_LARGEFILE: +400 bytes;
    text    data     bss     dec     hex filename
    100356     652    2952  103960   19618 nuttx
  5. Enable both LIBC_LONG_LONG and FS_LARGEFILE: +592 bytes.
    text    data     bss     dec     hex filename
    100548     652    2952  104152   196d8 nuttx

    In total +2008 bytes to link with long long division and related codepath changes. I can live with divsi3 and udivsi3, which are required to have plain long division.

xiaoxiang781216 commented 1 year ago

Thanks for the detailed analysis, uldivmod/uidivmod not only consume more flash space, but also execute very slow. clock/timestamp is called so frequently, we have found the performance lost is noticeable in the real product, it's better to replace 64bit mul/div with 32bit counterpart for both size and performance.