apache / nuttx

Apache NuttX is a mature, real-time embedded operating system (RTOS)
https://nuttx.apache.org/
Apache License 2.0

esp32s3 : float calc error or stack error #12490

Open w2016561536 opened 5 months ago

w2016561536 commented 5 months ago

Hi, I am running PX4 on NuttX on an ESP32-S3 and found an error: a float value becomes NaN after a simple multiplication (screenshot: nan_float). The console output is shown in a second screenshot (nan_console). The problem appears several minutes after boot. Some function calls also trigger the same behavior, turning the float variable into NaN.

defconfig:

CONFIG_ALLOW_BSD_COMPONENTS=y
CONFIG_ALLOW_GPL_COMPONENTS=y
CONFIG_ALLOW_MIT_COMPONENTS=y
CONFIG_ALLOW_ECLIPSE_COMPONENTS=y
CONFIG_ALLOW_ICS_COMPONENTS=y
CONFIG_BASE_DEFCONFIG="-dirty"
CONFIG_INTELHEX_BINARY=y
CONFIG_ARCH_SETJMP_H=y
# CONFIG_NDEBUG is not set
CONFIG_STACK_COLORATION=y
CONFIG_CCACHE=y
CONFIG_ARCH_XTENSA=y
CONFIG_PWM_MULTICHAN=y
CONFIG_PWM_NCHANNELS=8
CONFIG_ARCH_CHIP_ESP32S3=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB=y
CONFIG_ESP32S3_DATA_CACHE_64KB=y
CONFIG_ESP32S3_DATA_CACHE_LINE_64B=y
CONFIG_ESP32S3_SPI2=y
CONFIG_ESP32S3_SPI3=y
CONFIG_ESP32S3_UART0=y
CONFIG_ESP32S3_UART1=y
CONFIG_ESP32S3_UART2=y
CONFIG_ESP32S3_WIFI=y
CONFIG_ESP32S3_I2C0=y
CONFIG_ESP32S3_I2C1=y
CONFIG_ESP32S3_LEDC=y
CONFIG_ESP32S3_USBSERIAL=y
CONFIG_ESP32S3_GPIO_IRQ=y
CONFIG_ESP32S3_SPI_SWCS=y
CONFIG_ESP32S3_SPI_UDCS=y
CONFIG_ESP32S3_SPI_DMA=y
CONFIG_ESP32S3_SPI_DMA_BUFSIZE=4092
CONFIG_ESP32S3_SPI_DMATHRESHOLD=4
CONFIG_ESP32S3_SPI2_CSPIN=1
CONFIG_ESP32S3_SPI2_CLKPIN=2
CONFIG_ESP32S3_SPI2_MOSIPIN=42
CONFIG_ESP32S3_SPI2_MISOPIN=41
CONFIG_ESP32S3_SPI3_CSPIN=35
CONFIG_ESP32S3_SPI3_CLKPIN=39
CONFIG_ESP32S3_SPI3_MOSIPIN=38
CONFIG_ESP32S3_SPI3_MISOPIN=36
CONFIG_ESP32S3_UART2_TXPIN=8
CONFIG_ESP32S3_UART2_RXPIN=3
CONFIG_ESP32S3_I2C0_SCLPIN=45
CONFIG_ESP32S3_I2C0_SDAPIN=48
CONFIG_ESP32S3_I2C1_SCLPIN=15
CONFIG_ESP32S3_I2C1_SDAPIN=16
CONFIG_ESP32S3_I2CTIMEOMS=10
CONFIG_WPA_WAPI_PSK=y
CONFIG_ESP32S3_WIFI_STATION_SOFTAP=y
CONFIG_ESP32S3_WIFI_STATIC_RXBUF_NUM=16
CONFIG_ESP32S3_WIFI_DYNAMIC_RXBUF_NUM=64
CONFIG_ESP32S3_WIFI_DYNAMIC_TXBUF_NUM=64
CONFIG_ESP32S3_WIFI_RXBA_AMPDU_WZ=16
CONFIG_ESP32S3_FLASH_MODE_QIO=y
CONFIG_ESP32S3_FLASH_FREQ_120M=y
CONFIG_ESP32S3_LEDC_TIM0=y
CONFIG_ESP32S3_LEDC_TIM0_CHANNELS=4
CONFIG_ESP32S3_LEDC_CHANNEL0_PIN=10
CONFIG_ESP32S3_LEDC_CHANNEL1_PIN=9
CONFIG_ESP32S3_LEDC_CHANNEL3_PIN=37
CONFIG_ESP32S3_LEDC_CHANNEL4_PIN=13
CONFIG_ESP32S3_LEDC_CHANNEL5_PIN=14
CONFIG_ESP32S3_LEDC_CHANNEL6_PIN=21
CONFIG_ESP32S3_LEDC_CHANNEL7_PIN=47
CONFIG_BOARD_LOOPSPERMSEC=16717
CONFIG_ARCH_INTERRUPTSTACK=2048
CONFIG_RAM_START=0x20000000
CONFIG_RAM_SIZE=114688
CONFIG_ARCH_BOARD_CUSTOM_NAME="px4"
CONFIG_ARCH_BOARD_CUSTOM_DIR="../../../../boards/px4/esp32s3/nuttx-config"
CONFIG_ARCH_BOARD_COMMON=y
CONFIG_ESP32S3_SPEED_UP_ISR=y
CONFIG_BOARDCTL_RESET=y
CONFIG_USEC_PER_TICK=1000
CONFIG_START_YEAR=2011
CONFIG_START_MONTH=12
CONFIG_START_DAY=6
CONFIG_PREALLOC_TIMERS=4
CONFIG_SPINLOCK=y
CONFIG_INIT_STACKSIZE=8192
CONFIG_INIT_ENTRYPOINT="nsh_main"
CONFIG_TASK_NAME_SIZE=48
CONFIG_SCHED_WAITPID=y
CONFIG_PTHREAD_MUTEX_TYPES=y
CONFIG_SCHED_INSTRUMENTATION=y
CONFIG_SCHED_INSTRUMENTATION_SWITCH=y
CONFIG_NAME_MAX=48
CONFIG_SIG_DEFAULT=y
CONFIG_PREALLOC_MQ_MSGS=64
CONFIG_SCHED_HPWORK=y
CONFIG_SCHED_HPWORKSTACKSIZE=2048
CONFIG_SCHED_LPWORK=y
CONFIG_SCHED_LPWORKSTACKSIZE=2048
CONFIG_DEFAULT_TASK_STACKSIZE=4096
CONFIG_IDLETHREAD_STACKSIZE=3072
CONFIG_PTHREAD_STACK_MIN=2048
CONFIG_I2C_RESET=y
CONFIG_I2C_DRIVER=y
CONFIG_SPI_DRIVER=y
CONFIG_TIMER=y
CONFIG_DEV_GPIO=y
CONFIG_DEV_ZERO=y
CONFIG_DEV_ASCII=y
CONFIG_MTD=y
CONFIG_MTD_PARTITION=y
CONFIG_MTD_PARTITION_NAMES=y
CONFIG_MTD_BYTE_WRITE=y
CONFIG_MTD_CONFIG=y
CONFIG_MTD_RAMTRON=y
CONFIG_RAMTRON_SETSPEED=y
CONFIG_PIPES=y
CONFIG_DEV_PIPE_MAXSIZE=1024
CONFIG_DEV_PIPE_SIZE=70
CONFIG_SERIAL_NPOLLWAITERS=6
CONFIG_SERIAL_TERMIOS=y
CONFIG_UART0_RXBUFSIZE=128
CONFIG_UART0_TXBUFSIZE=128
CONFIG_UART1_RXBUFSIZE=128
CONFIG_UART1_TXBUFSIZE=128
CONFIG_UART2_RXBUFSIZE=128
CONFIG_UART2_TXBUFSIZE=128
CONFIG_DRIVERS_WIRELESS=y
CONFIG_DRIVERS_IEEE80211=y
CONFIG_SYSLOG_BUFFER=y
CONFIG_SYSLOG_BUFSIZE=256
CONFIG_SYSLOG_DEVPATH=""
CONFIG_DMA=y
CONFIG_NET_ETH_PKTSIZE=1514
CONFIG_NETDEV_LATEINIT=y
CONFIG_NETDEV_PHY_IOCTL=y
CONFIG_NETDEV_WIRELESS_IOCTL=y
CONFIG_NET_BINDTODEVICE=y
CONFIG_NET_TCP=y
CONFIG_NET_TCP_DELAYED_ACK=y
CONFIG_NET_TCP_WRITE_BUFFERS=y
CONFIG_NET_UDP=y
CONFIG_NET_BROADCAST=y
CONFIG_NET_UDP_WRITE_BUFFERS=y
CONFIG_NET_ICMP=y
CONFIG_NET_ICMP_SOCKET=y
CONFIG_FS_LARGEFILE=y
CONFIG_FS_FAT=y
CONFIG_FAT_COMPUTE_FSINFO=y
CONFIG_FS_FATTIME=y
CONFIG_FS_ROMFS=y
CONFIG_FS_CROMFS=y
CONFIG_FS_SMARTFS=y
CONFIG_FS_BINFS=y
CONFIG_FS_PROCFS=y
CONFIG_FS_PROCFS_REGISTER=y
CONFIG_MM_REGIONS=2
CONFIG_IOB_NBUFFERS=124
CONFIG_IOB_THROTTLE=24
CONFIG_WIRELESS=y
CONFIG_POSIX_SPAWN_DEFAULT_STACKSIZE=2048
CONFIG_TLS_NELEM=4
CONFIG_TLS_TASK_NELEM=4
CONFIG_NETDB_DNSCLIENT=y
CONFIG_BUILTIN=y
CONFIG_HAVE_CXX=y
CONFIG_HAVE_CXXINITIALIZE=y
CONFIG_BENCHMARK_COREMARK=y
CONFIG_EXAMPLES_DHCPD=y
CONFIG_NETUTILS_DHCPD=y
CONFIG_NETUTILS_DHCPD_STACKSIZE=2048
CONFIG_NETINIT_WAPI_SSID="MY_PX4"
CONFIG_NETINIT_WAPI_PASSPHRASE="12345678"
CONFIG_NSH_LINELEN=128
CONFIG_NSH_MAXARGUMENTS=15
CONFIG_NSH_NESTDEPTH=8
CONFIG_NSH_BUILTIN_APPS=y
# CONFIG_NSH_CMDOPT_HEXDUMP is not set
CONFIG_NSH_FILEIOSIZE=512
CONFIG_NSH_ROMFSETC=y
CONFIG_NSH_CROMFSETC=y
CONFIG_NSH_ROMFSSECTSIZE=128
CONFIG_NSH_ARCHINIT=y
CONFIG_SYSTEM_ARGTABLE3=y
CONFIG_SYSTEM_NSH=y
CONFIG_WIRELESS_WAPI=y
CONFIG_WIRELESS_WAPI_CMDTOOL=y
CONFIG_WIRELESS_WAPI_STACKSIZE=8192

Hardware: ESP32S3-WROOM-1 M0N16R8

NuttX version: 12.4, commit: 0f169f50c4b234abde12a6a0b028a8fe8f62f5aa

Full source code: https://1drv.ms/u/c/008ed313fdaa343c/EaXGLgJs_3VLpahnyyVtaL4BgF3pUIa_6f1XHX_ZxOb-Ow?e=1iSk8w

w2016561536 commented 5 months ago

GCC toolchain: https://github.com/espressif/crosstool-NG/releases/tag/esp-13.2.0_20240530

acassis commented 5 months ago

Hi @w2016561536 thank you for reporting the issue. Is there some way to reproduce this issue just creating a simple test on NuttX mainline, without using all these IMU files from PX4? If you can isolate the issue it will help us to find the root cause. Just for awareness ping @tmedicci
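A minimal sketch of what such an isolated test could look like, assuming the symptom is a finite multiplication that yields NaN. The function name `first_nan_iteration` is hypothetical and not part of NuttX or PX4. To stress context switching, it would presumably need to run from several tasks at once while interrupts are active.

```c
#include <math.h>

/* Hypothetical helper: repeatedly multiplies finite floats and returns the
 * index of the first iteration whose product is NaN, or -1 if none is.
 */

long first_nan_iteration(long iterations)
{
  /* volatile keeps the compiler from folding the multiply at build time */

  volatile float gain = 0.5f;

  for (long i = 0; i < iterations; i++)
    {
      volatile float sample = (float)(i % 1000) * 0.001f;
      float product = sample * gain;

      /* Both operands are finite, so a NaN here means corrupted state */

      if (isnan(product))
        {
          return i;
        }
    }

  return -1;
}
```

On a healthy system the function should return -1 for any iteration count. A non-negative result would point at corrupted floating-point state rather than at the arithmetic itself.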

w2016561536 commented 5 months ago

Well, it seems difficult to reproduce, but I may have found the problem. Looking at https://github.com/apache/nuttx/blob/b09b429308b991ba455cad57b53e0abaa423bf53/arch/xtensa/src/common/xtensa_user_handler.S#L363C1-L363C23, this function does not appear to be implemented correctly. In FreeRTOS, the implementation is https://github.com/espressif/esp-idf/blob/cadf80e8751caffaf25207a12bb65e5b188683ae/components/freertos/FreeRTOS-Kernel/portable/xtensa/xtensa_vectors.S#L990. That function has a related issue, https://github.com/espressif/esp-idf/issues/11690, which is very similar to this one.

w2016561536 commented 5 months ago

@tmedicci Do you think this problem is caused by the FPU?

tmedicci commented 5 months ago

> @tmedicci Do you think this problem is caused by fpu ?

Hi @w2016561536, I am not aware of it. Maybe you could try implementing the workaround, and I can evaluate it using our internal CI.

acassis commented 5 months ago

@w2016561536 did you try to save the FP registers?

If it fixes the issue we should include it into mainline. Maybe wrapped by #ifdef CONFIG_ARCH_FPU
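As a purely conceptual model of that suggestion (not NuttX code and not the Xtensa vector assembly), the sketch below simulates a register file on the host and shows how a save/restore pair guarded by `CONFIG_ARCH_FPU` keeps an interrupted task's value intact. The names `g_fpu`, `fake_isr`, `dispatch_irq`, and `task_value_survives`, as well as the register count, are illustrative assumptions.

```c
#include <string.h>

#define CONFIG_ARCH_FPU 1   /* assumption: normally provided by the build */
#define NUM_FP_REGS     16  /* illustrative count, not taken from NuttX */

/* Simulated FPU register file shared by "task" and "interrupt" code */

static float g_fpu[NUM_FP_REGS];

/* Hypothetical handler that uses floating point and clobbers registers */

static void fake_isr(void)
{
  for (int i = 0; i < NUM_FP_REGS; i++)
    {
      g_fpu[i] = -1.0f;
    }
}

/* Dispatch the handler, preserving FP state when the FPU is enabled */

void dispatch_irq(void)
{
#ifdef CONFIG_ARCH_FPU
  float saved[NUM_FP_REGS];
  memcpy(saved, g_fpu, sizeof(saved));  /* save the interrupted context */
#endif

  fake_isr();

#ifdef CONFIG_ARCH_FPU
  memcpy(g_fpu, saved, sizeof(saved));  /* restore before returning */
#endif
}

/* Returns 1 if a task value held in "register" 0 survives the interrupt */

int task_value_survives(void)
{
  g_fpu[0] = 3.5f;
  dispatch_irq();
  return g_fpu[0] == 3.5f;
}
```

Removing the `CONFIG_ARCH_FPU` definition in this model makes `task_value_survives()` return 0, which mirrors the kind of silent corruption described in the report.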

ProfFan commented 1 month ago

Hey guys, I am the reporter of the original problem in ESP-IDF FreeRTOS. Yes, this is a silent data corruption and the current ESP-IDF's interrupt vector assembly file has a fix. Regardless of whether this specific issue is caused by the same bug (very likely), you should update the vectors to match upstream :)

xiaoxiang781216 commented 1 month ago

@ProfFan could you point out the patch? we can apply the change, thanks.

yamt commented 1 month ago

I guess he meant this one: https://github.com/espressif/esp-idf/issues/11690

xiaoxiang781216 commented 1 month ago

But the change looks FreeRTOS-specific: https://github.com/espressif/esp-idf/commit/b03c8912c73fa59061d97a2f5fd5acddcc3fa356#diff-db429b5abb80b87b6da1abb1ecd103c81fc2d982780bd8a3f1a23494b1749155R1152

w2016561536 commented 1 month ago

Perhaps this bug needs Espressif staff to work on it.

acassis commented 1 month ago

@fdcavalcanti @eren-terzioglu @tmedicci please take a look ^

pkarashchenko commented 1 month ago

Whether the FPU is involved can be checked by disabling the FPU and seeing whether the issue still reproduces with the software-emulated floating-point math libraries.

tmedicci commented 1 month ago

I'm sorry, @xiaoxiang781216 , the issue ID that https://github.com/apache/nuttx/pull/14481 solves is different. I already fixed it. I'm sorry.

acassis commented 1 month ago

@w2016561536 maybe we can work together to get PX4 working on ESP32, ESP32-S2 and ESP32-S3. @henrykotze is working on PX4 for ESP32 and I want to run NuttX on ESP32-S2 to run on this device:

https://aliexpress.com/item/1005006845550308.html

w2016561536 commented 1 month ago

> @w2016561536 maybe we can work together to get PX4 working on ESP32, ESP32-S2 and ESP32-S3. @henrykotze is working on PX4 for ESP32 and I want to run NuttX on ESP32-S2 to run on this device:
>
> https://aliexpress.com/item/1005006845550308.html

Good idea! But I think an FPU is necessary for such a complex task, and the ESP32-S2 doesn't have one. I have tried porting PX4 to the ESP32-S3 and uploaded it to https://github.com/w2016561536/PX4-Autopilot/tree/px4_esp32s3. Also, Guanglun has finished a PX4 port for the ESP32 here: https://github.com/guanglun/PX4-Autopilot/tree/single_core_esp32

w2016561536 commented 4 weeks ago

Perhaps this is what leads to the FPU problem? https://github.com/apache/nuttx/blob/0c5381a0a15f992d8d0cdca9e9c6ac6682176f42/arch/xtensa/src/esp32s3/esp32s3_i2c.c#L1379 Too much work is being done on the IRQ stack. I tried using I2C polling mode instead, and the problem seems to have disappeared.
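The defconfig above sets `CONFIG_STACK_COLORATION=y` and `CONFIG_ARCH_INTERRUPTSTACK=2048`, so one way to test the IRQ-stack theory is to look at how much of the interrupt stack is ever touched. The sketch below illustrates the general stack-coloring technique on a plain array. It is not NuttX's implementation, and the fill pattern and function names are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define COLOR 0xdeadbeefu  /* illustrative fill pattern */

/* Fill every word of the stack region with the pattern */

void stack_color(uint32_t *stack, size_t nwords)
{
  for (size_t i = 0; i < nwords; i++)
    {
      stack[i] = COLOR;
    }
}

/* Count the words that were overwritten, assuming the stack grows down
 * from the highest index toward index 0.
 */

size_t stack_used_words(const uint32_t *stack, size_t nwords)
{
  size_t untouched = 0;

  /* Scan from the far end of the stack until the pattern is broken */

  while (untouched < nwords && stack[untouched] == COLOR)
    {
      untouched++;
    }

  return nwords - untouched;
}
```

If the measured usage reaches the full size of the region, the stack has most likely overflowed into adjacent memory, which would be consistent with the "stack error" half of the issue title.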