espressif / esp-hosted

Hosted Solution (Linux/MCU) with ESP32 (Wi-Fi + BT + BLE)

mcu hosted performance #72

Open xiongyu0523 opened 2 years ago

xiongyu0523 commented 2 years ago

Hi,

Is there a plan to optimize performance on the MCU host? I have ported esp-hosted to Azure RTOS NetX Duo based on your reference implementation, but performance is extremely low (1 Mbps). Using this middleware makes little sense if its performance can't be improved.

  1. Boost the SPI frequency and use DMA to minimize the data transfer time (the gap between bytes) and the CPU load during transfers.
  2. Make the operation fully asynchronous so that multiple transfers can be queued. Currently the TCP/IP stack (CPU) is always waiting for a buffer to be released before it can start a new transfer.
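
A minimal sketch of the second point, assuming a single producer (the network stack) and a single consumer (the SPI transfer-complete interrupt). All names here (`xfer_queue_t`, `xfer_enqueue`, `xfer_on_complete`, `spi_start_dma`) are hypothetical and not part of esp-hosted; `spi_start_dma` is a host-side stub standing in for whatever non-blocking SPI call the port provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XFER_QUEUE_DEPTH 8u /* assumed depth; must be a power of two */

typedef struct {
    const uint8_t *tx;
    uint8_t *rx;
    size_t len;
} xfer_t;

typedef struct {
    xfer_t slots[XFER_QUEUE_DEPTH];
    volatile uint32_t head; /* next slot to fill (network stack) */
    volatile uint32_t tail; /* transfer currently on the bus (ISR) */
    volatile bool busy;     /* true while a transfer is in flight */
} xfer_queue_t;

/* Hypothetical stub: a real port would start a DMA transfer here. */
static unsigned spi_started;
static void spi_start_dma(const xfer_t *x) { (void)x; spi_started++; }

/* Start the transfer at the tail, or mark the bus idle if none is pending. */
static void xfer_kick(xfer_queue_t *q) {
    if (q->head != q->tail) {
        q->busy = true;
        spi_start_dma(&q->slots[q->tail % XFER_QUEUE_DEPTH]);
    } else {
        q->busy = false;
    }
}

void xfer_queue_init(xfer_queue_t *q) {
    assert(q != NULL);
    q->head = 0;
    q->tail = 0;
    q->busy = false;
}

/* Called by the network stack; returns immediately. False means "queue full".
 * A real port must wrap the body in a critical section against the ISR. */
bool xfer_enqueue(xfer_queue_t *q, const uint8_t *tx, uint8_t *rx, size_t len) {
    if (q->head - q->tail == XFER_QUEUE_DEPTH) {
        return false;
    }
    xfer_t *slot = &q->slots[q->head % XFER_QUEUE_DEPTH];
    slot->tx = tx;
    slot->rx = rx;
    slot->len = len;
    q->head++;
    if (!q->busy) {
        xfer_kick(q);
    }
    return true;
}

/* Called from the transfer-complete interrupt: retire one, start the next. */
void xfer_on_complete(xfer_queue_t *q) {
    q->tail++;
    xfer_kick(q);
}
```

With this shape the stack never waits for the bus: it only sees a full queue when `XFER_QUEUE_DEPTH` transfers are already pending.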

drorgl commented 2 years ago

I've also experienced non-ideal performance with lwip. Apart from the number of malloc/free calls, it also seems that many transfers don't actually carry any data, most likely because of the way I've got lwip configured.
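
One common way to cut per-packet malloc/free is a fixed-block pool. The sketch below is only an illustration of that technique, not code from esp-hosted or lwip; the names (`pool_init`, `pool_alloc`, `pool_free`) and the block size and count are assumptions, and it is not interrupt-safe as written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE 1600u /* assumed: large enough for one SPI frame */
#define POOL_BLOCK_COUNT 8u   /* assumed: number of buffers in flight */

/* A free block stores the link to the next free block in its own memory. */
typedef union pool_block {
    union pool_block *next;
    uint8_t data[POOL_BLOCK_SIZE];
} pool_block_t;

static pool_block_t pool_storage[POOL_BLOCK_COUNT];
static pool_block_t *pool_free_list;

/* Thread every block onto the free list. */
void pool_init(void) {
    pool_free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool_storage[i].next = pool_free_list;
        pool_free_list = &pool_storage[i];
    }
}

/* O(1) allocation; returns NULL when the pool is exhausted. */
void *pool_alloc(void) {
    pool_block_t *b = pool_free_list;
    if (b != NULL) {
        pool_free_list = b->next;
    }
    return b;
}

/* O(1) release; the pointer must come from pool_alloc(). */
void pool_free(void *p) {
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)pool_storage;
    assert(addr >= base && addr < base + sizeof pool_storage);
    assert((addr - base) % sizeof(pool_block_t) == 0);
    pool_block_t *b = p;
    b->next = pool_free_list;
    pool_free_list = b;
}
```

Because every block has the same size, the pool cannot fragment, and exhaustion shows up as a NULL return rather than as heap corruption.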

mantriyogesh commented 2 years ago

Hello @drorgl,

We appreciate all your efforts in integrating lwip. We will look into the throughput issue and get back to you.

mantriyogesh commented 2 years ago

Hello @drorgl ,

We checked your esp-hosted port and appreciate all your efforts. I do have some questions, in order of priority:

  1. Delay_us(10); -> This is introduced in every transaction. Is there a way to avoid a delay on every transaction? As long as this code is present, it will slow down the overall driver irrespective of the SPI clock.
  2. Are you using the handshake pin to trigger the next SPI transfer? If so, the delay in point 1 is not needed. On every handshake pin interrupt, a semaphore should be posted, which gives the next transaction the green light.
  3. Similar to the handshake pin, I hope you are handling the Data Ready pin as well.
  4. I am not too sure, but is it possible to use lwip's pbuf instead of working around it with esp_pbuf? If so, the integration with lwip should be seamless.
  5. netdev_stub.c should really be removed, and all network calls should be replaced by lwip's calls/APIs.
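
Points 1 to 3 can be sketched as follows: the GPIO interrupt posts a semaphore and the transfer task consumes it, so no fixed delay is needed. This is a host-side model under stated assumptions, not the esp-hosted driver. `sem_model_t` is a hypothetical stand-in for an RTOS semaphore, and `spi_gpio_isr`, `spi_task_step` and `spi_do_transaction` are made-up names. In a real port the task would block on the RTOS semaphore instead of returning false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counting semaphore model for running on a host PC. */
typedef struct {
    atomic_uint count;
} sem_model_t;

static void sem_model_post(sem_model_t *s) {
    atomic_fetch_add(&s->count, 1u);
}

/* Take one token if available; never blocks. */
static bool sem_model_try_take(sem_model_t *s) {
    unsigned c = atomic_load(&s->count);
    while (c != 0u) {
        /* On failure, c is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&s->count, &c, c - 1u)) {
            return true;
        }
    }
    return false;
}

static sem_model_t spi_ready; /* posted by Handshake and Data Ready */
static unsigned transactions; /* completed SPI transactions */

/* Called from the Handshake and Data Ready GPIO interrupts. */
void spi_gpio_isr(void) {
    sem_model_post(&spi_ready);
}

/* Hypothetical stub for one full-duplex SPI transaction. */
static void spi_do_transaction(void) {
    transactions++;
}

/* One iteration of the transfer task: run a transaction only when the
 * slave has signalled readiness. Note the absence of any Delay_us(). */
bool spi_task_step(void) {
    if (!sem_model_try_take(&spi_ready)) {
        return false; /* real port: block on the semaphore here */
    }
    spi_do_transaction();
    return true;
}
```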

Please let us know if you have any input or questions on this.

drorgl commented 2 years ago

Hi @mantriyogesh ,

  1. It was necessary, otherwise the ESP would send garbage, but the code has changed since then, so perhaps it's not needed anymore.
  2. The code for handling the SPI is the same as in the esp-hosted MCU demo, so yes.
  3. As far as I know, in the demo code both pins release the same semaphore. I wasn't sure why you did that in the esp-hosted MCU demo instead of just using one pin, and I didn't investigate it further.
  4. It's possible as far as I understand, but I doubt the memcpy has that much impact on performance with the MCU I'm using.
  5. I agree. Again, it was a POC, definitely not production code.

If you're planning to investigate and improve the MCU demo in esp-hosted, I propose the following changes to esp-hosted:

  1. Get rid of the huge number of malloc/free calls and use either a memory pool or LWIP zero copy via the lwip pbuf (which is a memory pool). I've clocked the number of allocations at 12K for iperf -c (30 seconds). I'm not sure anyone with an embedded application will want that much dynamic memory usage in production code. It also caused hard faults with the default memory allocator that comes with STM32Cube, so perhaps there is another problem hiding.
  2. Use DMA for the SPI transfers. Although the STM32 hardware NSS behaves differently from the ESP32's, it's possible to achieve the same functionality in software: just assert NSS before the DMA request and deassert it in the transfer-complete interrupt. Please note that I've added a dummy SPI transfer to the SPI initialization, because initializing the SPI transfer after the ESP32 reset showed unwanted pulses on the logic analyzer.
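
The software NSS sequence from point 2 can be sketched like this. It is a host-side model, not the code from the port: `port_nss_set` and `port_spi_dma_start` are hypothetical hooks with stub bodies. On STM32 HAL they would presumably map to `HAL_GPIO_WritePin` and `HAL_SPI_TransmitReceive_DMA`, with `spi_xfer_complete_isr` called from `HAL_SPI_TxRxCpltCallback`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side state standing in for the GPIO and DMA hardware. */
static bool nss_asserted;
static bool dma_running;
static unsigned nss_assert_count;

/* Hypothetical hook: drive NSS (active low on the wire). */
static void port_nss_set(bool asserted) {
    if (asserted && !nss_asserted) {
        nss_assert_count++;
    }
    nss_asserted = asserted;
}

/* Hypothetical hook: start a non-blocking full-duplex DMA transfer. */
static bool port_spi_dma_start(const uint8_t *tx, uint8_t *rx, size_t len) {
    (void)tx;
    (void)rx;
    (void)len;
    if (dma_running) {
        return false;
    }
    dma_running = true;
    return true;
}

/* Assert NSS, then hand the buffers to DMA. Returns immediately. */
bool spi_xfer_begin(const uint8_t *tx, uint8_t *rx, size_t len) {
    if (dma_running) {
        return false; /* previous transfer still in flight */
    }
    port_nss_set(true);
    if (!port_spi_dma_start(tx, rx, len)) {
        port_nss_set(false); /* do not leave the slave selected */
        return false;
    }
    return true;
}

/* Transfer-complete interrupt: deassert NSS to end the frame. */
void spi_xfer_complete_isr(void) {
    dma_running = false;
    port_nss_set(false);
}
```

The CPU is free between `spi_xfer_begin` and the interrupt, which is where the DMA gain comes from.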

I'm using the Nucleo 446, which has only 128K RAM. On a different MCU, lwip can be configured with a larger window and packet size, so it might be faster. A partially successful UDP test showed 2 Mbps, but there were SPI transfer errors which I didn't investigate.
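
For reference, the window and packet size mentioned here are controlled by options in `lwipopts.h`. The option names below are real lwIP options, but every value is an untested assumption chosen only to illustrate the knobs. They are not measured settings for this board.

```c
/* lwipopts.h fragment (illustrative values, not tuned or measured) */

/* Largest TCP segment; 1460 fits a standard 1500-byte Ethernet MTU. */
#define TCP_MSS             1460

/* Receive window and send buffer, as multiples of the segment size.
 * Larger values allow more data in flight but cost RAM. */
#define TCP_WND             (4 * TCP_MSS)
#define TCP_SND_BUF         (4 * TCP_MSS)

/* Queued TCP segments; has to cover the send queue length. */
#define MEMP_NUM_TCP_SEG    16

/* Number of pbufs in the RX pool. */
#define PBUF_POOL_SIZE      16

/* Heap used for PBUF_RAM allocations on the TX path. */
#define MEM_SIZE            (16 * 1024)
```

On a part with more RAM, `TCP_WND`, `TCP_SND_BUF` and `PBUF_POOL_SIZE` are the first values to raise.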

I fully agree with @xiongyu0523's suggestions. One of them is that the code also needs an OS abstraction layer.

I've also simplified the build for the ESP32 hosted project using PlatformIO. If you're interested in that, I can push my fork.

mantriyogesh commented 2 years ago

Sure @drorgl. Thanks @xiongyu0523, @drorgl.

All your findings are valuable.

  1. Replacing malloc and free with zero copy would be a well-optimised way to do things. I believe users should definitely do this while porting, as it will save a good number of CPU cycles. In addition, we will shortly port the code to a network driver. It may or may not be lwip, but we will try to incorporate your points there.

  2. As you may have seen, we have two STM32CubeMX ioc projects: one with hardware NSS and one without (software controlled). We would like hardware NSS to work, for the benefit in driver execution time. Software-controlled NSS basically means toggling the chip select/NSS value before and after each SPI transfer.
     a. DMA will be a valuable addition, and we will certainly try to add support for it. If you already have working code, we can accept it in the form of a pull request.
     b. The unwanted pulses you observed look like a problem. Could you please point us to the code and an image or log? We will try to investigate this further.

  3. OS abstraction layer: we have thought of adding a wrapper layer, which will save users time. Note that it may not necessarily be CMSIS. The problem is that different RTOSes may not support the same OS wrapper. This OS wrapper layer will help make the code agnostic of the underlying OS.

  4. Please do push any work you feel is relevant. It will definitely help users facing similar problems. Additionally, if you have contributions that go directly into the code, we do accept pull requests. This helps ESP-Hosted, and lets its users get acknowledgement for their open source work.
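
The wrapper layer from point 3 could take the shape of a table of function pointers that each RTOS port fills in. This is only a sketch of that idea: `hosted_osal_t`, the `host_sem_*` backend and `driver_wait_for_slave` are hypothetical names, not an existing esp-hosted API. The host backend is a trivial, non-blocking, single-threaded model that ignores the timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* What the driver needs from any OS (semaphores only, for brevity). */
typedef struct {
    void *(*sem_create)(void);
    bool (*sem_take)(void *sem, uint32_t timeout_ms);
    void (*sem_give)(void *sem);
    void (*sem_delete)(void *sem);
} hosted_osal_t;

/* Host backend: a counter instead of a real semaphore. */
static void *host_sem_create(void) {
    return calloc(1, sizeof(unsigned));
}

static bool host_sem_take(void *sem, uint32_t timeout_ms) {
    unsigned *count = sem;
    (void)timeout_ms; /* a real backend would block up to this long */
    if (*count == 0u) {
        return false;
    }
    (*count)--;
    return true;
}

static void host_sem_give(void *sem) {
    unsigned *count = sem;
    (*count)++;
}

static void host_sem_delete(void *sem) {
    free(sem);
}

static const hosted_osal_t host_osal = {
    host_sem_create,
    host_sem_take,
    host_sem_give,
    host_sem_delete,
};

/* Driver code sees only the table, never an RTOS header. */
bool driver_wait_for_slave(const hosted_osal_t *os, void *ready_sem) {
    return os->sem_take(ready_sem, 100u);
}
```

A FreeRTOS, CMSIS-RTOS or ThreadX port would then supply its own table with the same four entries, leaving the driver source unchanged.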

giorgiocolazzo91 commented 1 year ago

Hello @mantriyogesh, are there any updates on future releases for LWIP integration of the esp-host-fg solution? I have just seen the updates from 2 months ago, where they were announced to be released soon.

mantriyogesh commented 1 year ago

Not everything from the above list is done, but we have already added many changes.

We have simplified the porting layer, added lwip integration, etc. These changes went in for ESP as slave with STM32 as host, and also with ESP as host. Right now our focus is on ESP to ESP.

There is a branch, https://github.com/espressif/esp-hosted/tree/feature/esp_as_mcu_host (do not confuse it with another, similarly named branch).

Some time back I added changes in #186.

Again, there are many ongoing changes (under review), which I expect to be pushed by the end of this month.

giorgiocolazzo91 commented 1 year ago

Hi @mantriyogesh, thanks for the quick reply. I am more interested in the ESP as slave with an STM32 master. Is the master branch the most up-to-date one? Thanks in advance, Regards

mantriyogesh commented 1 year ago

We have done some development for ST as MCU, if you want to take a look (we can provide it as is and, of course, bugs cannot be raised against it unless it goes into master!).

As said earlier, right now we are concentrating on ESP as MCU. This code will come back into focus at some later point, though. In the meantime you can have a look: h1_mcu_st_and_esp.tgz