espressif / esp-hosted

Hosted Solution (Linux/MCU) with ESP32 (Wi-Fi + BT + BLE)

The function ble_hci_trans_hs_acl_tx will split the data packet into multiple packets for sending #388

Open linchanghe123 opened 4 months ago

linchanghe123 commented 4 months ago

Hi mantriyogesh,

In FG mode, the ESP32-C3 calls the following function to send 200 bytes of data. An over-the-air capture shows that the data is split into multiple packets for transmission, even though the MTU is greater than 200 bytes. How can we avoid this fragmentation as much as possible to improve transmission efficiency?

image

mantriyogesh commented 4 months ago

Hello @linchanghe123 ,

We do not restrict Bluetooth packets to such a small MTU.

For C3 over SPI, the maximum payload size for any Bluetooth packet (from the Hosted perspective) is 1600 - 12 (sizeof header) = 1588 bytes.

Please cross-check the MTU settings handled at the higher layer (BlueZ). hciconfig may list the currently available MTU.
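The transport-side limit quoted above can be written out as a small check. This is only a sketch: the macro and function names are made up for illustration and are not taken from the ESP-Hosted sources; only the numbers 1600 and 12 come from the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Numbers quoted in this thread; the names are illustrative assumptions. */
#define HOSTED_SPI_BUF_SIZE 1600u /* max bytes per SPI transfer on C3 */
#define HOSTED_HEADER_SIZE  12u   /* size of the ESP-Hosted header */

/* Largest payload left after the transport header is accounted for. */
static size_t hosted_max_payload(size_t buf_size, size_t header_size)
{
    assert(buf_size >= header_size);
    return buf_size - header_size;
}

/* True when a Bluetooth packet of `len` bytes fits in one SPI transfer. */
static bool hosted_fits_in_one_transfer(size_t len)
{
    return len <= hosted_max_payload(HOSTED_SPI_BUF_SIZE, HOSTED_HEADER_SIZE);
}
```

A 200-byte or 236-byte packet is far below the 1588-byte limit, so under these numbers the SPI transport has no reason to split it.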

mantriyogesh commented 4 months ago

I am still checking whether there is any default, config, or limitation on the controller side. I will get back to you on this.

mantriyogesh commented 4 months ago

MTU configuration and negotiation are handled by the stack. BlueZ has the following default (517):

tmp_8e82d989-64f7-450b-ae0b-3ca7ed433bd3

You might want to check the MTU settings on both sides, since the MTU is negotiated and the smaller value is used. One of the parties involved may have set its MTU to 200.
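The "smaller value wins" rule can be sketched as below. The function names are hypothetical and not part of BlueZ or NimBLE. The 23-byte floor and the 3-byte notification header are the standard ATT values for LE.

```c
#include <assert.h>
#include <stdint.h>

#define ATT_MTU_MIN        23u /* smallest ATT MTU allowed on LE */
#define ATT_NOTIFY_HDR_LEN 3u  /* 1-byte opcode + 2-byte attribute handle */

/* After the MTU exchange, both sides use the smaller of the two values. */
static uint16_t att_effective_mtu(uint16_t client_rx_mtu, uint16_t server_rx_mtu)
{
    assert(client_rx_mtu >= ATT_MTU_MIN && server_rx_mtu >= ATT_MTU_MIN);
    return client_rx_mtu < server_rx_mtu ? client_rx_mtu : server_rx_mtu;
}

/* Largest attribute value that fits in one notification at a given MTU. */
static uint16_t att_max_notify_payload(uint16_t mtu)
{
    assert(mtu >= ATT_MTU_MIN);
    return (uint16_t)(mtu - ATT_NOTIFY_HDR_LEN);
}
```

For example, a side offering 517 paired with a peer offering 200 ends up with an effective MTU of 200, regardless of which side initiated the exchange.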

linchanghe123 commented 4 months ago

We use an RTOS and cannot run the hciconfig command directly, but the MTU negotiation messages show that the MTU is 256. Okay, we will check whether the protocol stack imposes any restrictions. image

mantriyogesh commented 4 months ago

Oh okay. Yes, I think the stack should expose this either as a config option or as a function, which you can have a look at.

For example, on ESP chipsets where NimBLE runs as the host stack, we provide the CONFIG_BT_NIMBLE_ATT_PREFERRED_MTU stack config.

linchanghe123 commented 4 months ago

We are using the NimBLE protocol stack, and the MTU is also configured as 256. The negotiation messages confirm that it is indeed 256. We found that the ESP32-C3 called the esp_vhci_host_send_packet interface and the data was ultimately split. The length passed to this interface was 236, but the packet sent over the air was split up.

mantriyogesh commented 4 months ago

For this use case, function in effect would be: https://github.com/espressif/esp-hosted/blob/2eb1fff5b7b18af20087ace3a35ba596172acdd5/esp_hosted_fg/esp/esp_driver/network_adapter/main/slave_bt.c#L426-L447

There is no split there as such (unless it is done in the lower NimBLE stack).

Anyway, can you please provide the textual logs of:

  1. ESP full log from bootup
  2. sudo btmon (or a similar log, say a Wireshark capture, that shows the message exchange including the MTU exchange and the split), if possible on both host sides
  3. The matching higher-level host log that dumps the packet
  4. This is a host-side Tx message, but the image shows slave Tx. Can you please send us the full Wireshark log as well?
linchanghe123 commented 4 months ago

ESP full log: image image

wireshark log: log out smlog.zip

mantriyogesh commented 4 months ago

From the Wireshark capture, I see the packets fragmented from slave to host: packets #10699, 10701, 10703, 10705, 10708, 10710, 10712, 10714, 10716.

These packets go from slave to host, meaning controller to host (NimBLE).

The question is why the packet received at the controller is fragmented despite a sufficient MTU.

esp_vhci_host_send_packet() is not the relevant function. It is host Tx.

This is not the expected direction. This function works in the direction BT stack -> hosted host -> hosted slave -> controller. It is called when an ESP_HCI_IF packet is received over SPI (from the host stack) and we push it to the controller: process_rx_pkt() -> esp_vhci_host_send_packet() -> ble_hci_trans_hs_cmd_tx() for a command, or ble_hci_trans_hs_acl_tx() for data ==> packet sent to the controller.
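The command/data branch in that host Tx path can be pictured roughly as follows. This is a sketch under the assumption that the first byte of the buffer carries the standard HCI H4 packet indicator; the enum and function names are hypothetical, and the real slave_bt.c code may be organised differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HCI UART (H4) packet indicators from the Bluetooth Core Specification. */
#define H4_TYPE_COMMAND 0x01
#define H4_TYPE_ACL     0x02
#define H4_TYPE_EVENT   0x04

typedef enum {
    HCI_ROUTE_CMD_TX, /* would be handed to ble_hci_trans_hs_cmd_tx() */
    HCI_ROUTE_ACL_TX, /* would be handed to ble_hci_trans_hs_acl_tx() */
    HCI_ROUTE_DROP    /* not a host -> controller packet type */
} hci_route_t;

/* Decide where a host -> controller packet goes, based on its first byte. */
static hci_route_t classify_host_tx(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL);
    if (len < 1) {
        return HCI_ROUTE_DROP;
    }
    switch (pkt[0]) {
    case H4_TYPE_COMMAND:
        return HCI_ROUTE_CMD_TX;
    case H4_TYPE_ACL:
        return HCI_ROUTE_ACL_TX;
    default:
        /* Events only travel controller -> host, so they are not routed here. */
        return HCI_ROUTE_DROP;
    }
}
```

Nothing in this branch changes the packet length, which is why a split would have to happen either before this point or inside the controller.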

Controller to stack (expected direction)

The direction we are interested in is slave Tx, meaning ESP BLE controller -> hosted slave -> hosted host -> NimBLE host stack, which is where the fragments are seen.

NimBLE Rx hooks -> ble_hs_rx_data -> controller-data-available -> host_rcv_pkt() hook->host_rcv_pkt->send_to_host_queue()

Check the other host, which sent this packet to the controller, and why it fragments the packet into such small pieces.

As said earlier, please check the logs at both hosts communicating with each other.

linchanghe123 commented 4 months ago

In Wireshark, the 'slave' and 'master' refer to BLE slave and BLE master, not ESP slave and ESP master. Our understanding in this area may not be synchronized.

The data interaction between the ESP Slave and ESP host is not problematic. The maximum size of each packet in SPI communication can be up to 1600 bytes.

lQLPKHe_pW2foUnNBUzNCK-w6Uh5ImGwuKgGOBZhagGzAA_2223_1356

mantriyogesh commented 4 months ago

Okay. 'BLE slave' is actually the ESP-Hosted host. The terminology is a little confusing, so I will call this the client for now, and the iPhone the server.

Is it possible to capture the packet sent from the client to the esp-slave, in addition to the existing Wireshark capture? I want to know whether the packet is split before it is received at the esp-slave.

Debug the BLE packet of interest (pkt_data, pkt_len) along this path: stm32 -> spi_tx [DBG] ===== Hosted SPI ======> spi_rx (process_rx_pkt()) -> (esp_vhci_host_send_packet) ------- controller (ble_hci_trans_hs_acl_tx()) -- OTA -- iphone

Can you find and state where the length was first split?

linchanghe123 commented 4 months ago

The first time the length was split is at "controller (ble_hci_trans_hs_acl_tx()) -- OTA", and below is the client and esp-slave interaction log.

lQLPKcdEU1RJ8onNAyDNAuWwWbZMKk7cRLIGOB5-7UWOAA_741_800

mantriyogesh commented 4 months ago

Just to reconfirm, at this line: https://github.com/espressif/esp-hosted/blob/2eb1fff5b7b18af20087ace3a35ba596172acdd5/esp_hosted_fg/esp/esp_driver/network_adapter/main/slave_bt.c#L443

The length is fine (not fragmented) until it reaches the function ble_hci_trans_hs_acl_tx(). Once it is sent to the controller, the fragments are seen directly over the air. Correct?

mantriyogesh commented 4 months ago

Is it before or after ble_hci_trans_hs_acl_tx()?

"At ble_hci_trans_hs_acl_tx()" is a little confusing.

linchanghe123 commented 4 months ago

> Just to reconfirm, At this line,
>
> https://github.com/espressif/esp-hosted/blob/2eb1fff5b7b18af20087ace3a35ba596172acdd5/esp_hosted_fg/esp/esp_driver/network_adapter/main/slave_bt.c#L443
>
> The Len is fine (non fragmented) till it reached inside function (ble_hci_trans_hs_acl_tx()) Once it sent to controller, the fragments seen directly over the air. Correct?

Yes, we print out the payload_len after the process_hci_rx_pkt function, and the payload_len is 236.

e43144513ac2d72630793a2113588cf6

The len here comes from payload_len.

image

linchanghe123 commented 4 months ago

Another observation: when we use NG mode on the Linux platform, ble_hci_trans_hs_acl_tx is also what we ultimately call. In that case sending works fine, and the data is not split into multiple packets.

The problem appears when using the RTOS platform with FG mode. The function ultimately used to send the message is the same, which is strange.

mantriyogesh commented 4 months ago

Interesting. This could be a good clue pointing to the stack configuration. Is there any way to get a dump of your STM32-side stack configuration?

Also, the esp-slave sdkconfig for both the Linux and MCU cases could be helpful to compare.

Can you run sudo btmon in the Linux case? I think Wireshark without any filter, together with btmon, could also reveal something.

mantriyogesh commented 4 months ago

I am not sure whether you can reproduce this on a plain ESP-IDF program, which has the controller and stack running natively.

If so, it would be much easier for the Bluetooth team as well.

linchanghe123 commented 4 months ago

> Interesting. This could be good clue of stack conf. Is there any way to get your STM32 side stack conf dump?
>
> Also, the sdkconfig both Linux and MCU for esp-slave could be helpful to compare.
>
> Can you run sudo btmon in Linux case? I think wireshark without any filter and btmon could also pop up something

The functionality under Linux is okay, but it seems that capturing this information is not very effective.

image

xaodongdong commented 1 month ago

Hello, long time no see. We are currently facing some issues with the Bluetooth communication rate. After investigation, we confirmed that the bottleneck is at the Bluetooth sending interface in the ESP32 module, specifically in API_vhci_host_send_packet, located at /esp-idf-5.1.1/components/bt/controller/esp32c3/bt.c. By capturing over-the-air packets, we found that data packets are split into 27-byte chunks when sent through this interface. The issue occurs with the esp-host-master-fg library that we are using. With the esp-host-master-ng library the problem does not exist, and the data can be sent as whole packets. Could you please let us know whether the new code contains any optimizations regarding this issue? image image

mantriyogesh commented 1 month ago

So should I assume:

  1. ESP-IDF is the same for FG and NG?

  2. The host Linux machine and kernel version are the same for both tests?

I doubt (1) more. In general, FG is flexible and works with any IDF (old/new/master), while NG is generally bound to a specific IDF because of changes in the Wi-Fi libs. One alternative is to test the same scenario on FG, with the same IDF as NG, but without the Wi-Fi lib changes.

  3. What happens with an ESP-IDF example directly, in a similar scenario, on the IDF commit used for NG/FG?

We would need some time to do this comparison. But if you have already performed any of these, your observations would be very helpful.

xaodongdong commented 1 month ago

1. The ESP-IDF is the same.
2. For FG, the host side uses STM32 + FreeRTOS, while NG uses Linux.
3. (image failed to upload)

I would like to know whether this packet splitting is related to the compilation configuration.

mantriyogesh commented 1 month ago

I cannot load the image in (3).

Can you try using ESP-Hosted-FG on Linux (on master)?

There are many differing components, especially your Bluetooth stack on STM32 versus that of NG (BlueZ), and the configurations of both stacks.

If you test FG and NG on the same Linux host, with the same slave and the same IDF, that would be the correct test case for validation.

xaodongdong commented 1 month ago

3393EFE0-A015-4bd1-903A-86838C4B691C

xaodongdong commented 1 month ago

Trying FG on Linux would require a considerable amount of work for us and might not be very meaningful, as, like you mentioned, the protocol stacks are different. We are currently using the NimBLE stack, so I am interested in knowing which configurations might affect the packet splitting in the low-level API. Since the source code of the low-level API is not open to us, if I could access its source code, I would be able to identify where the packet splitting is controlled and thus understand how to modify it.

xaodongdong commented 4 weeks ago

Hello, I would like to ask if there are any alternative over-the-air packet sending interfaces available. I can try them out to see if they also result in packet splitting.

mantriyogesh commented 4 weeks ago

As a matter of fact, you just need to reflash the FG binary on the ESP and propagate the same GPIO, SPI, etc. changes you made for NG, in exactly the same files. Once you load the kernel driver, Bluetooth should load automatically, similar to NG.

Comparing FG with NG with respect to your issue makes for an easy and fair comparison.

mantriyogesh commented 4 weeks ago

The transport driver remains the same for both NG and FG. You do not need to run any demo app, as Bluetooth should be ready to go. Loading the kernel module is sufficient.

mantriyogesh commented 4 weeks ago

Porting for FG and NG is exactly the same. Note the similarity between the docs:

FG porting guide and NG porting guide.

If you have already ported NG, it should be exactly the same diff, applied manually in the identically named files.

You can create a separate directory for the diff, make the changes there, and then flash FG, so that the NG changes are not lost and you can always go back.

xaodongdong commented 4 weeks ago

Yes, I've also noticed that whether it's FG or NG, the code running on the ESP32 side is the same, and it ultimately reaches the API_vhci_host_send_packet API. On the host side, I'm using Linux with BlueZ for NG and FreeRTOS with NimBLE for FG, so making a direct comparison is challenging. However, in practice, the data sent through API_vhci_host_send_packet on NG is not split, whereas on FG, it is. Therefore, I'm currently investigating how BlueZ and NimBLE may be affecting the ESP32 stack.

I would appreciate it if you could provide some insights into the internal packet-splitting logic of the API_vhci_host_send_packet API, as it would help me pinpoint the differences and identify the root cause of the issue.

mantriyogesh commented 4 weeks ago

https://github.com/search?q=repo%3Aespressif%2Fesp-hosted%20vhci_host_send_packet&type=code

Both are same.

Bluetooth code is just the same for FG/NG. Let me know if you have any confusion or you spot any difference.

> the code running on the ESP32 side is the same

The firmware is built differently, but yes, the Bluetooth code remains the same.

> so making a direct comparison is challenging.

Well, I am not asking you to run FG on the MCU with NimBLE. That is your final goal, I suppose. You already have NG set up on Linux. Just run FG in the same environment, on the same code commit: use the FG firmware on the ESP, port the same changes you made for Linux into the FG code, and run FG on Linux.

This would be the first step, to rule out whether the FG and NG software behave differently on Linux. FG is fully supported on Linux. Please check: https://github.com/espressif/esp-hosted/tree/master/esp_hosted_fg#21-setup-with-linux-host

xaodongdong commented 4 weeks ago

As expected, I just completed the test with Linux + BlueZ + ESP32 (FG), and the results were the same as with ESP32 (NG); the data packets were not split. So, what should be my next steps to pinpoint the issue?

xaodongdong commented 4 weeks ago

Would the ble_st_acl_tx_buf_nb field in the esp_bt_controller_config_t structure affect the size of individual packets sent over the air?

mantriyogesh commented 4 weeks ago

NG and FG behave the same, so I really doubt that anything on the slave needs to change.

At the same time, I came across https://github.com/espressif/esp-idf/issues/9627#issuecomment-1228052346. Which NimBLE host stack are you using? Is it esp-nimble?

Is there any IDF version difference between the hosted slave used with the MCU and the FG or NG hosted slave?

mantriyogesh commented 4 weeks ago

Nevertheless, the issue boils down to ESP-IDF, the Bluetooth config, or optional Bluetooth APIs. It is not a core ESP-Hosted issue as such.

Check whether the code mentioned above is available in your NimBLE host stack.

Screenshot 2024-09-02 at 3 28 44 PM

At the slave, we do not use the host stack at all. Whether the controller supports these APIs can be verified from the IDF version. If the controller supports the frame size but the host does not change or extend it, the defaults apply.

See the image: it only changes nimble/**host**/ files.

xaodongdong commented 4 weeks ago

Which nimBLE host stack are you are using? Is it esp-nimble?

mantriyogesh commented 4 weeks ago

Yes, the IDF commits may differ. You can simply use the same IDF:

cd esp_hosted_ng/esp/esp_driver/esp-idf
git status # note commit
cd ../../../esp_hosted_fg/esp/esp_driver/esp-idf
git pull
git checkout <commit_noted>
git submodule update --init --recursive
./install.sh
. ./export.sh
cd ../network_adapter
<build firmware of fg and flash>

This gets your FG firmware flashed on the same IDF commit as NG. Re-test in the same environment, with the kernel module built on the host at esp_hosted_fg/host/linux/host_control using rpi_init.sh.

Always ensure you use the same esp-hosted commit on the ESP and the host, to avoid any undefined race conditions.

xaodongdong commented 4 weeks ago

I think it would be quicker to address the core issue directly. It is clear that the L2CAP SAR (Segmentation and Reassembly) mechanism has been triggered. This mechanism is typically triggered when the packet size exceeds the MTU. However, both the MTU and MPS in our protocol stack are set to 512, and the packets actually sent to the ESP32 are not segmented. I therefore suspect that the issue lies in the integration between the internal module code and the NimBLE stack, which could be activating the L2CAP segmentation logic inside the ESP32. (image failed to upload)

xaodongdong commented 4 weeks ago

Screenshot 2024-09-03 112850 Screenshot 2024-09-03 112913

mantriyogesh commented 4 weeks ago

https://github.com/espressif/esp-hosted/issues/388#issuecomment-2325539475

The image failed to upload.

xaodongdong commented 4 weeks ago

50c26aabf7fd6847864ed4d2444db54 this

mantriyogesh commented 4 weeks ago

To summarise:

  1. https://github.com/espressif/esp-hosted/issues/388#issuecomment-2323860652 : FG and NG work just the same when used on Linux + BlueZ.
  2. When FG is used with an MCU host, frames are split to a maximum size of 27.

If the slave framework is just the same, how would the issue be in the ESP firmware? Is your host stack or user app configured to limit the size like this? Did you get time to check https://github.com/espressif/esp-hosted/issues/388#issuecomment-2324004629?

xaodongdong commented 4 weeks ago

I have read #388 (comment). His problem is very similar to mine, but I didn't see a solution. I added logging at the esp_vhci_host_send_packet interface to confirm that the data is complete at this point. image image

So NimBLE and the MCU didn't split the data. From the over-the-air packets, it appears that the data was split into 9 packets by L2CAP. image
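The count of 9 is consistent with the 27-byte chunks reported earlier in this thread. A rough sketch of the arithmetic is below. It assumes, for simplicity, that all 236 bytes handed to esp_vhci_host_send_packet have to be carried in Link Layer data PDUs; in practice a few HCI header bytes would not go over the air, which does not change the result here. The names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define LL_DEFAULT_MAX_TX_OCTETS 27u  /* LL payload limit without Data Length Extension */
#define LL_DLE_MAX_TX_OCTETS     251u /* LL payload limit with Data Length Extension */

/* Number of Link Layer data PDUs needed to carry `len` bytes (ceiling division). */
static size_t ll_fragment_count(size_t len, size_t max_tx_octets)
{
    assert(max_tx_octets > 0);
    return (len + max_tx_octets - 1) / max_tx_octets;
}
```

With a 27-byte limit, 236 bytes need 9 PDUs. With a 251-byte limit the same data fits in a single PDU, which matches the unsplit behaviour seen with Linux + BlueZ.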

mantriyogesh commented 4 weeks ago

As you use C3, please use the latest master of ESP-IDF for FG.

Once you connect, call ble_gap_set_data_len(conn_handle, 0xFB, 0x4290).

The details are mentioned in https://github.com/espressif/esp-hosted/issues/388.

The latest master ensures that the expected commit is in place. If your NimBLE host stack does not have this change, please add the host stack commit, referring to the link above.

Unless you make the changes in the NimBLE host stack and increase the data length (the request has to come from the host stack), the C3 controller assumes 27.
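A minimal sketch of where such a call could sit is shown below. The limits are the LE Data Length Extension ranges from the Bluetooth Core Specification: 0xFB is 251 octets and 0x4290 is 17040 microseconds, while the defaults without a request are 27 octets and 328 microseconds. USE_NIMBLE_HOST is a hypothetical build flag, and the stand-in ble_gap_set_data_len() exists only so the sketch compiles without NimBLE. The helper names are also assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LE Data Length Extension limits (Bluetooth Core Specification). */
#define DLE_TX_OCTETS_MIN 0x001B /* 27 bytes, also the default */
#define DLE_TX_OCTETS_MAX 0x00FB /* 251 bytes */
#define DLE_TX_TIME_MIN   0x0148 /* 328 us, also the default */
#define DLE_TX_TIME_MAX   0x4290 /* 17040 us */

#ifdef USE_NIMBLE_HOST /* hypothetical flag: build against the real NimBLE host */
#include "host/ble_gap.h"
#else
/* Stand-in with the same signature, used only for host-side compilation. */
static int ble_gap_set_data_len(uint16_t conn_handle, uint16_t tx_octets,
                                uint16_t tx_time)
{
    (void)conn_handle;
    (void)tx_octets;
    (void)tx_time;
    return 0;
}
#endif

/* True when both values are inside the ranges the specification allows. */
static bool dle_params_valid(uint16_t tx_octets, uint16_t tx_time)
{
    return tx_octets >= DLE_TX_OCTETS_MIN && tx_octets <= DLE_TX_OCTETS_MAX &&
           tx_time >= DLE_TX_TIME_MIN && tx_time <= DLE_TX_TIME_MAX;
}

/* Intended to be called once the connection is up, e.g. on the GAP connect event. */
static int request_max_data_len(uint16_t conn_handle)
{
    assert(dle_params_valid(DLE_TX_OCTETS_MAX, DLE_TX_TIME_MAX));
    return ble_gap_set_data_len(conn_handle, DLE_TX_OCTETS_MAX, DLE_TX_TIME_MAX);
}
```

Smaller values inside the same ranges can be requested when shorter air time per packet matters more than throughput. The peer may still negotiate the effective length down.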

mantriyogesh commented 4 weeks ago

Check the Bluetooth spec's description of this:

tmp_62c7d714-9a88-4e2c-9d56-b01ce29186b7

mantriyogesh commented 4 weeks ago

Once you verify, let us know.

xaodongdong commented 4 weeks ago

After adding ble_gap_set_data_len(conn_handle, 0xFB, 0x4290), the issue was resolved! Thank you so much!! You've worked hard with us to pinpoint the issue for so long! Will the maximum parameters configured by this interface have any other impact? Do I need to change it to a more appropriate value, and what is the default value?

mantriyogesh commented 4 weeks ago

Yes, of course, you can configure or optimise the values for your use case. The values suggested earlier are the recommended defaults.