Open claydonkey opened 12 months ago
I see #define CONFIG_ESP_RAW_TP_HOST_TO_ESP_PKT_LEN 1576
. This is bigger than the original MAX_SDIO_PACKET_SIZE
, which should not be the case.
By default, 1460 is the size of CONFIG_ESP_RAW_TP_HOST_TO_ESP_PKT_LEN
.
Hello @claydonkey 1. Do you have any changes in header structure? it should be always 12 bytes (till date). it is packed structure BTW.
2. MAX_SDIO_BUFFER_SIZE should always be 512 multiple and (512*3) is chosen as we do not expect any transport frame to exceed this size. Currently, raw ESP->Host allows evading this scenario, which is incorrect, and we will add limit here. But please do not change the above sizes, you might face unexpected issues..
3.
I had to increased the Host packet size to be 32bit aligned which is strange
Can you please explain this little in depth? I expected 512*3 (max) to be aligned. Any smaller packet size also need to be aligned to DMA (either side, according to their DMA alignments) We will check if the slave Tx packets are aligned or not (I expect them to be already although!)
4. Are you using the mempool (host)? Mempool will make sure that the mem to be allocated is always aligned on 4 bytes ( DMA expectation for ESP_as_host, check your DMA alignment expectation). If you do not use mempool, make sure that you allocate on 4bytes aligned start address and allocated size also with 4 bytes aligned size. https://github.com/espressif/esp-hosted/blob/7dc00559a1d7991f06e07cc3ff1ede1a9a427ff3/host/port/os_wrapper.h#L112-L113
@mantriyogesh @SohKamYung-Espressif
but all this may be academic as I was still using an old ported sdio_drv.c. Having updated that:
"is_transport_in_reset()" only ever reports transport_state == TRANSPORT_INACTIVE
because ii.
∴ g_h.funcs->_h_queue_item(from_slave_queue,... is never called and_interface_buffer_handlet item is never queued to the from_slave_queue queue
g_h.funcs->_h_dequeue_item(from_slave_queue`... never dequeues as the queue is empty because i. ∴ sdio_drv_evt_handler_fp is never executed to set transport_state = TRANSPORT_ACTIVE
transport_state is not set to TRANSPORT_ACTIVE anywhere else in the code from what I can see, so the app is locked at
I Starting SDIO process rx task
If I remove:
while (true) {
if (is_transport_in_reset()) {
ESP_LOGI(TAG, "transport has been reset");
break;
}
vTaskDelay(pdMS_TO_TICKS(100));
}
@mantriyogesh @SohKamYung-Espressif
sdio_generate_slave_intr(ESP_OPEN_DATA_PATH);
_(ESP_OPEN_DATAPATH==0)
uint8_t intr_mask = BIT(intr_no + ESP_SDIO_CONF_OFFSET) == BIT(0+0) == 1; with lock_required = true;
func:1 (SDIO_FUNC_1) reg: 140 (ESP_SLAVE_SCRATCH_REG_7&= ESP_ADDRESS_MASK) data: 1 (int_mask) size: 1 flags: CMD52_WRITE | CMD52_READAFTERWRITE (Hex:0x80000000)
#define SD_ARG_CMD52_FUNC_SHIFT 28
#define SD_ARG_CMD52_FUNC_MASK 0x7
#define SD_ARG_CMD52_REG_SHIFT 9
#define SD_ARG_CMD52_REG_MASK 0x1ffff
#define SD_ARG_CMD52_DATA_SHIFT 0
#define SD_ARG_CMD53_LENGTH_SHIFT 0
#define SD_ARG_CMD53_LENGTH_MASK 0x1ff
arg = (func & SD_ARG_CMD53_FUNC_MASK) << SD_ARG_CMD53_FUNC_SHIFT; arg |= (reg & SD_ARG_CMD53_REG_MASK) << SD_ARG_CMD53_REG_SHIFT; arg |= (size& SD_ARG_CMD53_LENGTH_MASK) << SD_ARG_CMD53_LENGTH_SHIFT; arg |= data << SD_ARG_CMD52_DATA_SHIFT; arg |= flags;
and after this the slave :bomb:
***** Slave: Raw Throughput testing (Report per 1 sec)*****
I (22431) NETWORK_ADAPTER: Initial set up done
I (22431) slave_ctrl: event ESPInit
I (22433) main_task: Returned from app_main()
I (22543) stats: 28-29 sec Rx: 0.00 Tx: 0.00 kbps
I (22543) stats: slave: sta_rx_in: 0 sta_rx_out: 0 sta_tx_in: 0 sta_tx_out: 0
Guru Meditation Error: Core 0 panic'ed (StoreProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x400d6e90 PS : 0x00060e30 A0 : 0x800949ec A1 : 0x3ffd8370
A2 : 0x00000000 A3 : 0x00000000 A4 : 0x00000000 A5 : 0x00000000
A6 : 0x00000001 A7 : 0x00000000 A8 : 0x00000000 A9 : 0x00000000
A10 : 0x00000000 A11 : 0xdeadbeef A12 : 0x00000000 A13 : 0x00000001
A14 : 0x3ffc355c A15 : 0x3ffcc05c SAR : 0x0000001a EXCCAUSE: 0x0000001d
EXCVADDR: 0x00000000 LBEG : 0x4000c28c LEND : 0x4000c296 LCOUNT : 0x00000000
Backtrace: 0x400d6e8d:0x3ffd8370 0x400949e9:0x3ffd83b0
https://github.com/espressif/esp-hosted/assets/5519575/665cf8cf-86dc-426a-88a5-c207b2e5a2bf
a. problems with sdio_drv.c i. In function sdio_read_task:
"is_transport_in_reset()" only ever reports transport_state == TRANSPORT_INACTIVE
transport_state
is put into reset via transport_drv_reconfigure()
https://github.com/espressif/esp-hosted/blob/7dc00559a1d7991f06e07cc3ff1ede1a9a427ff3/host/drivers/transport/transport_drv.c#L112
In our host code, transport_drv_reconfigure()
gets called via esp_hosted_wifi_init()
, which is currently not ideal.
The initialization sequence is:
transport_state
starts out as TRANSPORT_INACTIVE
transport_drv_reconfigure()
to trigger a slave reset (via GPIO) and puts transport_state
into the state TRANSPORT_RESET
sdio_read_task()
sees that the transport is in reset, and proceeds to send sdio_generate_slave_intr()
to open the data path and put transport_state
into TRANSPORT_ACTIVE
state.Calling transport_drv_reconfigure()
in your init sequence should resolve this. You could put this at the start of sdio_read_task()
before the check, for example.
@claydonkey
Thanks for the video, it eases the understanding.
Trace in video is little blurry, I couldn't understand, 'Uninitialized sem id 3' or 'Uninitialized sem id 1' but in any of the cases, looks the semaphore/queue is not init correctly.
is the transport_init_internal()
called to init from_slave_queue
queue (or other smaphores)?
Can you please assert and check in os_wrapper.c functions where possible access without init is made?
The slave Guru Meditation 💣 Happens when the host sends: sdio_generate_slave_intr(ESP_OPEN_DATA_PATH);
Can you modify the function event_cb()
in sdio_slave_api.c
to this version, which outputs debug info:
IRAM_ATTR static void event_cb(uint8_t val)
{
ESP_EARLY_LOGI(TAG, "enter event_cb");
if (val == ESP_RESET) {
ESP_EARLY_LOGI(TAG, "got ESP_RESET");
sdio_reset(&if_handle_g);
return;
}
if (context.event_handler) {
ESP_EARLY_LOGI(TAG, "call event_handler");
context.event_handler(val);
}
}
You should see debug output like this when the host calls sdio_generate_slave_intr(ESP_OPEN_DATA_PATH)
:
I (8517) SDIO_SLAVE: enter event_cb I (8517) SDIO_SLAVE: call event_handler I (8517) NETWORK_ADAPTER: Start Data Path
This is to check if the event_cb()
is being triggered when you write to the SDIO slave interrupt register.
@SohKamYung-Espressif + @mantriyogesh Firstly might I just say that having lived in your code for a while - Nice Job! - It is turning out to be rather lovely and I am learning alot as it is a great example of nice C 🥇
@SohKamYung-Espressif
transport_drv_reconfigure()
transport_state == TRANSPORT_RESET
but the problem is now displaced to:
while (!is_transport_up()) {}
reset_slave()
is executed but there are no interrupts or async code to set transport_state = TRANSPORT_ACTIVE
, consequently the loop does not exit.
I have added the code to the slave. The output : (as you said) is:
I (834) SDIO_SLAVE: sdio_init: ESP32 SDIO timing: 0
I (1640) stats: 0-1 sec Rx: 0.00 Tx: 0.00 kbps
I (1838) SDIO_SLAVE: enter event_cb
I (1838) SDIO_SLAVE: call event_handler
I (1838) NETWORK_ADAPTER: Start Data Path
I (1840) stats:
***** Slave: Raw Throughput testing (Report per 1 sec)*****
I (1742) NETWORK_ADAPTER: Initial set up done
I (1742) slave_ctrl: event ESPInit
I (1744) main_task: Returned from app_main()
I (2456) stats: 1-2 sec Rx: 0.00 Tx: 0.00 kbps
I (3256) stats: 2-3 sec Rx: 0.00 Tx: 0.00 kbps
I (4056) stats: 3-4 sec Rx: 0.00 Tx: 0.00 kbps
I (4856) stats: 4-5 sec Rx: 0.00 Tx: 0.00 kbps
I (5656) stats: 5-6 sec Rx: 0.00 Tx: 0.00 kbps
Guru Meditation Error: Core 0 panic'ed (StoreProhibited). Exception was unhandled.
@mantriyogesh Again thanks for the help.
is the
transport_init_internal()
called to initfrom_slave_queue
queue (or other smaphores)? Can you please assert and check in os_wrapper.c functions where possible access without init is made?
In TEST_RAW_TP transport_init_internal() is executed in esp_hosted_init() :
void app_task(void const *arg)
{
#if !TEST_RAW_TP
app_main();
#else
H_ERROR_CHECK(esp_hosted_init(NULL)); //<<here (not trying to teach you how to suck eggs)
#endif
while (1) {
hosted_sleep(5U);
}
}
I thought that was because the slave crashed, and there were no items to dequeue.
Now correct me if I'm wrong but:
I believe as queue_handle_t == QueueDefinition * It is already a pointer and is malloced in xQueueGenericCreate
As a consequence after the code exited hosted_create_queue, the queue_handle_t was destroyed (disappeared from scope) because the pointer was allocated within the scope of hosted_create_queue. (The pointer a pointer to the pointer QueueDefinition )
Queue sizes and other properties of queue_handle_t were garbled upon return to the caller function (sizes of 20 became random large numbers ex 6847268346) So replaced the type queue_handle_t * with queue_handle_t
queue_handle_t q_id = NULL >> queue_handle_t q_id = NULL q_id = (queue_handle_t )queue_handle >> q_id = (queue_handle_t)queue_handle item_added_in_back = xQueueSendToBack(q_id, item, timeout) >> item_added_in_back = xQueueSendToBack(q_id, item, timeout) vQueueDelete(q_id ) >> vQueueDelete(q_id)
etc. I will look at other the function types in hosted_osi_funcs_t and perhaps change their signatures.
The new error now makes sense and pertains to the STM32 not handling SDIOIT interrupts leaving me to poll the interrupt pin
int hosted_sdio_wait_slave_intr(uint32_t ticks_to_wait) {
SDIO_FAIL_IF_NULL(hsd1->Instance);
return (g_h.funcs->_h_read_gpio(GPIOC, GPIO_PIN_9));
}
Annoying but previously I just removed the warning and everything worked. I have been communicating with STM about this issue but polling or waiting for the pin seems the only option at the moment
E wait_slave_intr error: 1
E wait_slave_intr error: 1
(repeats)
is hidden by commenting out
static void sdio_read_task(void const *pvParameters){
//...
if (res != ESP_OK) {
// ESP_LOGE(TAG, "wait_slave_intr error: %d", res);
continue;
}
//...
}
The errors now pertain to the resetting of the slave: These are from the stm32 port sdio_drv.c and the msgs are in my driver. It should speak for itself:
I sending block
E sdmmc_io_write_bytes_dma CMD ERROR. Size=1024
E sdio_write_task: 0: Failed to send data: -2 1472 1472
I sending block
I have added the code to the slave.
From the output, the slave interrupt is now being sent from the host to the slave, and transport_state
should now be TRANSPORT_ACTIVE
.
Some more troubleshooting for the Guru Meditation Error
:
On the slave side, CONFIG_ESP_RAW_THROUGHPUT_TRANSPORT
should be enabled for the raw throughput test.
As a test, on the Host, DO NOT enable CONFIG_ESP_RAW_THROUGHPUT_TRANSPORT
. This should cause the host to not transmit raw data packets. But it can still receive, so this is to test that the slave Tx is fine.
You should see output like this from the slave:
I (1217) NETWORK_ADAPTER: Initial set up done
I (1217) slave_ctrl: event ESPInit
I (1227) main_task: Returned from app_main()
I (1617) stats: 0-1 sec Rx: 0.00 Tx: 0.00 kbps
I (2617) stats: 1-2 sec Rx: 0.00 Tx: 0.00 kbps
I (3617) stats: 2-3 sec Rx: 0.00 Tx: 0.00 kbps
I (4617) stats: 3-4 sec Rx: 0.00 Tx: 0.00 kbps
I (5617) stats: 4-5 sec Rx: 0.00 Tx: 1266.09 kbps
I (6617) stats: 5-6 sec Rx: 0.00 Tx: 24888.44 kbps
I (7617) stats: 6-7 sec Rx: 0.00 Tx: 24888.44 kbps
I (8617) stats: 7-8 sec Rx: 0.00 Tx: 24888.44 kbps
I (9617) stats: 8-9 sec Rx: 0.00 Tx: 24899.84 kbps
I (10617) stats: 9-10 sec Rx: 0.00 Tx: 24888.44 kbps
I (11617) stats: 10-11 sec Rx: 0.00 Tx: 24899.84 kbps
I (12617) stats: 11-12 sec Rx: 0.00 Tx: 24888.44 kbps
If this is fine, the problem may lie in what the host is sending to the slave.
First, enable CONFIG_ESP_RAW_THROUGHPUT_TRANSPORT
on the host so it will transmit raw data packets.
To verify what the slave is receiving from the host: in slave_sdio_api.c
in the function sdio_read()
:
Replace
ret = sdio_slave_recv(&(buf_handle->sdio_buf_handle), &(buf_handle->payload),
&(sdio_read_len), portMAX_DELAY);
with this:
ESP_LOGI(TAG, "calling sdio_slave_recv");
ret = sdio_slave_recv(&(buf_handle->sdio_buf_handle), &(buf_handle->payload),
&(sdio_read_len), portMAX_DELAY);
ESP_LOGI(TAG, "sdio_read_len: %d", sdio_read_len);
You should see a lot of repeated output like this:
I (6257) SDIO_SLAVE: sdio_read_len: 1472
I (6257) SDIO_SLAVE: calling sdio_slave_recv
I (6257) SDIO_SLAVE: sdio_read_len: 1472
I (6267) SDIO_SLAVE: calling sdio_slave_recv
I (6267) SDIO_SLAVE: sdio_read_len: 1472
I (6277) SDIO_SLAVE: calling sdio_slave_recv
For receiving data, the max buffer size on the slave side is currently set to 1600. If the host is sending more data that this, it might be the cause of the Guru Meditation Error
.
As a consequence after the code exited hosted_create_queue, the queue_handle_t was destroyed (disappeared from scope) because the pointer was allocated within the scope of hosted_create_queue. (The pointer a pointer to the pointer QueueDefinition )
Actually, pointer to pointer concept was intended to make even non-FreeRTOS have their own queue type within opaque pointer. But as I see,
#define queue_handle_t QueueHandle_t
defeats the purpose, and outer structure should really not be of any OS dependent struct. Would take care of this. Ideally it should have been something like:
struct queue_handle_t {
QueueHandle_t os_q_handle;
};
For your simplicity, you can even directly use outer pointer directly as xQueueCreate
. But you might have to be cautious to even change hosted_queue_item()
, hosted_dequeue_item()
, etc accordingly.
On a different angle, these porting functions should be all the same which you had tested as part of https://github.com/espressif/esp-hosted/issues/270#issuecomment-1803576426.
I am not saying they are perfect at the moment (needs correction), but still should have worked as is?
hosted_create_queue is used with , g_h.funcs->_h_create_queue()
returning the handle of the newly allocated queue, agnostic of its OS. This handle is important and is the sole scope within the sdio_drv.c for that queue.
to_slave_queue = g_h.funcs->_h_create_queue(TO_SLAVE_QUEUE_SIZE,
sizeof(interface_buffer_handle_t));
assert(to_slave_queue);
Yes, if you have to rely on the D1 line raw value, you would need to comment
// ESP_LOGE(TAG, "wait_slave_intr error: %d", res);
as there would not be any other option in your code. We would still keep this line as is to cater any wrong doing for any other host in future. As part of polling in your code, you would need to rely upon code till:
if (!(NEW_PACKET_INTR & interrupts)) {
// interrupt is not for a new packet
ESP_LOGD(TAG, "Slave intr is not for 'new packet'");
SDIO_DRV_UNLOCK();
continue;
}
In next block we get legth to be read:
ret = sdio_get_len_from_slave(&len_from_slave, ACQUIRE_LOCK);
if (ret || !len_from_slave) {
ESP_LOGD(TAG, "invalid ret or len_from_slave: %d %ld", ret, len_from_slave);
SDIO_DRV_UNLOCK();
continue;
}
sdio_get_len_from_slave() takes care of valid length to be populated. But as you are doing polling, for extra safety, try in your testing version,
assert(len_from_slave < MAX_SDIO_BUFFER_SIZE)
I have yet not responded to
I sending block
E sdmmc_io_write_bytes_dma CMD ERROR. Size=1024
E sdio_write_task: 0: Failed to send data: -2 1472 1472
I sending block
but this would be too dependent upon the bus handling and might need debugging. If you have last working code, semantic diff would be faster than debugging on the bus.
Urgent attention would be on resolving Uninitialized sem id
.
From an stm32 host. Initialization for TEST_RAW_TP works as expected but soon after transmission of packets the slave panics: Guru Meditation....
I had to increased the Host packet size to be 32bit aligned which is strange because I didn't notice this alignment issue in previous versions. I am using heap 1_4 for FreeRTOS
https://github.com/espressif/esp-hosted/assets/5519575/28a258ff-98b9-4485-b869-7a58c34dca13