espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.7k stars 7.29k forks source link

I2C prone to crash from malloc and free (IDFGH-3087) #5108

Closed iltis42 closed 3 years ago

iltis42 commented 4 years ago

Hi guy's using recent: v4.2-dev-1097-g2e14149bf

I2C might crash from multiaccess (timer routines, ISR), esp. in critical section with memory alloc and free, that is i2c_cmd_link_create() and _delete(), see (2). The good news, the problem is easily to be fixed according to the documentation memory alloc and free is not safe for timer and interrupt, but can easily be achieved by the following 5 lines i added, see (1). Its easy to say: We don't support this, but the usual way to go for e.g. a precise timer controlled data aquisition is of course some timer or interrupt routine. Or you are within callbacks from libraries that are timer routines as well and you have no choice. A research in the internet showed that many people have this problem and spend day's with debugging about the root cause, so i did.

Hence happy now to go the solution an won't keep back telling you guys to add this small footrpint but saftety gain to the official esp-idf components/driver/i2c.c Hope this will help many others in with the same problem.

1) portMUX_TYPE mux = portMUX_INITIALIZER_UNLOCKED;

i2c_cmd_handle_t i2c_cmd_link_create(void) { I2C_ENTER_CRITICAL_ISR(&mux);

if !CONFIG_SPIRAM_USE_MALLOC

i2c_cmd_desc_t *cmd_desc = (i2c_cmd_desc_t *) calloc(1, sizeof(i2c_cmd_desc_t));

else

i2c_cmd_desc_t *cmd_desc = (i2c_cmd_desc_t *) heap_caps_calloc(1, sizeof(i2c_cmd_desc_t), MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);

endif

**I2C_EXIT_CRITICAL_ISR(&mux);**
return (i2c_cmd_handle_t) cmd_desc;

}

void i2c_cmd_link_delete(i2c_cmd_handle_t cmd_handle) { if (cmd_handle == NULL) { return; } I2C_ENTER_CRITICAL_ISR(&mux); i2c_cmd_desc_t cmd = (i2c_cmd_desc_t ) cmd_handle; while (cmd->free) { i2c_cmd_link_t *ptmp = cmd->free; cmd->free = cmd->free->next; free(ptmp); } cmd->cur = NULL; cmd->free = NULL; cmd->head = NULL; free(cmd_handle); I2C_EXIT_CRITICAL_ISR(&mux); return; }

2) ==================== CURRENT THREAD STACK =====================

0 __wrap_calloc (nmemb=, size=16) at /home/nextpcb/esp/esp-idf/components/heap/include/heap_trace.inc:177

1 0x40040020 in ?? ()

2 0x400f95d4 in i2c_master_read_byte (cmd_handle=0x3ffb9a80, data=0x3ffe518e \"\", ack=) at /home/nextpcb/esp/esp-idf/components/driver/i2c.c:980

3 0x400e8553 in I2C::read (this=0x3ffcafb8 , byte=0x3ffe518e \"\", ack=0) at /home/nextpcb/esp/esp-idf/examples/sensor/main/I2C.cpp:166

4 0x400e85f7 in I2C::read16bit (this=0x3ffcafb8 , addr=77 'M', word=0x3ffe51de) at /home/nextpcb/esp/esp-idf/examples/sensor/main/I2C.cpp:101

5 0x400e7c41 in MCP3221::readRaw (this=0x3ffcafb8 , val=@0x3ffe51de: 0) at /home/nextpcb/esp/esp-idf/examples/sensor/main/mcp3221.cpp:56

6 0x400e7c5a in MCP3221::readAVG (this=0x3ffcafb8 , alpha=0.200000003) at /home/nextpcb/esp/esp-idf/examples/sensor/main/mcp3221.cpp:28

7 0x400e2890 in MP5004DP::readPascal (this=0x3ffc9d54 , minimum=30) at /home/nextpcb/esp/esp-idf/examples/sensor/main/MP5004DP.cpp:119

8 0x400dc46c in readBMP (pvParameters=) at /home/nextpcb/esp/esp-idf/examples/sensor/main/sensor.cpp:164

9 0x400975dc in vPortTaskWrapper (pxCode=0x400dc420 <readBMP(void*)>, pvParameters=0x0) at /home/nextpcb/esp/esp-idf/components/freertos/xtensa/port.c:143

Alvin1Zhang commented 4 years ago

@iltis42 Thanks for reporting, we will look into.

iltis42 commented 4 years ago

@Alvin1Zhang Ni hao and xie xie for looking into. Same thing i needed to add in the function just below those two for link create and delete. After that my code ran stable over the night and whole day until now:

static esp_err_t i2c_cmd_link_append(i2c_cmd_handle_t cmd_handle, i2c_cmd_t *cmd) { I2C_ENTER_CRITICAL_ISR(&mux); : I2C_EXIT_CRITICAL_ISR(&mux); return ESP_OK; err: I2C_EXIT_CRITICAL_ISR(&mux); return ESP_FAIL; }

Alvin1Zhang commented 4 years ago

@iltis42 Thanks for keeping us updated. I have put this in our internal backlog for checking, will share progress if updates available. Thanks.

koobest commented 4 years ago

Hi, @iltis42

Can you provid a code then can reproduce this issue? Did you call i2c_cmd_link_create and i2c_cmd_link_delete in ISR?

thanks !!

iltis42 commented 4 years ago

Hi, in principle what you write should provoke the problem, together with some other code using intensively using malloc and free outside an ISR. Hence i am not 100% convinced if really an ISR is the cause, okay malloc and free should be thread-safe, hence is so the code inside ic2.c link create and delete also thread safe ? There are pointered lists with no protection around: How about this is preemted by another task, will it be safe if in the middle of the while loop this happens ?

while (cmd->free) { i2c_cmd_link_t *ptmp = cmd->free; cmd->free = cmd->free->next; free(ptmp); } cmd->cur = NULL; cmd->free = NULL; cmd->head = NULL;

free(cmd_handle); My code i am using is complex, its here in this open source project: https://github.com/iltis42/OpenIVario You find there in master branch under main the sources from the backtrace i provided. Actually the thread that crashed with I2C is not within an ISR (supporting my statement above), but even so crashing. An explanation coud be that libs that i am using allocate memory within an ISR, but for that there is no real proof i can give. This may be, but i think the issue is the non thread safe part we see above. And the protection i added preventing this from preemption fixed my problem. Best regards, Eckhard

Am 14-Apr-2020 09:51:59 +0200 schrieb notifications@github.com:

Hi, @iltis42

Can you provid a code then can reproduce this issue? Did you call i2c_cmd_link_create and i2c_cmd_link_delete in ISR?

thanks !!

-- You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.


FreeMail powered by mail.de - MEHR SICHERHEIT, SERIOSITÄT UND KOMFORT

iltis42 commented 4 years ago

I see it has been closed, can you please add here the changes in the fix ?

ginkgm commented 4 years ago

Hello @iltis42 , @koobest ,

Generally speaking, it's a common rule for all our drivers to put all the resource allocation code, into earlier phases, instead of runtime interface/code in the critical section. So you can see we seldom do malloc after the driver/channel is initialized, except for some workaround (even for the workaround, we will try to make the developer understand they are suggested to do the resource allocation earlier, instead of depending on the workaround).

And it is also common in other OSs, that high-cost/system related functions, i.e. malloc, printf, delay, etc., , are not allowed to be called in the ISR. So from my perspective, in the IDF, they are not guaranteed to be able to be called in the ISR as well, though maybe it's OK for now because our implementation doesn't explicitly forbid this, while it happen to work.

For the I2C case, it's suggested that the i2c_cmd_link_create is already called in the task earlier, instead of in any callbacks that is called in the ISR. Even the i2c_master_cmd_begin function is also treated as heavy function that shoudn't be called in the ISR as well. The suggested way of doing this is to send a semaphore in the ISR callback and do the i2c transaction in the task.

But seen from your case, maybe we can make two new features:

  1. provide a set of functions, which allows memory allocation at earlier stages in one go, and re-use the command later. So that you can build your command in critical section without allocating new memory, with the help of a command pool.

    The interface may look like:

    1. i2c_cmd_create_complete();
    2. and then reuse the existing start, write, .etc functions, or a set of new functions
    3. i2c_master_cmd_begin();
    4. i2c_cmd_recycle(); //to remove the existing records
    5. start from 2 again
  2. provide a i2c_master_cmd_begin_isr() to allow doing transaction in the ISR safe. But please note, there's no guarantee the function MUST success. We don't have a queue in the I2C driver now (so all task compete for the i2c host, and the one acquires the semaphore can send, and then wake up other tasks after it finishes). When ISR cannot send the transaction immediately, maybe for now we can only discard the command request.

Do you think these features will be useful?

devanlai commented 4 years ago

I'm not the OP, but I wanted to voice my support for having an I2C driver API that allows for up-front allocation of / reuse of the commands. Since I almost always send the exact same command after the initialization phase, it is a bit silly to have to allocate and free the same command over and over again.

This goes from a mere annoyance to a serious problem if all of the internal memory is exhausted and the I2C driver becomes unable to function. If I could allocate the memory up-front and re-use it, I could ensure that the I2C driver will always be usable, even if other subsystems become unusable due to internal memory exhaustion.

iltis42 commented 4 years ago

I support this message, above proposal adding a second API for recycling resources appears also a bit creepy to me. And i want also to emphasize that standard usage by application example (1), as is mine (2), is exactly like this that i2c_cmd is created on the fly in the routine sending the data. The proposal would let run all users that are going to implement the it as in the application example still in this issue.

And I already tried to move "cmd = i2c_cmd_link_create()" to the place the i2c_driver_install() takes place, hence this didn't work, as obviously is not designed for that. So i can confirm this doesn't work and so do not consider this as an option. Why not do the fix within the existing i2c_cmd_link_create(), instead offering a second function ?

And i see also, like above devanlai already stated, the issue of using heavy alloc/free in a place often used and where it's about very small amount of memory. I see e.g. the i2c_cmd_desc_t is just a structure consisting of 3 pointers. Maybe instead of allocation (malloc) the data upfront, the data may also be created on stack without needing the heap (A) what wouldn't need alloc/free at all and got freed automatically when running out of scope ? So the user can decide according to his demands how to handle. We could hand over a pointer, by default zero (old behaviour with alloc/free), but once given this data is used avoiding the alloc/free.

A) handle i2c_cmd_link_create( i2c_cmd_desc_t *desc=0 ); { i2c_cmd_desc_t desc; cmd = i2c_cmd_link_create( &desc ); : i2c_cmd_link_delete(cmd); }

So i see either this 2 way's to go: a) Avoid alloc/free in i2c_cmd_link_create() e.g. by handing over storage placed on the stack, or allocated upfront b) My workaround, keeping heavy alloc/free but protecting it by MACRO

1) Application Example https://github.com/espressif/esp-idf/blob/357a2776032299b8bc4044900a8f1d6950d7ce89/examples/peripherals/i2c/i2c_self_test/main/i2c_example_main.c

2) My Usage

cmd = i2c_cmd_link_create();
uint8_t byte=(uint8_t(data & 0xFF));
i2c_master_start(cmd);
i2c_master_write_byte(cmd, byte , 0);
i2c_master_stop();
i2c_master_cmd_begin(sdapin, cmd, 100 / portTICK_RATE_MS);
i2c_cmd_link_delete(cmd);
iltis42 commented 3 years ago

Hi Espressiv, any updates on that ?

iltis42 commented 3 years ago

Until now i needed using my workaround:

$ git diff components/driver/i2c.c diff --git a/components/driver/i2c.c b/components/driver/i2c.c index 0acc2f983..678f08fd7 100644 --- a/components/driver/i2c.c +++ b/components/driver/i2c.c @@ -39,6 +39,7 @@ static const char *I2C_TAG = "i2c";

/ DRAM_ATTR is required to avoid I2C array placed in flash, due to accessed from ISR /

+

define I2C_ENTER_CRITICAL_ISR(mux) portENTER_CRITICAL_ISR(mux)

define I2C_EXIT_CRITICAL_ISR(mux) portEXIT_CRITICAL_ISR(mux)

define I2C_ENTER_CRITICAL(mux) portENTER_CRITICAL(mux)

@@ -810,13 +811,19 @@ esp_err_t i2c_set_pin(i2c_port_t i2c_num, int sda_io_num, int scl_io_num, bool s return ESP_OK; }

+// i2c_cmd_desc_t handle; + +portMUX_TYPE mux = portMUX_INITIALIZER_UNLOCKED; + i2c_cmd_handle_t i2c_cmd_link_create(void) {

@@ -825,6 +832,7 @@ void i2c_cmd_link_delete(i2c_cmd_handle_t cmd_handle) if (cmd_handle == NULL) { return; }

@@ -897,6 +908,7 @@ esp_err_t i2c_master_stop(i2c_cmd_handle_t cmd_handle) cmd.hw_cmd.byte_num = 0; cmd.data = NULL; return i2c_cmd_link_append(cmd_handle, &cmd); + }

esp_err_t i2c_master_write(i2c_cmd_handle_t cmd_handle, uint8_t *data, size_t data_len, bool ack_en)

speed-axel commented 3 years ago

Hi,

I have run in the same issue as described by iltis42 commented on 12 Apr 2020. When can I expect a fix for that in the IDE? From my point of view is the correction very simple and can be done in 5 minutes.

ginkgm commented 3 years ago

Hi @iltis42 , @speed-axel ,

Thanks for your suggestions and code proposal. I'd like to explain why critical section will not be added to create/delete, or other API, and what workaround we have.

  1. As mentioned above, malloc in ISR is a forbidden usage. IDF is growing and changes will be made to the heap allocator. If we make a new feature depending on an undetermined (and even forbidden) behavior, this new feature will also easily get broken. And at that time we may have no available workaround for this, because essentially heap implementation have the assumption - will not be called in the ISR.

  2. It's easy to use a pool or sth to make a command in advance. But the more important thing than how to allocate a command is that, it's also not allowed to begin a command inside the ISR. I think you can easily understand, if you read this code https://github.com/espressif/esp-idf/blob/master/components/driver/i2c.c#L1182 . The current driver implementation is not designed to be called in the ISR. Maybe it happens to work, maybe the docs is not good enough to give a hint about this issue, I'm sorry about that. But we will not go one step further to make it become a feature.

    This usage is a misleading to the other developers. This Semaphore version shouldn't be called in the ISR, because ISR can't wait for anything. What if you failed to get the semaphore, because the previous transaction is still in-flight? The error handling will be much more complicated than just calling begin in a task. So I want to make this difference explicit. I would like to ask again, do you really need an ISR version? (even if it is provided, you will have to design the error handling if it fails at first call)

    This limitation is also very common for other driver APIs. This is because the HW resource is always limited. To make it thread-safe, we may have a lock above the HW layer to avoid concurrency issue. This lock basicallly avoids the driver API being called in the ISR. So if the driver API has not explicitly mentioned it is designed for the ISR, you should always assume it is not available to be called in the ISR.

  3. I think the problem why you can't do create and reconfigure a command everywhere is that, not all the momory required by the command are allocated at once, and there is no way to reuse an "allocated" command by its handle. But as you can see, there are some developers depending on the persistence of allocated command. So that's why I propose to create a new set of API to support this requirement, to avoid massive malloc/free, and provide the conveniene of doing this on the stack.

Michael

iltis42 commented 3 years ago

Dear Michael, first of all, i want kindly want mention again here "I2C is not within an ISR" in my project, even so it crashes. You may check yourself here: https://github.com/iltis42/XCVario i have not a single ISR, just several threads (tasks), thats all. Regarding 1) and 2) and the cause or the crash, looking that the backtrace, i think i can provide an easy explanation: Let's look at the code below, esp. the while(cmd->free) { .. } code part (3). Already the linked list used to free the storage is definitively not threadsafe. This has nothing to do with ISR, tasks may be preemted by higer priority tasks, and interrupt this loop can of course create inconsistency as meanwhile a new allocation modifies the current and next pointer and getting back to here will end up in an segmentation violation. Regarding 3) In my code there is also no reuse of allocated memory, nor any intension to do so. There only standard usage but multithreaded, and each command send to I2C bus is creating the link and deleting the link few lines later as it belongs.

Maybe the problem is not yet understood, pointing to ISR as culprit what isn't actually the case, or reuse of an allocated handle what is not done either. The problem is about multithreading, and i am fine changing my code to a different API in order to solve this problem. I have not stated that those critical section protection hooks i have added are the solution. Its a hack to survive as long there is not better way. So i strongly support the idea of a different API, and propose an architecture where we may be able to do the memory allocation outside the context (2), e.g. also just placing it in a local variable on the stack and use it like shown in (1).

How about this ?

Thanks in advance.

1) void send_byte( uint8_t b ){ i2c_cmd_desc_t desc; i2c_cmd_handle_t handle = i2c_cmd_link_create(&desc); uint8_t byte=(uint8_t(data & 0xFF)); i2c_master_start(handle); i2c_master_write_byte(handle, byte , 0); i2c_master_stop(); i2c_master_cmd_begin(sdapin, handle, 100 / portTICK_RATE_MS); i2c_cmd_link_delete(); }

2) handle i2c_cmd_link_create( i2c_cmd_desc_t desc=0 ); { if( !desc ) (i2c_cmd_desc_t ) calloc(1, sizeof(i2c_cmd_desc_t));

}

handle i2c_cmd_link_delete( i2c_cmd_desc_t *desc=0 ); { if( desc ) free( desc );

}

3) void i2c_cmd_link_delete(i2c_cmd_handle_t cmd_handle) { if (cmd_handle == NULL) { return; } I2C_ENTER_CRITICAL_ISR(&mux); i2c_cmd_desc_t cmd = (i2c_cmd_desc_t ) cmd_handle; while (cmd->free) { i2c_cmd_link_t *ptmp = cmd->free; cmd->free = cmd->free->next; free(ptmp); } cmd->cur = NULL; cmd->free = NULL; cmd->head = NULL; free(cmd_handle); I2C_EXIT_CRITICAL_ISR(&mux); return; }

ginkgm commented 3 years ago

Hi @iltis42 ,

Thanks for your explaination, this makes your use case more clear. sorry for misunderstanding before. If we can get rid of the ISR, the discussion will be much simpler and we can focus on the concurrent issue of driver APIs, instead of the usage of malloc. Could you also post the link to the usage of I2C in your repo?

One thing to mention before the following discussion is that, to avoid over-design and the cost of locks, our driver provides only "necessary" concurrent protection. What is "necessary"? Usually we assume that each top layer object that is directly controlled by the user, is only used by a single task. And multiple objects of the same layer should be able to be used concurrently. The driver should handle the underlayer arbitration, if more than one top layer object requires a single resource.

For some peripherals, there is no conceptions of "channel" or "device", so the top layer object is the whole peripheral, and the peripheral should only be used by one task. However for drivers with conception of "device", each device can be controlled by a task, and be used concurrently. For this I2C driver, though we don't explicitly have the idea of "slave device", each command can be treated as used to communicate with different devices, so if you mean to send commands concurrently in different tasks, this requirement should be approved.

To be more accurate:

Could you help to clarify which one above is your use case? If it is the first case (very likely, seen from your snippet 1), it's really weird because the crash shouldn't happen in the create, append, delete. Because these parts only touch the command itself, no other shared resource is touched. As for cmd_begin function, it should handle this problem properly. We may need further debug to find out the root cause. However if it's the second case (I guess your thread-safe example above is this case), I would say, we will probably not support this in our driver, it's over-design.

BTW, there may be some drivers have implemented locks that is not "necessary". For back-compatible reason, we will not remove the lock, but we will neither add locks at unnecessary place.

Michael

iltis42 commented 3 years ago

Dear Michael, you are right about the multi threading, its even mentioned in the API description of the I2C driver: "The I2C APIs are not thread-safe, if you want to use one I2C port in different tasks, you need to take care of the multi-thread issue."

I2C is used for several chips: https://github.com/iltis42/XCVario/blob/master/main/mcp4018.cpp https://github.com/iltis42/XCVario/blob/master/main/mcp3221.cpp https://github.com/iltis42/XCVario/blob/master/main/MP5004DP.cpp by the following I2C bus implementation: https://github.com/iltis42/XCVario/blob/master/components/I2Cbus/src/I2Cbus.cpp and mostly hooked in one single task named void readBMP(void *pvParameters) here: https://github.com/iltis42/XCVario/blob/master/main/sensor.cpp except the mcp4018, the digital poti that needed to be triggered by an extra task, the audio task, that needed to run on a higher priority to ensure best sound.

So the second case is my use case, and there are constraints in my design that do not allow to do both things in one task as both, on the one hand sound control that is time critical with fade in and out feature, and also sensor reading, not so time critical but needing a constant time displace rate of readouts. Both can't be merged in one task as would make things very awkward and non performant.

Maybe an external semaphore may help here to take care of that, agree that should be possible, hence i wished myself a more multithreaded approach already in the driver, maybe customizable by a build variable, so its not impacting performance for others.

And there is the overhead of malloc and free, i wanted to avoid. Maybe the crash comes only from the linked list in the link create/delete, and malloc and free is not the culprit, what i meanwhile think as malloc and free in principle is threadsafe in ESP idf, right ? If this is true, we are talking about a performance feature when we want to get away from malloc and free, not about a cause for the crash, as indicated in the headline here. So two different things i guess.

ammaree commented 3 years ago

@ginkgm @igrr

We have a problem with the I2C driver from a performance point of view. We have a range of I2C connected sensors and actuators on different ESP32 devices. The sensors are read based on cloud configurable (100mS to 60s) intervals.

Some sensors are uncomplicated, simple write and/or or combined write+read operations. Others are slightly more complicated with read+modify+write or write+delay+read operations. Many sensors are DS248x connected and 1-Wire based (DS18x20, DS1990x etc) and some require a delay (often as long as 750mS) between I2C write and a following I2C read activity. Some actuators (mainly relays) are controlled using I2C GPIO extender (PCA9555 etc) and similar to @iltis42 requirement above, need to be handled at a higher priority

The result of the above is that I2C sensor and actuator operations are completely unpredictable depending on the specific sensor and actuator sequencing and the specific intervals configured.

To try and optimise the use of the I2C port and bus we have just started testing a solution utilising an I2C task servicing queued IO requests from other tasks. 1 step operations (read, write, write+read or read+write) or 2 step (read+modify+write) are handled with a single queue entry. 3 step operations (write+delay+read) are handled by a) queueing the read operation, b) on read completion, scheduling an RTOS timer to handle the delay and callback c) after delay the callback queues the read operation back into the I2C task with another callback to signify final completion..

The high priority write operation to PCA9555 controlled actuators we are trying to handle by adding these write requests at the from of the I2C task queue, hence removing possible delay.

We retrying to optimise the I2C task and the queued interaction through the timer task with the callback as much as possible, and especially want to eliminate the malloc/free requirement.

Our questions are as follows: Q1. Can you make any details available on the planned new API to improve I2C operations? Q2. can you provide any indicative timing on when these changes will be available for testing?

André