ARMmbed / mbed-os

Arm Mbed OS is a platform operating system designed for the internet of things
https://mbed.com
Other
4.68k stars 2.99k forks source link

Unable to work with two physical I2C buses #14735

Closed rkotan closed 3 years ago

rkotan commented 3 years ago

Description of defect

Some time ago I updated mbed-os to 6.11 and suddenly faced performance issue. CPU load has become very high, min 95% though it had never been above 45-50% before. After some profiling and digging in the sources I found the cause of this in the I2C driver. I use two physical buses with several devices on each, so it seems appropriate to me to have one I2C class instance per bus. However, I2C class has static members _owner and _mutex (lines 237-239 of I2C.h):

    static I2C  *_owner;
    int    _hz;
    static SingletonPtr<PlatformMutex> _mutex;

So, both of them are shared between all instances in the program and every call of I2C::read or I2C::write from another I2C instance completely reconfigures it via calling I2C::acquire. For obvious reasons this behavior seems completely wrong and in my case after some time I2C::set_frequency starts to eat up all available CPU time, because of deadlock or some improperly configured periphery.

I simply made _owner and _mutex non-static and issue was gone, though in that case I2C::acquire and _owner also seems useless.

I suggest that _owner and _mutex should be non-static for obvious reasons and some consistency with behavior of SPI class where both of them are non-static.

Target(s) affected by this defect ?

probably all, tested on STM32H743

Toolchain(s) (name and version) displaying this defect ?

gcc version 10.2.1 20201103 (release)

What version of Mbed-os are you using (tag or sha) ?

mbed-os-6.11.0

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

mbed-cli 1.10.5

How is this defect reproduced ?

1) Initialize I2C instances for each of two (or more) separate physical buses. 2) Connect a device to each bus. 3) Try to write/read something to/from each device.

LMESTM commented 3 years ago

Probably a duplicate of https://github.com/ARMmbed/mbed-os/issues/14732 @rkotan You may want to revert #14557 until this is fixed - sorry about the inconvenience

rkotan commented 3 years ago

@LMESTM

You may want to revert #14557 until this is fixed - sorry about the inconvenience

Originally I used commit 14e5d307bb6cdccb554b591ab2602d8d47e0b2d0 from May 17 which was made before merging this feature. Nevertheless, I updated my local version to 5047bd3c2782dd0d996acd950f71d19e0b2324f7 and reverted d974b4743939c8ba62483842c35f720180cf5120 -- nothing has changed.

Besides, I've fixed this issue locally and can make a PR if it's ok.

rkotan commented 3 years ago

Updated the original post: added static keywords which were missing in the piece of code but are present in the actual header file :sweat_smile:

jeromecoutant commented 3 years ago

Besides, I've fixed this issue locally and can make a PR if it's ok.

Sure, we will be happy to review your proposition! Thx

affrinpinhero-2356 commented 3 years ago

Hi @rkotan, I would like to get some more info regarding your application.

affrinpinhero-2356 commented 3 years ago

I was not able to reproduce this issue. Target : H747-Disco. Slave :

Method used for reproduce: Created an I2C example that read and write 2 bytes of data on I2C_1 & I2C_4. Used mbed statistic example mentioned (https://os.mbed.com/docs/mbed-os/v6.11/apis/mbed-statistics.html). To tweak CPU statistics/load.

Observations: Before Rebase (3ee443bfde86d71c068951edb920b3622d9ef71c)

ADXL DIV ID 0xe5, MPU DIV ID 0x68
Time(us): Up: 26033081 Idle: 24296451 Sleep: 24296451 DeepSleep: 0
Idle: 91% Usage0x1 : 9%

W_T 4100, TRD CND 6

ADXL DIV ID 0xe5, MPU DIV ID 0x68
Time(us): Up: 28033081 Idle: 26133548 Sleep: 26133548 DeepSleep: 0
Idle: 91% Usage0x1 : 9%

After Rebase (Commit: c780165686436d305dc15daf39efaa1f6b0a4804)

W_T 400, TRD CND 73
     
ADXL DIV ID 0xe5, MPU DIV ID 0x0
Time(us): Up: 142033081   Idle: 131993262   Sleep: 131993262   DeepSleep: 0
Idle: 91% Usage0x1 : 9%
     
W_T 300, TRD CND 81
     
ADXL DIV ID 0xe5, MPU DIV ID 0x0
Time(us): Up: 144033081   Idle: 133827644   Sleep: 133827644   DeepSleep: 0
Idle: 91% Usage0x1 : 9%
0xc0170 commented 3 years ago

Besides the change in the target code (to increase the time).

We still shall consider doing similar work as SPI did in the pull request https://github.com/ARMmbed/mbed-os/pull/9469 (cc @kjbracey-arm - would it make sense to do the same peripheral logic for I2c).

rkotan commented 3 years ago

Hi @affrinpinhero-2356,

I would like to get some more info regarding your application.

Of course, my specs are:

Target: custom, based on the STM32H743VIT6 (mbed-os target inherited from MCU_STM32H743xI) Target clock speed: 400 MHz I2C Bus speed: 400 kHz in all cases (for all buses and all devices). I also tried Standard speed (100 kHz) -- there is no change. Slaves: 24LC16B & BMX160 on I2C1, LIS3MDL & LPS25HB on I2C4. Sensors update rate: BMX160 -- 100 Hz, LIS3MDL -- 20 Hz, 24LC16B & LPS25HB -- on demand Old mbed version: dde7b22ea32b04ff0c1f22f4cc4ff0032e17decf

I use a blocking read/write for configuring sensors and managing settings in the EEPROM and asynchronous transfer to obtain measurements from sensors. For each physical bus I have a dedicated thread in which this piece of code is called every time when sensor measurements are requested:

    void sendNextMsgFromQueue()
    {
        I2CMessage *cur_msg;
        const bool result = this->msg_queue_.try_get(&cur_msg);
        if(result)
        {
            i2c_.transfer(cur_msg->addr_, cur_msg->tx_buf_, cur_msg->tx_length_,
                          cur_msg->rx_buf_, cur_msg->rx_length_,
                          mbed::Callback<void (int)>(this, &I2C_Bus::dataCallback), I2C_EVENT_TRANSFER_COMPLETE, false);
            this->cur_event_ = cur_msg->callback_;
        }

    }

My program currently hangs during the asynchronous transfer to BMX160 which contains 1 TX byte (which is 0x04 -- start address of measurement results) and 20 RX bytes. I can share sensor configurations in case you need it.

affrinpinhero-2356 commented 3 years ago

Hi @rkotan Thanks for the update. I will check this scenario and update it here.

kjbracey commented 3 years ago

Besides the change in the target code (to increase the time).

We still shall consider doing similar work as SPI did in the pull request #9469 (cc @kjbracey-arm - would it make sense to do the same peripheral logic for I2c).

Quite possibly, yes - conceptually the I2C is akin to the SPI. The C++ object really represents an endpoint on a shared bus, so to cope with two independent users you need both objects referencing one bus to be using the same mutex. The suggestion in the OP to remove the static on the mutex breaks the single-bus case.

But having only one mutex impairs performance of multiple buses - they become needlessly mutexed against each other.

The format switching as if on a single bus shouldn't be totally breaking multiple bus use though. The buses should be used sequentially, and if the other bus was last used, then the bus will get reconfigured. But that shouldn't be doing anything other than wasting time.

I guess that is just the complaint? Low performance? I'm surprised that any sort of peripheral reconfiguration would be slow compared to actual transfer speed - I2C is so slow!

kjbracey commented 3 years ago

Oh, I note that rkotan mentions asynchronous transfers. That was something that was never properly addressed in the SPI case - asynchronous transfers for SPI basically bypass all the "acquire" and "mutex" logic. If you're using them, then you pretty much have to be on a dedicated bus, and know that no-one else is trying to use it while your async transfer is ongoing. My changes in that PR only tackled the blocking API, making that safe but efficient for all use cases. I didn't have the answer for async transfers, as they were never really safe.

It is possible that some implementation error in I2C's asynchronous transfers is causing a conflict between the two buses. Maybe the asynchronous transfer is messing with owner without claiming the mutex?

affrinpinhero-2356 commented 3 years ago

Hi @rkotan,

Can you confirm you are using ThisThread::sleep_for() to sleep your thread? Can you confirm how the thread mechanism is handled/managed?

affrinpinhero-2356 commented 3 years ago

Can you try this method and update CPU method.

        EventQueue *stats_queue = mbed_event_queue();
        Thread *thread;
        int id;
        id = stats_queue->call_every(SAMPLE_TIME_MS, print_cpu_stats);
        thread = new Thread(osPriorityNormal, MAX_THREAD_STACK);
        thread->start(busy_customer_Method);

        // Steadily increase the system load
        for (int count = 1; ; count++) {
            thread_sleep_for(LOOP_TIME_MS);
            if (wait_time_ms <= 0) {
                break;
            }
            wait_time_ms -= 100;

            printf("\nW_T %d, TRD CND %d\n",wait_time_ms,trd_Cnt);
            printf("CPU USAGE CHECK RW\n");
            printf("\nDev ID \nADXL 0x%x\nMPU 0x%x\n", ADXL_ID[0], MPU_ID[0]);

            // clearArrays();
        }
        thread->terminate();
        stats_queue->cancel(id);
void busy_customer_Method()
{
    sen_adxl.transfer(ADXL_ADDR, &DEVICE_ID_ADXL[0], 1, &ADXL_ID[0], 2, callback_debug);
    sen_mpu.transfer(MPU_ADDR, &DEVICE_ID_MPU[0], 1, &MPU_ID[0], 2, callback_debug);
    ThisThread::sleep_for(1);
}
rkotan commented 3 years ago

Hi @affrinpinhero-2356,

Can you confirm you are using ThisThread::sleep_for() to sleep your thread?

Yep, I use this one: ThisThread::sleep_for(Clock::duration_u32 rel_time)

Can you confirm how the thread mechanism is handled/managed?

I have a class for managing all transfers on each physical I2C bus which contains a mbed::I2C instance, a mbed::Thread instance and some auxiliary stuff. So, only two instances of this class are created and they are completely independent from each other. After sensor configuration and reading settings from EEPROM program starts a thread for each bus. These threads manage only async transfers by calling ThisThread::sleep_for(Clock::duration_u32 rel_time) with suitable delay (10 ms for BMX160). In case if blocking read or write (by using I2C::read or I2C::write) is requested then all async transfers are suspended, but I don't use these operations while reproducing the issue.

Can you try this method and update CPU method.

It will take some time to incorporate your code into my program. I'll post the results in a couple of days.

affrinpinhero-2356 commented 3 years ago

@rkotan Thanks for the comment. Since I wouldn't able to see CPU throttle, it good to share your test code (if possible). So that I can make a similar setup and test from my end. 😃

rkotan commented 3 years ago

Hi @affrinpinhero-2356,

I wrote the simple program, the code is following:

Code ``` #include #include #define TWO_THREADS #define SAMPLE_TIME_MS (10) #define LOOP_TIME_MS (100) #define BMX160_ADDR (0b1101001 << 1) #define LIS3MDL_ADDR (0b0011110 << 1) mbed_stats_cpu_t last_cpu_stats_; float cpu_usage_; I2C i2c_sens_(I2C4_SDA, I2C4_SCL); I2C i2c_erom_(I2C1_SDA, I2C1_SCL); char BMX160_DATA_ADDR = static_cast(BMX160::Regs::DATA_0_MAG); char LIS3MDL_DATA_ADDR = static_cast(LIS3MDL::Regs::CTRL_REG1) | LIS3MDL::INC_REG; char BMX160_SENSOR_DATA[20]; char LIS3MDL_SENSOR_DATA[6]; void print_cpu_stats(void) { mbed_stats_cpu_t cpu_stats; ::mbed_stats_cpu_get(&cpu_stats); const int64_t uptime_diff = cpu_stats.uptime - last_cpu_stats_.uptime; const int64_t idle_diff = cpu_stats.idle_time - last_cpu_stats_.idle_time; cpu_usage_ = 100.0f - static_cast(idle_diff * 100) / static_cast(uptime_diff); last_cpu_stats_ = cpu_stats; utils::DebugPrint::Instance().printf("cpu usage: %f\n\r", cpu_usage_); } void callback_debug(int event) { } void busy_customer_Method_t1() { i2c_sens_.transfer(LIS3MDL_ADDR, &LIS3MDL_DATA_ADDR, 1, &LIS3MDL_SENSOR_DATA[0], 5, mbed::callback(&callback_debug), false); #ifndef TWO_THREADS i2c_erom_.transfer(BMX160_ADDR, &BMX160_DATA_ADDR, 1, &BMX160_SENSOR_DATA[0], 20, mbed::callback(&callback_debug), false); #endif ThisThread::sleep_for(1ms); } void busy_customer_Method_t2() { i2c_erom_.transfer(BMX160_ADDR, &BMX160_DATA_ADDR, 1, &BMX160_SENSOR_DATA[0], 20, mbed::callback(&callback_debug), false); ThisThread::sleep_for(1ms); } void tryHang() { int wait_time_ms = 300; EventQueue *stats_queue = mbed_event_queue(); Thread *thread; Thread *t2; int id; id = stats_queue->call_every(SAMPLE_TIME_MS, mbed::callback(&print_cpu_stats)); thread = new Thread(osPriorityNormal, OS_STACK_SIZE); thread->start(mbed::callback(&busy_customer_Method_t1)); #ifdef TWO_THREADS t2 = new Thread(osPriorityNormal, OS_STACK_SIZE); t2->start(mbed::callback(&busy_customer_Method_t2)); #endif // Steadily increase the system load for (int count = 1; ; count++) { thread_sleep_for(LOOP_TIME_MS); if (wait_time_ms <= 0) { break; } wait_time_ms -= LOOP_TIME_MS; utils::DebugPrint::Instance().printf("\n\rW_T %d, TRD CND %d\n\r", wait_time_ms, 0); utils::DebugPrint::Instance().puts("CPU USAGE CHECK RW\n\r"); utils::DebugPrint::Instance().printf("Dev ID \nBMX 0x%x\nLIS 0x%x\n\r", BMX160_SENSOR_DATA[0], LIS3MDL_SENSOR_DATA[0]); // clearArrays(); } thread->terminate(); #ifdef TWO_THREADS t2->terminate(); #endif stats_queue->cancel(id); } int main() { utils::DebugPrint::Instance().puts("Program start\n\r"); LedActivity::Instance().setState(BlinkType::Initializing); tryHang(); LedActivity::Instance().setState(BlinkType::OK); utils::DebugPrint::Instance().puts("Ready for work\n\r"); while (true) { ThisThread::sleep_for(1000ms); print_cpu_stats(); } } namespace utils { class DebugPrint { public: ~DebugPrint(); DebugPrint(const DebugPrint &other) = delete; DebugPrint(DebugPrint &&other) = delete; DebugPrint& operator=(const DebugPrint &other) = delete; DebugPrint& operator=(DebugPrint &&other) = delete; int puts(const char* string) { return pc_.write((void *)string, strlen(string)); } template int printf(const char *format, Args... args) { int length = sprintf(buffer_, format, args...); pc_.write(buffer_, length + 1); return length; } static DebugPrint& Instance() { static DebugPrint instance; return instance; } private: DebugPrint(); char buffer_[512]; #if (MBED_VERSION >= MBED_ENCODE_VERSION(5,15,0)) BufferedSerial pc_; #else Serial pc_; #endif }; } ```

Without static there is no noticeable cpu load:

Console log ``` Program start cpu usage: 99.457886 cpu usage: 0.940002 cpu usage: 0.940002 cpu usage: 1.250000 cpu usage: 0.940002 cpu usage: 1.250000 cpu usage: 0.940002 cpu usage: 1.250000 cpu usage: 0.940002 W_T 200, TRD CND 0 CPU USAGE CHECK RW Dev ID ADXL 0x0 MPU 0x18 cpu usage: 1.255981 cpu usage: 3.741974 cpu usage: 0.940002 cpu usage: 1.250000 cpu usage: 0.940002 cpu usage: 0.940002 cpu usage: 1.250000 cpu usage: 0.940002 cpu usage: 1.250000 cpu usage: 0.940002 W_T 100, TRD CND 0 CPU USAGE CHECK RW Dev ID ADXL 0x0 MPU 0x18 cpu usage: 1.564995 cpu usage: 3.109955 cpu usage: 0.940002 cpu usage: 1.250000 cpu usage: 0.940002 cpu usage: 1.250000 cpu usage: 0.940002 cpu usage: 0.940002 cpu usage: 1.250000 cpu usage: 0.940002 ```

If both owner_ and mutex_ are static then I2C reinitialization takes approximately 70 ms:

Console log ``` Program start cpu usage: 100.000000 cpu usage: 100.000000 cpu usage: 100.000000 cpu usage: 100.000000 cpu usage: 100.000000 cpu usage: 100.000000 cpu usage: 19.680000 cpu usage: 0.940002 cpu usage: 1.260002 W_T 200, TRD CND 0 CPU USAGE CHECK RW Dev ID BMX 0x0 LIS 0x18 cpu usage: 1.255981 cpu usage: 3.762039 cpu usage: 0.940002 cpu usage: 1.260002 cpu usage: 0.940002 cpu usage: 0.940002 cpu usage: 0.940002 cpu usage: 0.940002 cpu usage: 1.260002 cpu usage: 0.940002 W_T 100, TRD CND 0 CPU USAGE CHECK RW Dev ID BMX 0x0 LIS 0x18 cpu usage: 1.564995 cpu usage: 3.451042 cpu usage: 0.940002 cpu usage: 1.260002 cpu usage: 1.250000 cpu usage: 1.250000 cpu usage: 0.940002 cpu usage: 0.940002 cpu usage: 0.940002 cpu usage: 0.949997 ```
affrinpinhero-2356 commented 3 years ago

Thanks for the comment. Give you an update as soon as possible.

idea--list commented 3 years ago

Hi @rkotan and @affrinpinhero-2356 I do not have any STM targets and can not try it myself... so i have 2 questions:

  1. Do you think this resource-hungriness of the I2C is a general bug or is it only an STM related thing? (i ask because i also notice my Ambiq Apollo3 struggling in the existence of an I2C object, have no clue however if these are 2 different bugs or the same one)
  2. In case it is a general bug, do you think this solution here is easily portable for other targets or is it too STM specific to get ported easily to other targets?
rkotan commented 3 years ago

Hi, @idea--list

Do you think this resource-hungriness of the I2C is a general bug or is it only an STM related thing

Unfortunately, I also don't have any non-STM target, so my opinion is questionable here. Based on the bug place, I would say it is the common issue which may hit any target with multiple devices on I2C buses at any moment.

do you think this solution here is easily portable for other targets or is it too STM specific to get ported easily to other targets

It depends on how maintainers decide to deal with it. I think it is the architectural issue and for a start an agreement on how to handle several devices on I2C bus(es) should be reached. I have read forum topic and the related issue and it seems to me, for now there is no clear way how to use I2C in this situation. Some people think that I2C instance should be created every time when it is needed (I don't like this though see some reasons), others suggest that it must be created once and shared in some way between whoever needs it and there is no any actual documentation on how to handle this. In addition, @kjbracey-arm mentioned that some methods are thread-safe and some are not... In my opinion removing static solves most of the problems but it breaks compatibility as was mentioned in the comment. So, maybe existing class should be left untouched and new clear and well documented logic is implemented in some I2C_Advanced class. Some day I2C_Advanced will replace old I2C (like BufferedSerial and Serial) or remain separate for advanced use. I think, this is the way :) but it's long and not easy.

kjbracey commented 3 years ago

I've rechecked the I2C code, and realise I've been writing under a misconception - that it worked like SPI and the API was intended to create one object per slave device.

I2C doesn't work like that, because you pass an address to the read/write calls, not the constructor. In SPI the SSEL pin (effectively the address) is passed to the constructor.

That means it is one bus per object, and then in that case, actually having multiple objects per bus is getting into sketchy territory - Mbed OS generally doesn't support that for HAL buses. Which in turn means if drivers do just make their own objects for (pin, pin address) tuples, you could get into a mess from duplicates.

Also consider frequency - this is set per-object, ie per-bus - and that is correct. An I2C bus can only operate at the speed of the slowest device on the bus, unlike SPI where you've got the chip select pins to "mute" other devices, and you can run different devices at different speeds (and formats).

Given that, I think correct behaviour does require that the top-level application create bus objects, configure their frequency, and then pass I2C pointers to drivers/submodules. Drivers should not create their own I2C objects, because they don't know how fast to run the bus if they only know about one slave device. Unless, that is, the driver requires that it be the owner of the entire bus. This might apply to someone doing asynchronous transfers, which we know have no locking mechanism.

And then given that, I actually think removing the static here is logically correct, contrary to my previous statement. There should be exactly one I2C object per bus.

Does this help?

kjbracey commented 3 years ago

Looking yet again, I see that there is shuffling of the frequency setting as you access different objects, and there is the owner suggesting you could have multiple objects for one bus. But that isn't making sense to me. It looks like it was just copied from SPI. You can't be running one I2C bus at different frequencies depending on device, right?

jeromecoutant commented 3 years ago

This one is fixed by #14805?

0xc0170 commented 3 years ago

I believe it is, closing