boarchuz / HULP

ESP32 ULP Coprocessor Helper
MIT License
180 stars 18 forks source link

I2C bitbang CRC third byte does not match #23

Closed dizcza closed 1 year ago

dizcza commented 1 year ago

Great job implementing the I2C bitbang mode in ULP! Your repo deserves more stars.

I'm porting the C code of reading the Sensirion SDP8xx digital sensor to HULP I2C bitbang. I'm reading the sensor periodically in continuous mode. C code is proven to work right, and it's here https://github.com/dizcza/sdpsensor-esp-arduino/blob/master/sdpsensor.cpp

The issue happens with the third byte which is CRC according to the documentation. The read value does not match with the expected (only in ULP mode, works fine in plain C).


#include <esp_log.h>
#include <esp_sleep.h>
#include <freertos/FreeRTOS.h>
#include <freertos/task.h>
#include <hulp.h>
#include <hulp_i2cbb.h>

#include "sdpsensor.h"

static const char *TAG = "MAIN";

static const uint8_t SLAVE_ADDR = 0x25;
static const gpio_num_t SDA_PIN = GPIO_NUM_33;
static const gpio_num_t SCL_PIN = GPIO_NUM_32;

static RTC_SLOW_ATTR ulp_var_t ulp_read_cmd[HULP_I2C_CMD_BUF_SIZE(3)] = {
        HULP_I2C_CMD_HDR_NO_PTR(SLAVE_ADDR, 3),
};

void init_ulp() {
    enum {
        LABEL_I2C_READ,
        LABEL_I2C_READ_CONTINUOUS,
        LABEL_I2C_READ_RETURN,
        LABEL_I2C_WRITE,
        LABEL_I2C_ERROR,
    };

    const ulp_insn_t program[] = {
            M_LABEL(LABEL_I2C_READ_CONTINUOUS),
            M_DELAY_US_100_5000(1000),
            I_MOVO(R1, ulp_read_cmd),
            M_MOVL(R3, LABEL_I2C_READ_RETURN),
            M_BX(LABEL_I2C_READ),
            M_LABEL(LABEL_I2C_READ_RETURN),
            M_BGE(LABEL_I2C_ERROR, 1),
            M_BX(LABEL_I2C_READ_CONTINUOUS),

            M_LABEL(LABEL_I2C_ERROR),
            I_WAKE(),
            I_END(),                // end ulp program so it won't run again
            I_HALT(),

            M_INCLUDE_I2CBB_CMD(LABEL_I2C_READ, LABEL_I2C_WRITE, SCL_PIN, SDA_PIN)
    };

    ESP_ERROR_CHECK(hulp_configure_pin(SCL_PIN, RTC_GPIO_MODE_INPUT_ONLY, GPIO_FLOATING, 0));
    ESP_ERROR_CHECK(hulp_configure_pin(SDA_PIN, RTC_GPIO_MODE_INPUT_ONLY, GPIO_FLOATING, 0));

    hulp_peripherals_on();

    vTaskDelay(1000 / portTICK_PERIOD_MS);

    ESP_ERROR_CHECK(hulp_ulp_load(program, sizeof(program), 10ULL * 1000, 0));
    ESP_ERROR_CHECK(hulp_ulp_run(0));
}

void print_result() {
    const int16_t dp = (int16_t) ulp_read_cmd[HULP_I2C_CMD_DATA_OFFSET].val;
    const uint8_t crc = ulp_read_cmd[HULP_I2C_CMD_DATA_OFFSET + 1].val_bytes[0];
    ESP_LOGI(TAG, "Diff pressure %3d CRC %3d expected %3d", dp, crc,  compute_crc((uint8_t*) &dp));
}

void app_main(void) {

    SDPSensor_I2C_GPIO i2c_gpio = {
            .addr = SLAVE_ADDR,
            .port = 0,
            .sda = SDA_PIN,
            .scl = SCL_PIN
    };

    // Init the sensor & start continuous mode
    SDPSensor_Init(&i2c_gpio, NULL);

    init_ulp();

    while (1) {
        vTaskDelay(pdMS_TO_TICKS(100));
        print_result();
    }
}

Logs

I (2943) MAIN: Diff pressure  -8 CRC   0 expected   2
I (3043) MAIN: Diff pressure   7 CRC   0 expected  47
I (3143) MAIN: Diff pressure   3 CRC   0 expected 172
I (3243) MAIN: Diff pressure   9 CRC   0 expected  66
I (3343) MAIN: Diff pressure   3 CRC   0 expected 172
I (3443) MAIN: Diff pressure   2 CRC   0 expected  88
I (3543) MAIN: Diff pressure 10751 CRC   0 expected  14   // 10751 looks fishy, need to investigate; probably due to unsynchonous read-write operations of the same memory block
I (3643) MAIN: Diff pressure   4 CRC   0 expected   2
I (3743) MAIN: Diff pressure  -5 CRC   0 expected  47
I (3843) MAIN: Diff pressure   0 CRC   0 expected 129
I (3943) MAIN: Diff pressure  -4 CRC   0 expected 129
I (4043) MAIN: Diff pressure  12 CRC   0 expected  53
I (4143) MAIN: Diff pressure   1 CRC   0 expected 117
I (4243) MAIN: Diff pressure  -3 CRC   0 expected 117
I (4343) MAIN: Diff pressure   1 CRC   0 expected 117
I (4443) MAIN: Diff pressure   4 CRC   0 expected   2

Would much appreciate if you provide any hint why this might happen.

I have several other questions but let's deal with this one first.

boarchuz commented 1 year ago

Thanks @dizcza

const uint8_t crc = ulp_read_cmd[HULP_I2C_CMD_DATA_OFFSET + 1].val_bytes[0];

Could you please check if the CRC is at val_bytes[1]? If that's it I might change ulp_var_t (eg. add val_byte1 and val_byte0 in 'reverse' order) as it is currently counter-intuitive.

And maybe something like this?

#define HULP_I2C_CMD_DATA_BYTE(cmd, index) ((cmd)[HULP_I2C_CMD_DATA_OFFSET + (index) / 2].val_bytes[((index) % 2 ? 0 : 1)])
// eg. get 3rd byte from ulp_read_cmd
const uint8_t crc = HULP_I2C_CMD_DATA_BYTE(ulp_read_cmd, 3);

Otherwise, are you able to capture the transaction with an oscilloscope or logic analyser?

dizcza commented 1 year ago

You're right, the [1] byte is actually the CRC. Not only that, (due to little endian and hence the reversed byte order?) I needed to explicitly create the byte array for CRC instead of taking the address of a 16-bit value. Here is the sketch that works:

    const int16_t dp = (int16_t) ulp_read_cmd[HULP_I2C_CMD_DATA_OFFSET].val;
    const uint8_t crc = ulp_read_cmd[HULP_I2C_CMD_DATA_OFFSET + 1].val_bytes[1];
    uint8_t dp_reversed[2] = { dp >> 8, dp & 0xFF };
    ESP_LOGI(TAG, "Diff pressure[%3d] %3d | CRC %3d expected %3d", ulp_dp_idx.val, dp, crc,  compute_crc((uint8_t*)dp_reversed));
dizcza commented 1 year ago

Also, when I change ULP_COPROC_RESERVE_MEM to its maximum value (8176), HULP sources fail to build:

/home/dizcza/.espressif/tools/xtensa-esp32-elf/esp-2022r1-11.2.0/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/11.2.0/../../../../xtensa-esp32-elf/bin/ld: ulp_hulp.elf section `.rtc.force_slow' will not fit in region `rtc_slow_seg'
/home/dizcza/.espressif/tools/xtensa-esp32-elf/esp-2022r1-11.2.0/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/11.2.0/../../../../xtensa-esp32-elf/bin/ld: RTC_SLOW segment data does not fit.
/home/dizcza/.espressif/tools/xtensa-esp32-elf/esp-2022r1-11.2.0/xtensa-esp32-elf/bin/../lib/gcc/xtensa-esp32-elf/11.2.0/../../../../xtensa-esp32-elf/bin/ld: region `rtc_slow_seg' overflowed by 2820 bytes

So it's only 5352 bytes are available. I'll be doing really fast sensor readings with ULP I2C - 1k or 2k Hz, and I'm writing the results to array and poll the values in a thread. Is there a way to cut unnecessary files in HULP if I need I2C bitbang only?

dizcza commented 1 year ago

define HULP_I2C_CMD_DATA_BYTE(cmd, index) ((cmd)[HULP_I2C_CMD_DATA_OFFSET + (index) / 2].val_bytes[((index) % 2 ? 0 : 1)])

// eg. get 3rd byte from ulp_read_cmd const uint8_t crc = HULP_I2C_CMD_DATA_BYTE(ulp_read_cmd, 3);

That would help young players to get around the issue I had, yes. Maybe val_bytes[1 - ((index) % 2)] looks better though.

dizcza commented 1 year ago

A few other questions I have (sorry for overwhelming, I don't have a solid background in assembly instructions and ULP):

boarchuz commented 1 year ago

Also, when I change ULP_COPROC_RESERVE_MEM to its maximum value (8176) ... only 5352 bytes are available

Bear in mind that you have 8KB of RTC slow memory total, shared between ULP_COPROC_RESERVE_MEM and and variables placed there (eg. with RTC_SLOW_ATTR, as well as RTC_DATA_ATTR, etc depending on config).

Each ulp_var_t is 4 bytes. So, for example, if you had: RTC_SLOW_ATTR ulp_var_t my_ulp_array[1000]; Then the maximum possible ULP_COPROC_RESERVE_MEM is reduced by 4*1000 bytes.

Using HULP won't change this in any way. You can look in build/[project_name].map around '.rtc_force_slow' for anything unexpected.

  • There is no atomic kind of support in ULP, right?

There are no special instructions or hardware support, but take a look at hulp_mutex.h for simple mutual exclusion.

  • Is there a way to notify a task with a value from ULP save for calling rtc_isr_register and providing an ISR function in C The reason I'm asking is that I'd like to know the index (or the counter) currently being filled in an array buffer in ULP from C code (and I need to avoid race condition with ULP). For that, I don't want to disturb CPU each time the counter is incremented in ULP.

You could easily handle that logic with the ULP, if I'm understanding correctly. Just check if the current counter has met some threshold to determine if an interrupt should be fired. The ULP write and SoC read should be wrapped in a mutex.

  • If never want ULP to sleep and wake up, will the default loading of code work

Yep, nothing special needs to be done there. As long as the ULP doesn't execute an I_HALT, it will keep running.

dizcza commented 1 year ago

Thanks for the detailed answer.

Bear in mind that you have 8KB of RTC slow memory total, shared between ULP_COPROC_RESERVE_MEM and and variables placed there (eg. with RTC_SLOW_ATTR, as well as RTC_DATA_ATTR, etc depending on config).

I see, so I got it wrong: I thought this is something the user needs to increase in order to declare a static variable with a large size with RTC_SLOW_ATTR attribute. But it's the other way around: you need to decrease this value in order to have larger space for user-defined variables, constrained to fit the size of the program, of course (it's only 99 words in my case).

There are no special instructions or hardware support, but take a look at hulp_mutex.h for simple mutual exclusion.

An interesting choice, and an example would help. How long does taking and releasing a mutex take place in ULP? And more generally, how to measure the duration of a block of assembly instructions in ULP?

dizcza commented 1 year ago

What I also don't understand in HULP is where the clock frequency is set in I2C? I imagine there is no such thing as a single variable in bitbanging mode that could correspond to I2C clock freq, I'm just curious how to increase or decrease the speed? In particular, does the default config correspond to ~100 kHz or 400 kHz?

boarchuz commented 1 year ago

And more generally, how to measure the duration of a block of assembly instructions in ULP?

The instruction set lists the number of cycles for each instruction: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/ulp_instruction_set.html If you're using HULP abstractions (eg. M_GPIO_TOGGLE) you'll obviously need to expand the macro to find which instructions are used.

An interesting choice, and an example would help. How long does taking and releasing a mutex take place in ULP?

Taking a quick look, as it stands (it could be slightly optimised), best case scenario would be 32 cycles (~200us) to take and 8 (~50us) to release. Ignore that, I had a brainfart, ULP runs on 8MHz clock, not 150kHz... ... best case scenario would actually be 32 cycles (~4us) to take and 8 (~1us) to release.

Note that almost every case will likely be best case, given how infrequently and how briefly the SoC usually needs the mutex.

Sorry, I know documentation/examples are lacking. Here's a simple, untested mutex example:

RTC_SLOW_ATTR ulp_mutex_t ulp_mutex;
RTC_SLOW_ATTR ulp_var_t ulp_counter;
RTC_SLOW_ATTR ulp_var_t ulp_counter_is_odd;

ULP:
// Requires a non-zero to take, and a zero to release.
I_MOVI(R1, 1),
I_MOVI(R2, 0),
// R3 will be the counter register
I_MOVI(R3, 0),

// Loop:
M_LABEL(1),
    I_ADDI(R3, R3, 1),
// Take mutex (note: will clobber R0)
    M_MUTEX_TAKE(ulp_mutex, R1, R1),
        I_PUT(R3, R2, ulp_counter),
        I_ANDI(R0, R3, 1),
        I_PUT(R0, R2, ulp_counter_is_odd),
// Release mutex
    M_MUTEX_RELEASE(ulp_mutex, R2),
    M_BX(1),

SoC:
for(;;)  {
    uint16_t counter_val;
    bool counter_is_odd;

    HULP_MUTEX_SOC_TAKE_WHILE(&ulp_mutex) {
        vTaskDelay(1);
    }
    counter_val = ulp_counter.val;
    counter_is_odd = ulp_counter_is_odd.val;
    HULP_MUTEX_SOC_RELEASE(&ulp_mutex);

    printf("Counter: %u, Odd: %d\n", counter_val, counter_is_odd);
    vTaskDelay(1000 / portTICK_PERIOD_MS);
}

What I also don't understand in HULP is where the clock frequency is set in I2C?

It's fixed, approximately 150kHz if I remember correctly. ULP is relatively slow of course, so the values involved in each transaction can change the timings slightly even for each bit as different branches are executed. I've been conscious of that and tried to keep timings as consistent as reasonably possible, but the main optimisation focus was flexibility with minimal size.

dizcza commented 1 year ago

The instruction set lists the number of cycles for each instruction: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/ulp_instruction_set.html If you're using HULP abstractions (eg. M_GPIO_TOGGLE) you'll obviously need to expand the macro to find which instructions are used.

A bit of math involved... But it won't cover all cases. In my example, I'd like to measure the duration of I2C bitbang command, and expanding the M_INCLUDE_I2CBB_CMD will take too much effort. Is there a way to save the RTC's tick into a ulp_variable and then take the diff?

Thanks for the example, it makes sense. Though I'll be probably using it differently:

    hulp_mutex_soc_request(_mutex);
    if (hulp_mutex_soc_check(_mutex)) {
        // success; do stuff with the _mutex
    }
    hulp_mutex_soc_release(_mutex);
    // do other stuff
    vTaskDelay(...)

<10 us timing LGTM.

It's fixed, approximately 150kHz if I remember correctly. ULP is relatively slow of course, so the values involved in each transaction can change the timings slightly even for each bit as different branches are executed. I've been conscious of that and tried to keep timings as consistent as reasonably possible, but the main optimisation focus was flexibility with minimal size.

This will be the bottleneck for me if I try polling the sensor at its maximum speed - 2000 Hz (500 us). SoC with I2C clock 100 kHz takes ~445 us to read 3 bytes at CPU default freq 160 MHz and literally leaves no time for other instructions. While 400 kHz I2C mode yields only ~150 us. I don't want to get my hands dirty, but if I had such a will, what and where would I need to change to increase the effective "I2C clock freq" in the bitbang mode?


As a side note, your code fails to build with the latest ESP-IDF master branch I'm currently on (I often submit issues there which require me to use their "bleeding edge" software) - it's due ADC has undergone huge changes in v5.x release. And it seems the official release of v5.x is on the horizon. Would you support it? I'd like to have HULP as a submodule in components directory rather than copy-pasting the scripts of forking the repo with ADC files removed.

dizcza commented 1 year ago

Is there a way to save the RTC's tick into a ulp_variable and then take the diff?

I think I know a possible answer to my question. At least, we can measure the duration of the whole program empirically by running it say 1000 times and calling esp_timer_get_time() before and after the execution in the main loop. It's not as convenient as inserting a macro but provides a feasible way to get a feeling of the elapsed time.

~And it takes 2750 us to read 3 bytes :( Ah, such a pity. It's not a fit even for 1000 Hz sampling.~ Sorry, now it's me who's having a brainfart. I've accidentally increased the number of reading bytes to 30, dear me. The correct duration to read 3 bytes is 390 us! Nice.

boarchuz commented 1 year ago

I'd like to measure the duration of I2C bitbang command, and expanding the M_INCLUDE_I2CBB_CMD will take too much effort. Is there a way to save the RTC's tick into a ulp_variable and then take the diff?

It's possible:

M_UPDATE_TICKS(),
I_RD_TICKS_REG(0),
I_PUT(R0, Rx, ulp_start_tick),
// Do work
I_GET(R1, Rx, ulp_start_tick),
M_UPDATE_TICKS(),
I_RD_TICKS_REG(0),
I_SUBR(R0, R1, R0),
I_PUT(R0, Rx, ulp_ticks_duration),

But there are probably better ways. The most obvious and accurate would be to use an oscilloscope. From memory, the same command should always follow the same path, so should always take the same amount of time.

if I had such a will, what and where would I need to change to increase the effective "I2C clock freq" in the bitbang mode?

I don't think it would be a good idea. It is designed to be able to drop anywhere into the program so all of the branching is relative, there's a lot of jumping around to save memory, every single bit written involves a branch depending if 0/1, some extra work is required to keep track of state, etc. It can be very confusing.

If I was in your shoes, with performance being the top goal, I would start fresh. Memory permitting, I would hardcode every command, unroll as many loops as possible, minimise branching, etc. Maybe you can ignore the CRC.

If you really wanted to get your hands dirty, you could try writing a ULP driver for the RTC I2C (ie. by writing to its registers, not using ULP instructions). That's getting very technical though!

As a side note, your code fails to build with the latest ESP-IDF master branch I'm currently on

Can you tell me the commit id?

dizcza commented 1 year ago

ESP-IDF commit dcaa753f37bd90c2ec40d24940bc1c132d9532d9

EDIT: It's not the build that gets failed, sorry. The build is successful but you cannot launch the image (which is even worse... what ESP guys were thinking of?). It immediately throws the error

E (1170) ADC: CONFLICT! driver_ng is not allowed to be used with the legacy driver

I don't have a scope to check I2C bitbang behavior. But I'll surely do it next time I'll visit the lab. I'm wondering if I need to put any pullups on this virtual I2C bus as I normally do with a real one. Guess so, though at a speed of ~150 kHz this might be not so crucial. I'm also interested to see if there is any time stretching, i.e. if the pulse period of both SDA and SCL is constant.

If you really wanted to get your hands dirty, you could try writing a ULP driver for the RTC I2C (ie. by writing to its registers, not using ULP instructions). That's getting very technical though!

Yeah, it's not worth a try. Besides, 390 us is something I'm satisfied with for now.

I did have a look at the ULP section in the ESP technical reference manual before I started using your lib. I was looking for reading multiple bytes on an I2C bus and found none; it stroke me as there are really not many IoT sensors with a single byte readout. I also peeked into sections "30.6.2.3 Detecting Error Conditions" and "Register 30.10. RTC_I2C_DEBUG_STATUS_REG (0x008)" and found similar notations in the MultiRead bitbang example. But I mean RTC_I2C_DEBUG_STATUS_REG is designed for hardware control, and bitbanging is a software implementation.

Anyway, your flags - LBL_I2C_ARBLOST and LBL_I2C_NACK - which I found in the Single read example are not available for use with multiple bytes readout as I posted in the first message, are they? Talking about macros M_INCLUDE_I2CBB VS M_INCLUDE_I2CBB_CMD.

boarchuz commented 1 year ago

Thanks, I'll take a look at the error when I have a chance.

I'm wondering if I need to put any pullups on this virtual I2C bus as I normally do with a real one. Guess so, though at a speed of ~150 kHz this might be not so crucial. I'm also interested to see if there is any time stretching, i.e. if the pulse period of both SDA and SCL is constant.

You will definitely still need pullups. It makes no difference what is driving the pins, I2C is an open-drain bus (pins driven low for 0, but left to float for 1) so pullups are necessary for 1s, at any frequency.

I'm also interested to see if there is any time stretching

I understand "clock stretching" to be when the slave holds the line low to signal to the master that it's busy. That's not supported. The pulse period is not constant for the reasons in my last comment, but it's reasonably good. I2C has very loose requirements, anyway.

LBL_I2C_ARBLOST and LBL_I2C_NACK - which I found in the Single read example are not available for use with multiple bytes readout as I posted in the first message, are they?

It's a bit different - it won't jump to an error handler, instead you should use R0 or the ALU to handle an error upon return. See: https://github.com/boarchuz/HULP/blob/3cd0ef8b8f8b92c502e0b2b5ccc91059905e7e45/src/hulp_i2cbb.h#L357-L360 eg. M_BGE(LBL_MY_ERROR_HANDLER, 1) or M_BXF(LBL_MY_ERROR_HANDLER)

dizcza commented 1 year ago

You've got me, I'm misusing the terminology. Clock stretching popped up in my mind but it has a completely different meaning as you just mentioned.

I've overlooked this part of the documentation, thanks for pointing this out. BTW, I'm delighted to see well-written docs as typically people don't bother with that and just toss the code to the users.

You've answered all my questions at least for today :) Thanks again, this is really helpful. Today I learned about assembly more than perhaps in my whole life :) An interesting endeavor, got hooked on it. I could as well buy S3 with RISCV user-friendly C code instructions, but I love "squeezing cheese out of shit", so to say.

I'm closing the issue and will come back if I have further questions.

dizcza commented 1 year ago

I'm really enjoying mutexes API. I'd change M_MUTEX_TAKE(ulp_mutex, R1, R1) to M_MUTEX_TAKE(ulp_mutex, R1, 1) cause it's confusing otherwise. Other than that, it works fine.

A short question: any reason you've put a delay of 1 sec to all I2C bitbang examples prior to loading the ULP program? Is it safe to remove the delay?

boarchuz commented 1 year ago

Yes you should make that change, that's a typo sorry!

There would be no need for the delay usually, so you can safely remove it. It was probably something I stuck there to be conservative with the ULP/bus initialisation one day, and it got copied to all of the others too.

dizcza commented 1 year ago

One thing I noticed while playing with ULP is that for a reason I don't understand (could be a bug on the ESP side) if the program size is slightly larger than ULP_COPROC_RESERVE_MEM, the program seems to be loaded successfully (guarded by ESP_ERROR_CHECK) but at runtime it immediately throws error code 9. This is the behavior I'm getting with program size 516 bytes and ULP_COPROC_RESERVE_MEM set to default 512. Once I changed this value to 516, everything works fine. So I added

if (sizeof(program) > CONFIG_ULP_COPROC_RESERVE_MEM) {
        // Warn the user but still try on
        ESP_LOGE(TAG, "ULP program size (%d) does not fit into ULP_COPROC_RESERVE_MEM (%d)", sizeof(program), CONFIG_ULP_COPROC_RESERVE_MEM);
    }
dizcza commented 1 year ago

Are you planning to include support for S3? I'd like to make sure the code is running on both chips fine.

boarchuz commented 1 year ago

No plans for anything other than ESP32. Most people will be using RISCV ULP on newer targets and the FSM instruction set is different enough that I can't really justify the complexity. (And, personally, I only use ESP32.)

dizcza commented 1 year ago

FSM instruction set is different enough

Ah, didn't know that. Strange. All right, got it.

Anyway, it works fine on ESP32. The maximum reliable speed of polling my I2C sensor with bitbanging is slightly suboptimal - 1600 Hz instead of 1900 Hz (which is achieved with real I2C hardware) - even with 3k pullups, but at the first glance, the picture on an oscilloscope looks good. Need to investigate further if that becomes a limitation.

dizcza commented 1 year ago

I'm having a headache from digesting in which region of RTC memory should I put my variables. These variables should preserve value upon waking from a light sleep.

// Accessed only within ULP
static RTC_SLOW_ATTR ulp_var_t ulp_read_cmd[HULP_I2C_CMD_BUF_SIZE(2)] = {
        HULP_I2C_CMD_HDR_NO_PTR(SLAVE_ADDR, 2),
};

// data buffer filled in ULP and read in SoC
static RTC_DATA_ATTR ulp_var_t ulp_dp_raw[SDP_ULP_BUFF_SIZE] = { 0 };

// data index stored in ULP and read in SoC
static RTC_DATA_ATTR ulp_var_t ulp_dp_idx = { 0 };

// error placeholder
static RTC_DATA_ATTR ulp_var_t ulp_error = { 0 };

// accessed both by ULP and SoC
static RTC_DATA_ATTR ulp_mutex_t ulp_mutex = { 0 };
...

            // Write I2C read value to ulp_dp_raw[ulp_dp_idx]
            I_MOVI(R0, 0),
            I_GET(R0, R0, ulp_dp_idx),
            I_MOVI(R1, 0),
            I_GET(R1, R1, ulp_read_cmd[HULP_I2C_CMD_DATA_OFFSET]),
            I_ST(R1, R0, (uint16_t)((uint32_t*) ulp_dp_raw - RTC_SLOW_MEM)),

Citing corresponding pieces of info from docs:

If some variables in the program are placed into RTC slow memory (for example, using RTC_DATA_ATTR attribute), RTC slow memory will be kept powered on by default.

Global and static variables used by code which runs from RTC memory must be placed into RTC Slow memory. For example deep sleep variables can be placed here instead of RTC FAST memory, or code and variables accessed by the ULP Coprocessor programming.

And I'm confused. "Deep sleep variables can be placed here (RTC SLOW) instead of RTC FAST memory" suggests that they should be placed in the RTC fast memory by default, otherwise the correct wording would be "Deep sleep variables must be placed here". But the first citation contradicts the conclusion I'm drawing. Therefore, my thinking is that a variable accessed in ULP (in a deep sleep or just running) and in SoC (after wakeup) can be placed in one of three regions: RTC_DATA_ATTR, RTC_SLOW_ATTR, or RTC_FAST_ATTR, but I'm not sure about that.

  1. Am I doing correctly by putting ulp_read_cmd to RTC_SLOW_ATTR and everything else to RTC_DATA_ATTR?

  2. If I use a light sleep only (and never deep sleep), can I access non-RTC memory from a ULP in a light sleep? I'm asking because, as you've said, RTC slow + fast mem is limited to 8 Kb.

  3. It looks that RTC_FAST_ATTR is shared with neither RTC slow nor RTC data regions. Can I place my ULP variables here?

Would much appreciate if you shed light on this. Thanks.

boarchuz commented 1 year ago

The ULP can only access RTC Slow Memory. For that reason, I think it's best to be explicit and use RTC_SLOW_ATTR for anything and everything the ULP will be using.

RTC Fast Memory is very useful if you're running low on Slow, but can only be accessed by CPU0 so you need to take care using it.

RTC_DATA_ATTR should be avoided here, in my opinion. It will almost always default to Slow but is sometimes Fast, depending on configuration. Only use it if you're sure you don't need to care which segment is used, or need to support targets which may not have one or the other segment. Just to be extra clear: There are only two RTC memory segments - Fast and Slow; anything with the RTC_DATA_ATTR attribute will be placed in one of these, depending on config.

So to answer your questions directly:

  1. Yes, because RTC_DATA_ATTR == RTC_SLOW_ATTR for your config. I would suggest using RTC_SLOW_ATTR to be "more correct".

2 & 3. No, the ULP can only access RTC Slow Mem.