espressif / arduino-esp32

Arduino core for the ESP32
GNU Lesser General Public License v2.1
13.67k stars 7.42k forks source link

Strange behavior with Micros() and Queue #1160

Closed Leicas closed 6 years ago

Leicas commented 6 years ago

Hardware:

Board: Adafruit Huzzah ESP32 Core Installation/update date: 20/feb/2018 IDE name: Arduino IDE Flash Frequency: 40Mhz Upload Speed: 115200

Description:

Hi, I noticed some strange things happening when using Micros for timing my code. The sketch below show a minimal exemple, if you comment the xQueueSend(queue, &j, 0); line, everything works fine and Micros and Millis show consistent values.

But with the xQueueSend(queue, &j, 0);, I cannot figure what Micros is returning, and it doesn't make sense to me why using a fifo queue would impact timing.

Did I miss something obvious ?

Thanks

Sketch:


unsigned long mesuretemps = 0;
QueueHandle_t queue;
int queueSize = 4096;
unsigned long oldmic = 0;
void setup(){
  Serial.begin(115200);
  mesuretemps = micros();
  queue = xQueueCreate( queueSize, sizeof( int ) ); 
  if(queue == NULL){
    Serial.println("Error creating the queue");
  }
  xTaskCreate(
                    consumerTask,     /* Task function. */
                    "Consumer",       /* String with name of task. */
                    10000,            /* Stack size in words. */
                    NULL,             /* Parameter passed as input of the task */
                    1,                /* Priority of the task. */
                    NULL);            /* Task handle. */
}
void loop(){
  for(int j=0;j<1000;j++) {
    xQueueSend(queue, &j, 0);
  }
 unsigned long ms = millis();
unsigned long us = micros();
unsigned long delta = ms - mesuretemps;
mesuretemps = ms;
unsigned long deltamic = us - oldmic;
oldmic = us;
  Serial.printf("done in %lu ms %lu mus\n", delta,deltamic);
}
void consumerTask( void * parameter)
{
    int element;
  while(true) {
    xQueueReceive(queue, &element, portMAX_DELAY);
    delayMicroseconds(125);
  }
}
stickbreaker commented 6 years ago

@Leicas What is it doing that you don't like? Post your output.

Chuck.

Leicas commented 6 years ago

@stickbreaker yes, no problem, with xQueueSend

done in 3 ms 35793660 mus
done in 2 ms 35793650 mus
done in 2 ms 35793651 mus
done in 2 ms 35793643 mus
done in 3 ms 35793659 mus
done in 2 ms 35793651 mus
done in 2 ms 35793652 mus
done in 3 ms 35793641 mus
done in 2 ms 35793659 mus
done in 2 ms 35793652 mus
done in 2 ms 35793651 mus
done in 3 ms 35793641 mus
done in 2 ms 35793660 mus
done in 2 ms 35793651 mus
done in 2 ms 35793652 mus
done in 3 ms 35793641 mus
done in 2 ms 35793660 mus
done in 2 ms 35793651 mus
done in 2 ms 35793652 mus
done in 3 ms 35793641 mus
done in 2 ms 35793659 mus
done in 2 ms 35793652 mus
done in 2 ms 35793651 mus
done in 3 ms 35793641 mus
done in 2 ms 35793660 mus

without

done in 2 ms 1910 mus
done in 2 ms 1910 mus
done in 2 ms 1909 mus
done in 2 ms 1910 mus
done in 2 ms 1910 mus
done in 1 ms 1910 mus
done in 2 ms 1909 mus
done in 2 ms 1910 mus
done in 2 ms 1910 mus
done in 2 ms 1909 mus
done in 2 ms 1910 mus
done in 2 ms 1910 mus

Leicas

stickbreaker commented 6 years ago

@Leicas you are feeding a queue as fast as possible, you are draining the queue one element every 125us. So, if the queue has 4095 elements and loop() starts adding 1000 more, each xQueueSend() will take 125us + task switch + queue work: So one execution of loop() will take 1000*(125us + task switch + xQueueSend() + xQueueRead())+ `loop() operations(Serial.Write() being the majority time waster)

Out of curiosity add

uint32_t count = uxQueueMessagesWaiting( QueueHandle_t xQueue ); // Return the number of messages stored in a queue.

and print the result in loop()

Chuck.

Leicas commented 6 years ago

@stickbreaker Yeah, that's some leftover code, I used to have a function that would feed small amount of data to the queue, less than the queue size and at a much lower rate.

The problem is why does "micros()" stop counting microsecond when you use the xQueueSend()? 35793641micros is almost 36second and that doesn't reflect the time spent at all.

with the counter:

done in 4 ms 35794519 mus loop 4096
done in 3 ms 35794520 mus loop 4096
done in 3 ms 35794518 mus loop 4096
done in 3 ms 35794519 mus loop 4096
done in 3 ms 35794519 mus loop 4096
done in 3 ms 35794519 mus loop 4096

But by curiosity I removed the delayMicroseconds(125); inside the task, and the Micros() is making sense again.

stickbreaker commented 6 years ago

@Leicas Well, it looks like delayMicroseconds() is not atomically handling its value.

your 3597419 is (2^32)/120 + actual us

add this to your println:

uint32_t netMic = deltaMic - ( UINT32_MAX / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ);

For some reason the micros() is thinking a rollover has occurred?

Chuck.

stickbreaker commented 6 years ago

@Leicas @me-no-dev I think I figured out what is happening. The micros() function grabs a cycle counter with: from esp32-hal-misc.c

    __asm__ __volatile__ ( "rsr     %0, ccount" : "=a" (ccount) );

This cycle count is a PER TASK count. So, depending on how long a task operates, its number changes. If task 1 operates for 1500us then calls micros() which stores the 'current' microsecond tick(1500), then task 2 which has operated for 50us calls micros(), its tick is 50 which is less than 1500 so a ROLLOVER must have occurred!

To solve this each task needs a task local 'lastMicroSecondTick' and 'rolloverOffset'

Chuck.

stickbreaker commented 6 years ago

@Leicas Here is a potential fix: add the __thread storage specifier to micros() in esp32-hal-misc.c)

unsigned long IRAM_ATTR micros()
{
    static __thread unsigned long lccount = 0;
    static __thread unsigned long overflow = 0;
    unsigned long ccount;
    portENTER_CRITICAL_ISR(&microsMux);
    __asm__ __volatile__ ( "rsr     %0, ccount" : "=a" (ccount) );
    if(ccount < lccount){
        overflow += UINT32_MAX / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ;
    }
    lccount = ccount;
    portEXIT_CRITICAL_ISR(&microsMux);
    return overflow + (ccount / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ);
}

This might not be the best way, because, it will reserve 8 bytes in every Task's Stack but it is simple! The Pthread API might be a better long term solution:

Pthread API

The ESP-IDF provides the following pthread API to manage thtread local variables:

        pthread_key_create()
        pthread_key_delete()
        pthread_getspecific()
        pthread_setspecific()

Chuck.

stickbreaker commented 6 years ago

@Leicas Well, __thread just caused Core Panics.

Chuck.

stickbreaker commented 6 years ago

@Leicas Had to use pthread_keys: This is what I changed micros() inside of cores\esp32\esp32-hal-misc.c


portMUX_TYPE microsMux = portMUX_INITIALIZER_UNLOCKED;
static pthread_key_t microsStore=NULL; //Thread local storage for rollover of micro() counter

void* microsStoreDelete(void * storage){  // called when a Task exits to release thread local storage
if(storage) free(storage);
}

unsigned long IRAM_ATTR micros(){

if (!microsStore){ // first Time Ever thread local not init'd
  portENTER_CRITICAL_ISR(&microsMux); // don't know if I really need it, but it only happens Once
  pthread_key_create(&microsStore,microsStoreDelete); // create initial holder
  portEXIT_CRITICAL_ISR(&microsMux);
  }

uint32_t *ptr;// [0] is lastCount, [1] is overFlow

ptr = pthread_getspecific(microsStore); // get address of storage

if(ptr == NULL){ // first time in this thread, allocate mem, init it.  
  portENTER_CRITICAL_ISR(&microsMux); // only happens once per thread (task)
  ptr = (uint32_t*)malloc(sizeof(uint32_t)*2);
  pthread_setspecific(microsStore,ptr); // store the pointer to this thread's values
  ptr[0] = 0; // lastCount value
  ptr[1] = 0; // overFlow
  portEXIT_CRITICAL_ISR(&microsMux);
  }  

unsigned long ccount;
portENTER_CRITICAL_ISR(&microsMux);
__asm__ __volatile__ ( "rsr     %0, ccount" : "=a" (ccount) );
if(ccount < ptr[0]){
  ptr[1] += UINT32_MAX / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ;
  }
ptr[0] = ccount;
portEXIT_CRITICAL_ISR(&microsMux);
return ptr[1] + (ccount / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ);
}

Your original code now produces this:

done in 2 ms 1909 mus
done in 2 ms 1909 mus
done in 2 ms 1910 mus
done in 2 ms 1907 mus
done in 2 ms 1911 mus
done in 2 ms 1912 mus
done in 2 ms 1900 mus
done in 2 ms 1910 mus
done in 2 ms 1910 mus
done in 1 ms 1909 mus
done in 2 ms 1919 mus
done in 2 ms 1909 mus
done in 2 ms 1910 mus
done in 2 ms 1909 mus
done in 2 ms 1910 mus
done in 2 ms 1912 mus
done in 2 ms 1906 mus
done in 2 ms 1904 mus
done in 2 ms 1908 mus
done in 2 ms 1909 mus
done in 1 ms 1912 mus
done in 2 ms 1918 mus
done in 2 ms 1907 mus
done in 2 ms 1911 mus
done in 2 ms 1910 mus
done in 2 ms 1911 mus
done in 2 ms 1909 mus
done in 2 ms 1908 mus
done in 2 ms 1910 mus

@Leicas I had to change your consumerTask() to be more polite with other tasks:


void consumerTask( void * parameter){
int element;
uint32_t blockCount=0;
while(true) {
  if(!xQueueReceive(queue, &element, 0)){ // no data avail
    vTaskDelay(1); // let other task use this core
    failCount++;
    }
  else {
    delayMicroseconds(125);
    blockCount++;
    }
  if(blockCount>5000){ // be polite, let lower priority task have 1ms
    vTaskDelay(1); // task switch, wdt
    blockCount=0;
    }
  }
}

else I would get WDT timeouts on Core 0.


Task watchdog got triggered. The following tasks did not reset the watchdog in time:
 - IDLE (CPU 0)
Tasks currently running:
CPU 0: Consumer
CPU 1: loopTask

Chuck.

Leicas commented 6 years ago

Thanks a lot for all the work! I'll try this in my full project.

Le jeu. 1 mars 2018 à 20:51, chuck todd notifications@github.com a écrit :

@Leicas https://github.com/leicas Had to use pthread_keys: This is what I changed micros() inside of cores\esp32\esp32-hal-misc.c https://github.com/espressif/arduino-esp32/blob/master/cores/esp32/esp32-hal-misc.c

portMUX_TYPE microsMux = portMUX_INITIALIZER_UNLOCKED;static pthread_key_t microsStore=NULL; //Thread local storage for rollover of micro() counter void microsStoreDelete(void storage){ // called when a Task exits to release thread local storageif(storage) free(storage); } unsigned long IRAM_ATTR micros(){ if (!microsStore){ // first Time Ever thread local not init'd portENTER_CRITICAL_ISR(&microsMux); // don't know if I really need it, but it only happens Once pthread_key_create(&microsStore,microsStoreDelete); // create initial holder portEXIT_CRITICAL_ISR(&microsMux); } uint32_t *ptr;// [0] is lastCount, [1] is overFlow

ptr = pthread_getspecific(microsStore); // get address of storage if(ptr == NULL){ // first time in this thread, allocate mem, init it. portENTER_CRITICAL_ISR(&microsMux); // only happens once per thread (task) ptr = (uint32_t)malloc(sizeof(uint32_t)2); pthread_setspecific(microsStore,ptr); // store the pointer to this thread's values ptr[0] = 0; // lastCount value ptr[1] = 0; // overFlow portEXIT_CRITICAL_ISR(&microsMux); }

unsigned long ccount;portENTER_CRITICAL_ISR(&microsMux);asm volatile ( "rsr %0, ccount" : "=a" (ccount) );

if(ccount < ptr[0]){ ptr[1] += UINT32_MAX / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ; } ptr[0] = ccount;portEXIT_CRITICAL_ISR(&microsMux);return ptr[1] + (ccount / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ); }

Your original code now produces this:

done in 2 ms 1909 mus done in 2 ms 1909 mus done in 2 ms 1910 mus done in 2 ms 1907 mus done in 2 ms 1911 mus done in 2 ms 1912 mus done in 2 ms 1900 mus done in 2 ms 1910 mus done in 2 ms 1910 mus done in 1 ms 1909 mus done in 2 ms 1919 mus done in 2 ms 1909 mus done in 2 ms 1910 mus done in 2 ms 1909 mus done in 2 ms 1910 mus done in 2 ms 1912 mus done in 2 ms 1906 mus done in 2 ms 1904 mus done in 2 ms 1908 mus done in 2 ms 1909 mus done in 1 ms 1912 mus done in 2 ms 1918 mus done in 2 ms 1907 mus done in 2 ms 1911 mus done in 2 ms 1910 mus done in 2 ms 1911 mus done in 2 ms 1909 mus done in 2 ms 1908 mus done in 2 ms 1910 mus

@Leicas https://github.com/leicas I had to change your consumerTask() to be more polite with other tasks:

void consumerTask( void * parameter){int element;uint32_t blockCount=0;while(true) { if(!xQueueReceive(queue, &element, 0)){ // no data avail vTaskDelay(1); // let other task use this core failCount++; } else { delayMicroseconds(125); blockCount++; } if(blockCount>5000){ // be polite, let lower priority task have 1ms vTaskDelay(1); // task switch, wdt blockCount=0; } } }

else I would get WDT timeouts on Core 0.

Task watchdog got triggered. The following tasks did not reset the watchdog in time:

  • IDLE (CPU 0) Tasks currently running: CPU 0: Consumer CPU 1: loopTask

Chuck.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/espressif/arduino-esp32/issues/1160#issuecomment-369793767, or mute the thread https://github.com/notifications/unsubscribe-auth/AAuwQ6WZdA5UG6lHb5M4uCTANla-S1Auks5taKW4gaJpZM4SYed- .

stickbreaker commented 6 years ago

@Leicas with more playing, it looks like 1ms is not enough time for the idle process to complete it's housekeeping, I kept getting 'out of stackspace' errors until I increased the vTaskDelay() to 10ms.

I don't know what the actual rules, recommendations are for sharing time with idle tasks. Just be aware if your loops are to tight you may need to design some vTaskDelay()'s into them.

Chuck.

Leicas commented 6 years ago

@stickbreaker Everything is working with your last code changes. Thank you a lot for your involvement !

Kasperdelasopa commented 6 years ago

Hi together,

first, thanks for your work.

I still have problems with the delayMicroseconds() function. I ported my code from an arduino uno to the esp32. On the Arduino i have implemented a function which played a melody by creating the sound frequencies with the delayMicroseconds() according to this tutorial: https://www.arduino.cc/en/Tutorial/PlayMelody

This function worked fine for me on the Arduino.

But it looks to me that on the ESP using arduino-esp32 library the function to delay either block in total or it takes much longer than the expected microseconds to return.

FYI: im using PlatformIO for both projects (Arduino & ESP32) and i updated the arduino-esp32 lib there ti this branch.

Because i don't think it comes from this lib, because i cant be the first having this issue, i think i have setup my project wrong.

Does someone has an idea, where this behavior comes from?

stickbreaker commented 6 years ago

The Esp32 is a multi tasking multi processor system. A call to ~delay()~ tells the OS to switch to another task and come back to this task later. You are going to have to change your program to us a Timer Interrupt.

Chuck.

Kasperdelasopa commented 6 years ago

Thx for your reply,

That makes totally sense to me. I was able to solve it for me by using ledcwrite().

Thanks a lot.