SpenceKonde / megaTinyCore

Arduino core for the tinyAVR 0/1/2-series - Ones's digit 2,4,5,7 (pincount, 8,14,20,24), tens digit 0, 1, or 2 (featureset), preceded by flash in kb. Library maintainers: porting help available!
Other
558 stars 146 forks source link

Serial.flush() hangs at 115200 baud and 1MHz clock #352

Closed jvasileff closed 3 years ago

jvasileff commented 3 years ago

The program below reliably hangs when running the clock at 1MHz. The problem does not occur if the clock is increased to 4MHz or the baud rate is lowered to 57600. I tested this on two ATtiny814 chips.

The hang happens at the same iteration count & millis() very consistently for the same code. If the code is slightly altered, the hang still occurs, but at some other count & millis().

The high baud rate and low frequency does not seem to be a problem without flush().

#include <arduino.h>

void setup() {
    Serial.begin(115200);
    Serial.println("Starting up");
    Serial.flush();
}

void loop() {
    static unsigned long loopCount = 0;

    Serial.print(loopCount++);
    Serial.print(" @ ");
    Serial.println(millis());

    Serial.flush(); // <<---------- eventually hangs here w/1MHz clock
}

The output ends with:

...
415 @ 2239
416 @ 2246
417 @ 2252
418 @ 2258
419 @ 2264
420 @ 2270
421 @ 2276
SpenceKonde commented 3 years ago

Well that's an interesting bug! Wonder where it gets hung up... Easy enough to find out, as long as sticking a SBI pointed at a pin connected to an LED into the ISRs don't make the bug go away. When I took a flame thrower to the USART code a while back I definitely found code that could hang up, but I must have missed some. (the old implementation was on of those times where you keep studying code you don't understand..... and then finally you figure out what they were doing, and when you recover from the convulsions that result from that and the knowledge that you're responsible for a package including that code, you immediately drop what you were doing and rewrite the whole damned thing.)

The interrupts for feeding data to the USART peripherals are still pretty horrifying if you look at the assembly listing - grotesquely bulky, though I don't know of a better way to do it. If we could align the buffer in memory, we could achieve some epic optimizations >.> I suspect at 1 MHz and 115200 it takes longer to load in the next byte than it does to output it...

I wager there's a race condition I hadn't seen that happens when the times of the USART bit clock and the system clock line up just the wrong way at just the wrong point in the code.

SpenceKonde commented 3 years ago

This is a critical bug, but it's not 2.2.x critical

SpenceKonde commented 3 years ago

Interesting - does not reproduce in DxCore... which I thought had an identical serial implementation!

SpenceKonde commented 3 years ago
#ifdef __AVR_ATtinyxy2__
  #define ERSATZ_RESET_PIN PIN_PA3
  #define RESET_MASK 1<<3
  #define PINCTRL PORTA.PIN3CTRL
#else
  #define ERSATZ_RESET_PIN PIN_PA4
  // Always arduino pin 0
  #define RESET_MASK 1<<4
  #define PINCTRL PORTA.PIN4CTRL
#endif

#include <arduino.h>
void setup() {
    Serial.begin(115200);
    Serial.println("Starting up");
    Serial.flush();
    pinMode(LED_BUILTIN,OUTPUT); /*
  pinMode(ERSATZ_RESET_PIN, INPUT_PULLUP); //on 8-pin parts, this overrides the previous line! No LED blink on these; not enough pins to demo everything here...
  while ((!(VPORTA.IN & RESET_MASK)) && millis() < 10); //wait up to 10 milliseconds for pin to go high...
  // MUCH longer than we need to wait...
  if (!(VPORTA.IN & RESET_MASK)) { //if still not high, reset...
    _PROTECTED_WRITE(RSTCTRL.SWRR, 1);
  } else {
    PINCTRL = PORT_PULLUPEN_bm | PORT_ISC_LEVEL_gc;
  }
  */
}

void loop() {
    static unsigned long loopCount = 0;

    Serial.print(loopCount++);
    Serial.print(" @ ");
    Serial.println(millis());
    VPORTA.IN|=0x80;
    Serial.flush(); // <<---------- eventually hangs here w/1MHz clock
}

ISR(PORTA_PORT_vect) {
  byte flags = PORTA.INTFLAGS;
  PORTA.INTFLAGS = flags; //clear flags
  if (flags & RESET_MASK) {
    _PROTECTED_WRITE(RSTCTRL.SWRR, 1);
  }
}

That always stops at loop 506 for me....

This usually stops at loop 29, but sometimes stops at a different value; this alternate crash loop number is highly biased towards certain numbers.

#ifdef __AVR_ATtinyxy2__
  #define ERSATZ_RESET_PIN PIN_PA3
  #define RESET_MASK 1<<3
  #define PINCTRL PORTA.PIN3CTRL
#else
  #define ERSATZ_RESET_PIN PIN_PA4
  // Always arduino pin 0
  #define RESET_MASK 1<<4
  #define PINCTRL PORTA.PIN4CTRL
#endif

void setup() {
    Serial.begin(115200);
    Serial.println("Starting up");
    Serial.flush();
    pinMode(LED_BUILTIN,OUTPUT);
  pinMode(ERSATZ_RESET_PIN, INPUT_PULLUP); //on 8-pin parts, this overrides the previous line! No LED blink on these; not enough pins to demo everything here...
  while ((!(VPORTA.IN & RESET_MASK)) && millis() < 10); //wait up to 10 milliseconds for pin to go high...
  // MUCH longer than we need to wait...
  if (!(VPORTA.IN & RESET_MASK)) { //if still not high, reset...
    _PROTECTED_WRITE(RSTCTRL.SWRR, 1);
  } else {
    PINCTRL = PORT_PULLUPEN_bm | PORT_ISC_LEVEL_gc;
  }
}

void loop() {
    static unsigned long loopCount = 0;

    Serial.print(loopCount++);
    Serial.print(" @ ");
    Serial.println(millis());
    VPORTA.IN|=0x80;
    Serial.flush(); // <<---------- eventually hangs here w/1MHz clock
}

ISR(PORTA_PORT_vect) {
  byte flags = PORTA.INTFLAGS;
  PORTA.INTFLAGS = flags; //clear flags
  if (flags & RESET_MASK) {
    _PROTECTED_WRITE(RSTCTRL.SWRR, 1);
  }
}
SpenceKonde commented 3 years ago

It is in this while loop

void UartClass::flush() {
  // If we have never written a byte, no need to flush. This special
  // case is needed since there is no way to force the TXCIF (transmit
  // complete) bit to 1 during initialization
  if (!_written) {
    return;
  }

  // Check if we are inside an ISR already (e.g. connected to a different peripheral then UART), in which case the UART ISRs will not be called.
  // Spence 10/23/20: Changed _poll_tx_data_empty() to instead call the ISR directly in this case too
  // Why elevate the interrupt if we're going to go into a busywait loop checking if the interrupt is disabled and if so, check for the bit and
  // manually call the ISR if the bit is set... *anyway*? Plus, in write(), this mode will be enabled upon a write of a single character from an ISR
  // and will stay that way until the buffer is empty, which would mean that the fairly long and slow UART TX ISR would have priority over a
  // potentially very fast interrupt that the user may have set to priority level 1. Just because a whizz-bang feature is there doesn't mean
  // it's appropriate to use for applications where it has only very small benefits, and significant risk of surprising the user and causing
  // breakage of code that would otherwise work. Finally, the previous implementation didn't check if it was called from the current lvl1 ISR
  // and in that case flush(), and write() with full buffer would just straight up hang...

  // Spin until the data-register-empty-interrupt is disabled and TX complete interrupt flag is raised
  while (((*_hwserial_module).CTRLA & USART_DREIE_bm) || (!((*_hwserial_module).STATUS & USART_TXCIF_bm))) {

    // If interrupts are globally disabled or the and DR empty interrupt is disabled,
    // poll the "data register empty" interrupt flag to prevent deadlock

    _poll_tx_data_empty();
  }
  // When we get here, nothing is queued anymore (DREIE is disabled) and
  // the hardware finished transmission (TXCIF is set).
}

both of these if's are true...

void UartClass::_poll_tx_data_empty(void) {
  if ((!(SREG & CPU_I_bm)) || (!((*_hwserial_module).CTRLA & USART_DREIE_bm)) || CPUINT.STATUS) {
    // Interrupts are disabled either globally or for data register empty,
    // or we are in another ISR. (It doesn't matter *which* ISR we are in
    // whether it's another level 0, the priority one, or heaven help us
    // the NMI, if the user code says to print something or flush the buffer
    // we might as well do it. It is entirely plausible that an NMI might
    // attempt to print out some sort of record of what happened. -Spence 10/23/20
    //
    // so we'll have to poll the "data register empty" flag ourselves.
    // If it is set, pretend an interrupt has happened and call the handler
    // to free up space for us.

    // Invoke interrupt handler only if conditions data register is empty
    if ((*_hwserial_module).STATUS & USART_DREIF_bm) {
      _tx_data_empty_irq();
    }
  }
  // In case interrupts are enabled, the interrupt routine will be invoked by itself
}

Things are starting to fall into place

SpenceKonde commented 3 years ago

stranger and stranger... it's now looking like a key factor is the fact that the ring buffer is skipped entirely if there;s nothing waiting to be sent.... In fact, that is SUPER IMPORTANT because THE DRE IS NEVER EVEN BEING ENABLED! It looks as if the overhead of stuffing a character into the TXDATA register to send it is sufficiently slow that it is on the same order as the time it takes to transmit the byte down the wire!!!

So that bit about "serial is way slow compared to the CPU"? No longer true at 1 MHz and 115200 baud... So those functions, the IRQ and the TX polling.. they only get called in flush() in this case (though I suspect I could contrive a case where bytes are fed to serial fast enough that this became an issue

SpenceKonde commented 3 years ago

So what appears to be happening then is that the while condition remains true, that means either DRE is enabled (but we never enable him - we can forget about DRE) - so it means TXCIF is cleared after the last byte is sent, somehow. Which leads to this spinning forever because it's waiting for that last byte to finish sending which will set the TXCIF flag.

How could this be? Well, there are only two places we clear it... and one is in DRE ISR, so that ain't the problem!

size_t UartClass::write(uint8_t c) {
  _written = true;

  // If the buffer and the data register is empty, just write the byte
  // to the data register and be done. This shortcut helps
  // significantly improve the effective data rate at high (>
  // 500kbit/s) bit rates, where interrupt overhead becomes a slowdown.
  // IN THIS CASE, THIS ALWAYS ENDS UP BEING TRUE
  if ((_tx_buffer_head == _tx_buffer_tail) && ((*_hwserial_module).STATUS & USART_DREIF_bm)) {
    (*_hwserial_module).TXDATAL = c;
    (*_hwserial_module).STATUS = USART_TXCIF_bm;
    // Make sure data register empty interrupt is disabled to avoid
    // that the interrupt handler is called in this situation
    (*_hwserial_module).CTRLA &= (~USART_DREIE_bm);

    return 1;
  }
 /* SNIP removed code to handle using the ring buffer and DRE */
}

They feed the new byte in, THEN clear the TXCIF flag.... but interrupts are enabled here. If millis interrupt fires at that instant, and the interrupt takes longer to process than the byte takes to send, the byte will have sent, TXCIF will have been set for the last time, and then we clear it, and woe betide us, because flush will think that it's always waiting for the last bit. You can see radically different behavior with the different timer options too... with TCB, it never fails. The TCB ISR is the simplest of them - it literally just increments a single unsigned long. and returns. With TCA, it takes longer to fail than with TCD, which has the slowest ISR. It is the default millis/micros timer because it's also the timer people are least likely to want to take over. The TCBs are used for servo, tone, and are spectacularly useful utility timers - so defaulting them on parts with only a single TCB is out of the question, as it would mean no tone or servo support, and on ones with 2, you're still blocking off one of the most useful timers, and disabling the ability to use a servo and sound an audible alarm at the same time and are how you do input capture (and really good at it), while the TCA is probably the most-taken-over timer because people want to generate special PWM with it, so we don't want to default that for millis!

Meanwhile on the 8 and 14 pin parts, TCD0 is not even used for PWM, and on 20/24-pin parts, it's used for two pwm channels which it can do at the same time as running millis (the only caveat is that you lose a maybe 10-20 system clocks every time you digitalWrite() a TCD0 PWM pin when it's outputting PWM or analogWrite() it when it's not, though note that since a couple of versions ago, analogWrite(PIN_PC0 or PIN_PC1, 0 or 255) does not call digitalWrite() anymore - it doesn't disconnect the pin from the timer, but rather sets the timer duty cycle such that it is outputting a constant LOW, and then inverts the pin if the second argument was 255, simulating the normal behavior, without the enable-protected rigmarole. digitalWrite() is smart enough to know that if the pin is one of the two TCD0 PWM pins, and it was currently set up for PWM, it needs to also clear INVEN, obviously ;-) )

SpenceKonde commented 3 years ago

Now that problem is understood, no plans to backport to 2.2.x - this isn't that critical.

jvasileff commented 3 years ago

Thanks for the fix! I tested with git cherry-pick 7076f1c, and it worked great.