Breaking changes 1.3.x -> 1.4.x

obdevel commented 2 years ago

So I eventually got around to updating from 1.3.10 and a couple of things stopped working.

the standard I2C bus scan finds a device at every address, not just the one device that is actually present. I'm running at 3.3V and have external 3K9 pull-ups. The device (OLED display) works fine thereafter
interrupts stopped working. I'm using a 3rd party library to drive an SPI radio module and the data receive ISR is not firing (https://github.com/LowPowerLab/RFM69). Related to attachInterrupt() using pin number vs interrupt number ?
1.4.x needs the latest version of the IDE otherwise boards manager throws an exception. No big deal. Just an FYI

I will revert back to 1.3.10 for the time being but happy to investigate further.

IDE 1.8.19 on macOS.

I'm using an AVR128DA28 on an own design board. Only one change from the default board config (UART bootloader pins).

Di-Ny commented 2 years ago

Same here, I got the feeling that the 1.4.X is quite unstable. I finally went back to 1.3.10 yesterday. Here is the things that stopped working or acted strange in 1.4:

The reset flag I used to monitor the reset cause no longer works: RSTCTRL.RSTFR now equals 0 each time, which is unknown reset cause.
I attached an interrupt the the serial Rx 1 so that I can wake up the device when I send commands. Problem: it now makes the µC reset.
The eeprom seems to be more often corrupted

SpenceKonde commented 2 years ago

I am not getting the exception with 1.8.13 - I think the exception was because I fucked with the json, having forgotten that once the json file has been released, it is set in stone - or should be; If it changes, anyone who installed from the old version will get exceptions when they download or try to use the changed version. That was a particularly bad screwup; and I should have known better (I've done it before)

How does this so-called "standard bus scan" work? The TWI peripheral provides no such facility - My guess is that it's confused because, by popular demand, we added a timeout to I2C, so that when a device bugs out and starts holding one or both I2C line low indefinitely (which they seem to do whenever you do anything they don't like) the hung bus wouldn't straight up hang the project, so you could have it react (ex, by power cycling the malfunctioning device or pulsing its' reset pin - or at least printing some sort of debugging message, instead of just turning to stone until manually restarted, which is what often happened. My guess is it's mistaking that for a

Huh, it sounds like at some point between when I tested the new attachInterrupt and the release, something changed and that is now broken. I will investigate. The changes to attachInterrupt (though the default option was supposed to be transparent, which it clearly isn't, and the motivation for the change is described in detail in the interrupt reference: https://github.com/SpenceKonde/DxCore/blob/master/megaavr/extras/Ref_Interrupts.md (in short, because I wanted it to be possible to define a pin interrupt manually (so it can be lightning fast) even if the code (or more likely, libraries used by the code) attach an interrupt - since the stock message defines the vector for EVERY port when you attach to ANY ONE port. The savings in flash are a big deal for tinyAVR, less so for Dx-series, but the fact that the stock method is appallingly inefficient is equally applicable. Any mechanism by which a function is "attached" to an interrupt is grossly inefficient (the simple act of caling a function from within an ISR uses like 35-45 clocks) but it's much worse on AVRxt systems because they also have to have all the machinery that classic AVRs did for a PCINT, whereas attachInterrupt normally worked on only the INTn pins, each of which had their own vector, and hence didn't have to loop over every flag. (that was inherited from the official 4809 core, and I never got up the gumption to try to do something about it until recently). So it got two options, the default being the new mechanism, but enabled for all parts, so if attachInterrupt is referenced, it takes over all port interrupts (I tried like the dickens to get that to autodetect which port if the pin passed was constant, but I couldn't seem to figure out a way). The next is a manual mode which provides a new function which needs to be called before the interrupt is attached, but mostly serves to pull in only the desired ISRs.. But I knew it was a high risk change: that's why there's a third menu option: to use the old version of attachInterrupt instead. That should resolve the issue until I get the problem sorted out.

@Di-Ny - As documented in https://github.com/SpenceKonde/DxCore/blob/master/megaavr/extras/Ref_Reset.md, the initialization function now checks the reset cause flags. IF there is a reset cause, the value of RSTCTRL.RSTFR is stored to GPIOR0 for user code to access. If there is NO reset cause flag, that could only occur because the previous "reset" did not actually involve a reset. - it was a "dirty reset" condition (caused by stack corruption, calling bogus function pointers, whatever it was, it somehow wound up at the reset vector without a hardware reset occurring. This is virtually guaranteed to produce undesired behavior, and I realized that all those times when an AVR hangs or resets and doesn't come back cleanly - they are almost certainly from a bad error that led to a dirty reset and from there to a hang or dirty-reset bootloop. In the case of no reset cause, the initialization code will issue a software reset so that the system can restart cleanly. You can override that behavior as documented in the reset reference I linked above.

Please give more information on the EEPROM corruption issues! I am not aware of any such issues.

SpenceKonde commented 2 years ago

Okay, found the issue with attachInterrupt - that was easy - wrong version of the ISR when used in the default mode (manual mode, however, has the correct version!) would corrupt the stack, almost always leaving a 0 in the first byte on the stack;

SpenceKonde commented 2 years ago

Also, @obdevel: regarding interrupt and pin numbers, they are the same:

#define digitalPinToInterrupt(P)            (P)

Di-Ny commented 2 years ago

Oh Ok I missed that information for the Reset cause, my bad. For the EEPROM, I don't know for sure :/ But to describe the issue if it can help:
I use an error detection, from time to time I check a specific byte in the EEPRom which is not supposed to change. When it does change it means something unwanted happened, then I software reset the µC with a reinitialisation of all eeprom values to default. Maybe most of the time this is all happenning silentely. But yesterday I was monitoring the device over serial, and I noticed strange behaviour, and indeed all the printed eeprom had irrelevant values. This happened twice yesterday, before I revert back to 1.3.X. Shortly after the second corruption, the µC crashed outputing tons of trash over serial, before the error detection struck.
Since the revert, I am still monitoring and it didn't happen again. So it is just a feeling, nothing sure. Maybe the eeprom error was a consequence of a bad interrupt or so.

obdevel commented 2 years ago

Thanks for the updates. I'm really happy with 1.3.10 as it's stable and sufficiently functional for me, so please don't take this as criticism, just observations of changed behaviour.

Regarding the I2C bus scanner, it's a common piece of code and relies on Wire.endTransmission() returning a value greater than 0 if no device replies to Wire.beginTransmission(addr). This worked as expected in 1.3.x but 1.4.x behaviour has changed. As I said in my OP, expected Wire/I2C functionality is fine otherwise.

void I2C_scan(void) {

  byte addr, num_devices = 0, ret;

  Wire.begin();
  Wire.setClock(100000);

  Serial.printf(F("> M: scanning I2C bus 0 ...\r\n"));

  for (addr = 0; addr < 128; addr++) {
    Wire.beginTransmission(addr);
    ret = Wire.endTransmission();

    if (ret == 0) {
      Serial.printf(F("> M: device found at addr = 0x%x\n"), addr);
      ++num_devices;

      if (addr == OLED_I2C_ADDR) {
        display_is_present = true;
      }
    }
  }

  Serial.printf(F("> M: bus scan complete, found %d devices\n"), num_devices);

  if (display_is_present) {
    Serial.printf(F("> M: OLED display is present\r\n"));
  } else {
    Serial.printf(F("> M: *** OLED display is NOT present\r\n"));
  }

  return;
}

SpenceKonde commented 2 years ago

Wait, so what is the old behavior of Wire, and what was it changed to? Am I reading that correctly? The old version of Wire would return a value other than 0 if there was no response?? There was not supposed to be a change in the values returned - just the implementation of a timeout mechanism that prevented all ways that it could hang (the old version of the library would time-out some error conditions, but would hang for eternity in others as I recall) (albeit a cruder - and less resource intensive - timeout implementation than the official AVR core uses). (I wasn't the main author of the new Wire library, that was @MX682X who did all the heavy lifting. I just did bits and pieces.

The reset guide (as well as the robustness guide) was prompted by a rash of inquiries where, essentially, devices were being shipped to end users without even basic testing and design verification having occurred, and I was alarmed by how little testing and basic "best practices" sort of stuff was going into some devices relative to how far along the product development cycle they were.

obdevel commented 2 years ago

Previously in 1.3.x (and in every other implementation of the Wire library since the beginning of time), Wire.endTransmission() will return 2 or 3 if there is no response from that address (i.e. a timeout, I assume). This also seems to the expected behaviour according to the docs: https://www.arduino.cc/en/Reference/WireEndTransmission

In 1.4.10, Wire.endTransmission() is returning 0 even if there is no device at that address, so the sketch thinks there is a device at every address on the bus.

Just google for I2C scanner sketch ... there are 101 variations on the same theme. e.g. https://playground.arduino.cc/Main/I2cScanner/

SpenceKonde commented 2 years ago

Oh dear. I never realized that that was what the API was (like I said, I wasn't doing the heavy lifting - I just tacked on small things. Yeah it's returning the number of bytes written.

But yeah, the new wire behavior is wrong, it does not conform to the API and this was not intended.

obdevel commented 2 years ago

No worries. That's what testing is for. I'll look out for a new release soon and do some more testing. For the time being 1.3.x is functional and stable.

SpenceKonde commented 2 years ago

Fixes are in on DxCore, will port to megaTinyCore and release today.

Much thanks for reporting these issues!

SpenceKonde / DxCore

Breaking changes 1.3.x -> 1.4.x #225