SpenceKonde / megaTinyCore

Arduino core for the tinyAVR 0/1/2-series - Ones digit 2,4,5,7 (pincount, 8,14,20,24), tens digit 0, 1, or 2 (featureset), preceded by flash in kB. Library maintainers: porting help available!

i2c Problems Hanging state #531

Closed JDJelectronics closed 2 years ago

JDJelectronics commented 3 years ago

Dear Spence,

I have an ATtiny3216, and I am using six I2C chips for an IoT project. The chip keeps crashing at Wire.begin().

And when I reset the device, it does not get past that point. I have to upload an empty sketch and then upload my own sketch again before it does anything, but that only lasts a short time before it crashes at Wire.begin() again.

How can I solve this?

I am using megaTinyCore 2.4.2.

Regards JDJ.

SpenceKonde commented 3 years ago

Is it a master or slave? If you are trying to do both, that isn't supported in 2.4.2

The specifics of the bug are not important, however: there is a totally rewritten version of Wire now (huge thanks to @MX682X) ready for testing. Please retest with the new version of Wire.

I just checked in a version that is believed to be working. It drastically reduces flash consumption vs the version from 2.4.2 as well as adding support for

Except for the case of being both a master and slave, you only have to replace the library folder with the new version. By default, for a Board Manager installation of the core into a non-portable Arduino IDE on Windows, it's located somewhere like

C:\Users\Spence\AppData\Local\Arduino15\packages\megaTinyCore\hardware\megaavr\2.4.2\libraries\Wire

(Note that AppData is hidden. Using this awful location for Board Manager installations was probably forced by Microsoft's restrictions; I don't think they would have chosen it otherwise.)

Do also be sure that you have appropriate external pullup resistors on SCL and SDA. These are not optional. Calling the usePullups() method before Wire.begin() will turn on the internal ones, but they are far weaker than the standard requires, and whether they work may depend on the number of devices, the length of the wires, and the phase of the moon. They should not be relied upon, and if calling usePullups() ever fixes a problem, that means you are missing the external

JDJelectronics commented 3 years ago

Hi Spence,

I have modified the core files with what you provided, but I just can't get it to compile. I have 2 options under TWI/I2C:

Master or slave (save flash & ram)

and Master and slave.

Also, I get this error output:

Error while detecting libraries included by C:\Users\JDJ Work\AppData\Local\Arduino15\packages\megaTinyCore\hardware\megaavr\2.4.2\libraries\Wire\src\twi_pins.c

C:\Users\JDJ Work\AppData\Local\Arduino15\packages\megaTinyCore\hardware\megaavr\2.4.2\libraries\Wire\src\twi_pins.c: In function 'TWI0_usePullups': C:\Users\JDJ Work\AppData\Local\Arduino15\packages\megaTinyCore\hardware\megaavr\2.4.2\libraries\Wire\src\twi_pins.c:27:0: error: unterminated #ifndef

#ifndef TWI_PINS_H

C:\Users\JDJ Work\AppData\Local\Arduino15\packages\megaTinyCore\hardware\megaavr\2.4.2\libraries\Wire\src\twi_pins.c:321:1: error: expected declaration or statement at end of input void TWI0_usePullups() { ^~~~

SpenceKonde commented 3 years ago

Oh damnit, I tried to catch a corner case (something so trivial it couldn't break anything) and wound up breaking the closures for conditionals. Fixing now.



JDJelectronics commented 3 years ago

Great that you're working on it.

Do you know when there will be an update?

SpenceKonde commented 3 years ago

Sorry about that; the fix went in a couple of hours ago.

JDJelectronics commented 3 years ago

The fix works for the compiler, but the whole I2C bus does nothing and hangs.

mrWheel commented 3 years ago

This looks a lot like my problem #525

I have replaced the 2.4.2 Wire folder, boards.txt and programmers.txt with the new ones from the master branch, but the problem is still there. Sometimes the code runs past Wire.begin(), but sometimes it does not.

With the new library that's still the case :-(

Sometimes the system 'hangs' indefinitely. Even a power cycle does not always help, but sometimes it does.

So now I want to enable the WDT right before that statement and disable it afterwards, but I cannot find how to do that...

I did find how to reset the WDT:

void wdt_reset() {
  __asm__ __volatile__ ("wdr"::);
}

Can you tell me how to enable and disable the WDT (little code example)?

Thanks!!!

SpenceKonde commented 3 years ago

https://github.com/SpenceKonde/megaTinyCore#using-watchdog-to-reset-when-hung

should cover it. I really don't like that solution, though. We really need to get this problem sorted out ASAP: whether it's your hardware, my core, or your code, that absolutely is not behavior you should have to live with. One thing I find very suspicious is that both the current GitHub version and the 2.4.2 release have the problem, but they use completely different implementations.

I would also suggest that you put an LED on an unused pin and set it as an output. Then, immediately prior to the call to begin, call digitalWriteFast(pin,HIGH), and first thing after begin, call digitalWriteFast(pin,LOW). If it gets stuck high, you know 100% for sure where it's stuck.

(I like those constructions for debugging: they require the pin to be known at compile time, but optimize down to a single-clock instruction when it is. The normal digital I/O functions leave much to be desired.)

I'm trying to narrow down how begin could end up hanging, because I didn't think it did anything that could hang.

Do things work with simple bus configurations? Are these devices masters or slaves?

What values of pullup resistor are in use on SDA and SCL? (Be sure to count any located on the modules you're talking to...)

mrWheel commented 3 years ago

Hi Spence,

Thanks for your answer and sorry for my ignorance.

On my I2C bus there are 3 mux ICs (PCF8574). The pullup resistors are 2k2, and I also tested with 4k7 (there are no "modules"). All three PCF8574s have a mux1.begin(hexAddress3), mux2.begin(hexAddress2) and mux3.begin(hexAddress3).

There is an LED that lights up before every begin() statement and turns off after it. There are also print statements with flush() before and after. The processor "hangs" somewhere in begin(), because when it does, the LED stays on and the second println() is not executed.

I read the WDT doc you mention in your reply (that's where I found the wdt_reset() function), but I can't work out how to code it so that I can enable and disable the WDT on demand. I have done it before for the ATmega328, but that involved a lot of register handling (which I found on the internet).

Or am I missing something? Is there a wdt_enable() and wdt_disable() function? How long after a wdt_reset() does the WDT fire? (In the program there are statements that can spend a long time waiting for a reply.)

MX682X commented 3 years ago

@mrWheel are you using the Arduino library for the PCF8574 by RobTillaart? If so, do you set the address in the constructor? Because it seems weird that you call begin with the address again.

mrWheel commented 3 years ago

@mrWheel are you using the Arduino library for the PCF8574 by RobTillaart? If so, do you set the address in the constructor? Because it seems weird that you call begin with the address again.

Sorry, I use a class and misinformed you. I call the constructor with an address. In the constructor I call Wire.begin() without an address!

SpenceKonde commented 3 years ago

@mrWheel - two thoughts:

  1. Wait, you're calling Wire.begin() in a constructor? That doesn't seem like a good idea, it should be called in setup()....
  2. Does it reproduce if you comment out all the references to the second and third PCF8574? Code that reproduces this in a simple case would be of great value. I think I even have some of those parts; isn't that what those stupid LCD I2C backpacks use? I have a bunch of those kicking around from when I decided to switch processors and was then able to use parallel control and lose the backpack. But the easier this is to reproduce, the faster we can figure out what's happening.

The fact that both classes have begin methods is making the above confusing to read. Is it mux.begin(address) or Wire.begin() that it is hanging in? While it's hung, what voltage do you measure on the I2C lines? (A scope would of course be even better.)

I'll look at the WDT section and see if there's some key piece missing, I swear I had that covered.

JDJelectronics commented 3 years ago

@mrWheel Hi!

The watchdog would be a godsend for me right now, as a temporary measure.

I have more than 100 devices with customers that have problems with I2C, and there are more every day.

If I had known that those ATtiny chips had this problem, I would have chosen another AVR chip.

I hope Spence can fix this.

MX682X commented 3 years ago

Alright then... @mrWheel, can you post your whole code? It might reduce our confusion.

MX682X commented 3 years ago

Also, @JDJelectronics, could you post the code that has the trouble?

SpenceKonde commented 3 years ago

I have 2 options under TWI/I2C:

Right, but in your code, which mode are you using it in?

The fix works for the compiler, but the whole I2C bus does nothing and hangs.

When does it hang? In what? Wire.begin()? (Use the LED check.)

I have more than 100 devices with customers that have problems with I2C, and there are more every day.

How have we gone from an issue first created just a few days ago to a huge volume of problems so quickly? What has changed? Was it just shipped, or was there a software change that broke things? If so, what?

I will also repeat my question about pullups. I frequently get inquiries from users who have omitted these. Do you have pullups? What values?

As I asked mrWheel: when it's in the hung state, what voltage do you measure on SCL and SDA?

@MX682X There seems to be a sudden uptick in I2C issue reports since 2.4.x, when the baud rate calculation was reworked. Under default conditions, are we sure it's behaving sensibly with regard to the master baud rate?

MX682X commented 3 years ago

Also also: if you have access to a UPDI debugger, you can use Microchip Studio to debug Arduino sketches. To do that, select File->Open->Open object file for debug, then select the file to open. It is saved in AppData/Local/Temp/arduino_buildxxxxxx (the x's are digits) after you hit the compile button in the Arduino IDE. It would help significantly in locating the problem if you could say exactly where it hangs.

mrWheel commented 3 years ago
  1. Wait, you're calling Wire.begin() in a constructor? That doesn't seem like a good idea, it should be called in setup()....

OK, I have changed my code to call Wire.begin() only once, in setup(). The device is under test now; I'll see what the outcome is tomorrow. I'll keep you posted!

The fact that both classes have begin methods is making the above confusing to read. Is it mux.begin(address) or Wire.begin() that it is hanging in? While it's hung, what voltage do you measure on the I2C lines? (A scope would of course be even better.)

If it hangs again after my changes, I will measure the voltages on SDA and SCL with a scope and report back.

I'll look at the WDT section and see if there's some key piece missing, I swear I had that covered.

That would be nice!

@JDJelectronics Are you using code from my GitHub? If so, you should comment out the Wire.begin() in the constructor and put a single call to Wire.begin() in setup()!

mrWheel commented 3 years ago

Alright then... @mrWheel, can you post your whole code? It might reduce our confusion.

Well, that's 1416 lines of code... My hopes are on the multiple calls to Wire.begin(), which I have now reduced to a single call in setup().

If that indeed solves the problem, I apologize to you guys!

mrWheel commented 3 years ago

@MX682X

also also: If you have access to an UPDI debugger: You can use microchip studio to debug Arduino sketches.

I have not :-(

SpenceKonde commented 3 years ago

@mrWheel - that is a big detail that I'd missed - so you're saying this issue happens rarely - too often to live with, but rarely enough that it takes a long time (overnight) to reproduce?

What I find rather disturbing is that I can find nothing in Wire.begin() that plausibly looks like a place it could get stuck... There's no loop waiting for bits to be in some state, and I don't see any obvious place where an interrupt is enabled that might then be triggered but handled improperly or sent to badISR...

Another LED test that might be useful: remove those writes from around begin, and instead, at the start of setup(), do

pinMode(pin,OUTPUT);
digitalWriteFast(pin,CHANGE);

This is a good way to see whether the failure puts it into a reset loop. Better still if you've got two LEDs on different pins, so that both can be monitored simultaneously.

MX682X commented 3 years ago

If that indeed solves the problem, I apologize to you guys!

This makes me worry, though, since I thought I had written the code so that calling it multiple times would not be a problem. I have a theory, but I have no idea why exactly it creates trouble...

src.zip

I've modified three files of the Wire library, which you can find in the zip file. Please replace them in the appropriate folders and try calling Wire.begin() multiple times. I'd like to see whether that caused the hang. I have no setup ready to test it, though.

P.S. I have to go now; I'll be available again in at least 8 hours...

mrWheel commented 3 years ago

@mrWheel - that is a big detail that I'd missed - so you're saying this issue happens rarely - too often to live with, but rarely enough that it takes a long time (overnight) to reproduce?

Well, not really.

When I flash the firmware, it usually starts correctly: some LEDs flash slowly to indicate the beginning of setup(), the LED turns on before Wire.begin() and off again after it, and at the end of setup() the LED flashes fast.

The strange thing is that sometimes, after a power cycle or a reflash, it all comes to a stop at Wire.begin() --> no!! it stops at PCF8574::begin()!! and the LED stays on. From that moment you can do what you want, but it will not start up normally again: not if you reflash, not if you power-cycle, and not if you press [reset] (interrupt). And the moment you give up all hope, it works flawlessly again for a long time, until it wakes up from sleep and hangs again...

But wait!!!

I have now narrowed it down to this function (readPin()):

//===========================================================
bool PCF8574::begin(uint8_t address) 
{
  digitalWriteFast(PIN_PC0, HIGH);
  _address = address;

  //Wire.begin(); //-- <<-- Call this once in setup()!!!!!
  if (readPin()) 
  {
    digitalWriteFast(PIN_PC0, LOW);
    return true;
  }
  else
  {
    digitalWriteFast(PIN_PC0, LOW);
    return false;  
  }

} //  begin()
.
.
//===========================================================
bool PCF8574::readPin() 
{
  Wire.requestFrom(_address, (uint8_t) 0x01);
  uint32_t startTime = millis();
  while ((Wire.available() < 1) && ((int)(millis() -startTime) <= 200)) {;}
  if ((int)(millis() -startTime) > 200)
  {
        return false;
  }
  _PIN = Wire.read();
  return true;

} // readPin()

I will investigate further tomorrow. It's time to go to bed now...

JDJelectronics commented 3 years ago

I have used both TWI/I2C modes.

The problem has been going on for a while, but I couldn't find the cause. The devices worked non-stop for 3 months; now they are dropping out.

I couldn't put my finger on what the problem was; initially I thought it was a fault in the battery system.

So I took one out of the field for further investigation, and concluded that the system crashes at Wire.begin().

Yesterday I updated the device from 2.3.2 to 2.4.2, which didn't work. Then I rolled back to 2.3.2. That obviously didn't help, but I thought I'd try it.

There is a 4k7 pullup resistor on the SDA and SCL lines.

I would like to share the code but I can't because of permissions.

SpenceKonde commented 3 years ago

Interesting - Is it a master Wire.begin() (enabling master mode) or Wire.begin(address) that hangs it?

What is the state of the bus lines while it's hung? Are any of them stuck low or something?

And you say that, with no changes anywhere they have started failing after 3 months? And previously, they never did? What version of megaTinyCore were you originally using?

SpenceKonde commented 3 years ago

There has been a fair amount of general tweaking of Wire.h in 2.3.x and 2.4.x, because the old implementation was bloated (the folks on the small-flash parts hated it, for obvious reasons) and lacked a very often requested feature (acting as both a master and a slave in the same sketch).

To be clear: Prior to 2.4.3, it is not expected to be able to have both master and slave modes enabled. I would expect that to fail in strange and mysterious ways.

SpenceKonde commented 3 years ago

The strange thing is that sometimes, after a power cycle or a reflash, it all comes to a stop at Wire.begin() --> no!! it stops at PCF8574::begin()!! and the LED stays on. From that moment you can do what you want, but it will not start up normally again: not if you reflash, not if you power-cycle, and not if you press [reset] (interrupt). And the moment you give up all hope, it works flawlessly again for a long time, until it wakes up from sleep and hangs again...

I'll reply later in your thread as I'm not convinced these are identical problems.

mrWheel commented 3 years ago

The strange thing is that sometimes, after a power cycle or a reflash, it all comes to a stop at Wire.begin() --> no!! it stops at PCF8574::begin()!! and the LED stays on.

I'll reply later in your thread as I'm not convinced these are identical problems.

I have not had the problem with Wire.begin() anymore since I started calling it only once in setup(), but of course that does not mean the problem is gone. It just has not happened in the last few hours.

JDJelectronics commented 3 years ago

They are battery-powered devices, and I think this behavior has always been there but was never noticed, because the battery had never run flat. Only after the battery ran out did problems start to appear at Wire.begin(). As long as the battery never runs flat, I think nothing goes wrong.

I think it's fine after a single flash until the battery runs out; then something goes wrong in the core.

Resetting a device with a full battery also gave me the problem.

It had never been noticed until now.

mrWheel commented 3 years ago

@MX682X

@mrWheel are you using the Arduino library for the PCF8574 by RobTillaart? If so, do you set the address in the constructor? Because it seems weird that you call begin with the address again.

No, not a library from RobTillaart, but a class I made myself with a lot of help from bits and pieces from the internet...

SpenceKonde commented 3 years ago

Now - here is an interesting common thread.....

resetting a device with a full battery also gave me the problem.

And @mrWheel described similar behavior, where reprogramming after this occurred was also difficult.

If this issue turns out to have been caused by the multiple Wire.begin() calls, well, that's clearly a bug (though I don't see how; we trap that!). What's the power situation on your boards like? Batteries? An external power supply? Ideally, a schematic. But I'm starting to get particularly suspicious of a known way to put these in a bad state - why it seems to be

But @JDJelectronics, you say resetting a device could trigger this problem: how did you do that reset? Have you turned the UPDI pin into reset to be able to do it that way? Were you attempting a UPDI reset? Have you rigged up an ersatz reset?

I assume you're using UPDI programming?

The other side to this is that they don't recover from having a dead battery... There are a ton of parameters here that will all have something to tell us. What kind of battery? LiPo? Is the tiny running straight off the LiPo, or is there something in between that provides a constant voltage? What do you have the BODCFG fuse set to? (And, excuse the silly question, if you're using the Arduino IDE to program these, did you, ah, remember to "burn bootloader" to set unsafe fuses like BODCFG?)

(Summary of when fuses are set: UPDI uploads only set the safe fuses that the core configures through a Tools submenu: SYSCFG1, OSCCFG, and BOOTEND. Respectively, these set SUT (which hardly anyone cares about, but isn't risky to set, either), whether the chip runs at a 16 MHz-derived or 20 MHz-derived speed (this not being set on upload was about 50% of the support questions I got), and whether the chip will use a bootloader. BOOTEND must put the end of the bootloader at the same address as the start of the application that we pass to the compiler, so a sketch compiled expecting no bootloader and uploaded to a board with BOOTEND != 0 is guaranteed not to work. Burn Bootloader sets all fuses: the ones I mentioned above, BODCFG and SYSCFG0, which have their own menus (they can render the device hard to program when set wrong, like burn bootloader on classic AVRs could), and everything else is set to default. See the documentation for more information.)

In the "dead battery" case, what voltage is the part seeing? Has the battery cut off all power (UVLO) or not? And if not, and it's a LiPo battery, how do you prevent battery damage from overdischarge?

Could the chip be failing because the voltage has dropped too low to support its continued operation, but not low enough to trigger a POR, so that when the voltage rises again, it starts up uncleanly?

Does a full power cycle revive the devices? (Meaning: disconnect all power, short the supply rail to ground to discharge the caps on the board, then reconnect power.) I have seen these parts end up in a broken state where UPDI programming didn't work (or, if it uploaded, the code didn't run), and even briefly disconnecting and reconnecting power didn't fix it, because enough charge remained to keep it from recognizing a POR during that time. (It happened over a year ago while I had other pressing priorities to deal with; I forget the details, but I wasn't using Wire.) I wonder if this is the same phenomenon, triggered differently.

Are you saying that if you ever try to upload more than once to a device, it won't take an upload? Surely I'm misunderstanding?

That is a very important piece of information, and one or more of the questions above will have an answer that is also critical (but I don't know which one it will turn out to be).

MX682X commented 3 years ago

OK, so I've learned three things now:
A. JD's problem has nothing to do with my implementation of Wire, since it persists across completely different implementations.
B. JD's problem has nothing to do with mrWheel's problem.
C. mrWheel's problem lies in Wire.requestFrom(), not in Wire.begin().

About C: There was a bug in requestFrom() in my implementation that would keep the CPU in an endless while-loop if there was no ACK after writing the slave address. This can happen if no slave with that address is on the bus, there are no pullups, or the supplied address is wrong. I published a hotfix for that a couple of days ago.

mrWheel commented 3 years ago

@MX682X :

OK, so I've learned three things now: A. JD's problem has nothing to do with my implementation of Wire, since it persists across completely different implementations.

What do you mean by "(it) persists across different implementations"? Why do you think that?

B. JD's problem has nothing to do with mrWheel's problem.

Maybe my English is not good enough, but I see a lot of similarities.

C. mrWheel's problem lies in Wire.requestFrom(), not in Wire.begin().

Yes! That's what I think now, too.

About C: There was a bug in requestFrom() in my implementation that would keep the CPU in an endless while-loop if there was no ACK after writing the slave address. This can happen if no slave with that address is on the bus, there are no pullups, or the supplied address is wrong. I published a hotfix for that a couple of days ago.

Where can I find that "hotfix"?

I used the Wire folder from the master branch, and since yesterday evening (my time) I have replaced the files with those in the src.zip file you sent.

MX682X commented 3 years ago

JD's problem is related to low-voltage glitches, from what I can read; your problem is somewhere on the bus. If you have the files from my src.zip, you have the hotfix. At this point, requestFrom() should only hang if the slave never sends any data.

mrWheel commented 3 years ago

@MX682X :

At this point, requestFrom() should only hang if the slave never sends any data.

Why does it have to hang? Is there no (working) Wire.setTimeout() function?

MX682X commented 3 years ago

No, there is not. There never was, as far as I know, which is why I haven't implemented it. Since requestFrom() is a blocking function and the number of bytes should be known beforehand, I didn't see a real reason why it would be needed. Plus, I didn't see a nice way to do it; after all, I tried to keep the library as small as possible.

mrWheel commented 3 years ago

@MX682X:

Update

I have installed the hotfix from MX682X and put Wire.begin() back in the PCF8574 class, so Wire.begin() is executed for every PCF8574 chip I have connected (3). I have tested like crazy, but my board will not crash anymore (though I have observed that behavior before; no guarantee the problem is gone). I have also tested without the repeated calls to Wire.begin() (calling it only once in setup()), and that also does not crash the board at this point in time!

I will continue to test today and will keep you posted if something strange happens again!

Oh no!!!

As I was writing this comment, the board crashed at Wire.requestFrom(_address, (uint8_t) 0x01); in bool PCF8574::readPin() (see the code ~15 posts ago)... This is with the version where I execute Wire.begin() multiple times! And as observed before, resetting (software reset) or reflashing does not solve the problem, but after a lot of resets (I have the WDT enabled) it suddenly works again! I'm not sure I did it right, but while it was hung, SCL was about 1.8 V. Now that it's running again, SDA swings between 0 and 5 V and SCL between 0 and 3 V.

SpenceKonde commented 3 years ago

mrWheel - What is the voltage on the pins in the failed state? I seem to remember, from when I was working with I2C routinely, that I2C slaves would sometimes react to adverse conditions by inappropriately stretching the clock (forever)... And didn't you say that you needed to power-cycle the whole thing to make it work again? That would also suggest that a slave was holding the bus low. I'd try to arrange to disconnect the I2C lines from the slaves without disconnecting power, so that if one or both lines are low, you can be sure which side is holding them down.

mrWheel commented 3 years ago

mrWheel - What is the voltage on the pins in the failed state?

SDA 5 V, SCL 1.8 V.

And didn't you say that you needed to power cycle the whole thing to make it work again?

Well, that sometimes makes it work again, but for the last 12 minutes, and after reflashing twice, it still will not get past requestFrom()...

JDJelectronics commented 3 years ago

The problem was first noticed with a dead battery, but it now also occurs with a full battery.

The battery is a 12 V, 7 Ah lead-acid battery.

I use the method described in the documentation for the reset:

void resetViaSWR() { _PROTECTED_WRITE(RSTCTRL.SWRR,1); }

Following the message from @MX682X, I also discovered that the problem is in Wire.requestFrom() as well (Screenshot 2021-09-28 122323).

mrWheel commented 3 years ago

What I see on the scope when all goes well (Screenshot 2021-09-28 at 12 42 30): this is the Wire.begin().

(Screenshot 2021-09-28 at 12 08 38) This is just using the bus.

mrWheel commented 3 years ago

It definitely hangs at requestFrom()! Even after mux.begin() (without the Wire.begin() in it), the code hangs just reading pins, which all goes through readPin().

Update

Wire.beginTransmission() is also suspicious. For the last half hour, for one reason or another, I've had no problems with requestFrom(), but now execution hangs at beginTransmission()!

mrWheel commented 3 years ago

This is the class I use to communicate with the PCF8574s:


#ifndef PCF8574_H
#define PCF8574_H

#define ledOn   digitalWriteFast(PIN_PC0, HIGH);
#define ledOff  digitalWriteFast(PIN_PC0, LOW);

#include <Wire.h>

#define I2CWRITE(x) Wire.write(x)
#define I2CREAD()   Wire.read()

/******************************************************************/
class PCF8574 
{
public:
  PCF8574();

  bool      begin(uint8_t address = 0x21);
  void      pinMode(uint8_t pin, uint8_t mode);
  void      digitalWrite(uint8_t pin, uint8_t value);
  uint8_t   digitalRead(uint8_t pin);

protected:
  volatile uint8_t  _PORT;
  volatile uint8_t  _PIN;
  volatile uint8_t  _DDR;
  uint8_t           _address;
  bool    readPin();
  void    updatePin();
};

//===========================================================
PCF8574::PCF8574() :
    _PORT(0), _PIN(0), _DDR(0), _address(0)
{
}

//===========================================================
bool PCF8574::begin(uint8_t address) 
{
  _address = address;

  //--Wire.begin(); //-- <<-- Call this once in setup()!!!!!
  if (readPin()) return true;

  return false;  

} //  begin()

//===========================================================
void PCF8574::pinMode(uint8_t pin, uint8_t mode) 
{
  switch (mode) 
  {
    case INPUT:
        _DDR &= ~(1 << pin);
        _PORT &= ~(1 << pin);
        break;

    case INPUT_PULLUP:
        _DDR &= ~(1 << pin);
        _PORT |= (1 << pin);
        break;

    case OUTPUT:
        _DDR |= (1 << pin);
        _PORT &= ~(1 << pin);
        break;

    default:
        break;
  }

  updatePin();

} //  pinMode()

//===========================================================
void PCF8574::digitalWrite(uint8_t pin, uint8_t value) 
{
  if (value)  _PORT |= (1 << pin);
  else        _PORT &= ~(1 << pin);

  updatePin();

} //  digitalWrite()

//===========================================================
uint8_t PCF8574::digitalRead(uint8_t pin) 
{
  readPin();
  return (_PIN & (1 << pin)) ? HIGH : LOW;

} //  digitalRead()

//===========================================================
bool PCF8574::readPin() 
{
  ledOn;
  Wire.requestFrom(_address, (uint8_t) 0x01);
  ledOff;
  uint32_t startTime = millis();
  while ((Wire.available() < 1) && ((int)(millis() -startTime) <= 200)) {;}
  if ((int)(millis() -startTime) > 200)
  {
    return false;
  }
  _PIN = I2CREAD();
  return true;

} // readPin()

//===========================================================
void PCF8574::updatePin() 
{
  uint8_t value = (_PIN & ~_DDR) | _PORT;

  Wire.beginTransmission(_address);
  I2CWRITE(value);
  Wire.endTransmission();
  //-- the bus needs at least 1.3us free time between a STOP and the next START
  delayMicroseconds(2);

} // updatePin()

#endif
/* EOF */
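
To check what updatePin() actually sends, the same bit expression can be evaluated on a PC at compile time. This is a minimal sketch: pcfByteToWrite() and the example pin configuration are hypothetical, and no I2C traffic is involved.

```cpp
#include <cstdint>

// Hypothetical host-side helper: the same bit expression as PCF8574::updatePin().
// Input pins (DDR bit 0) take their bit from the last byte read (_PIN);
// any bit set in _PORT (OUTPUT HIGH or INPUT_PULLUP) is written as 1.
constexpr uint8_t pcfByteToWrite(uint8_t pinReg, uint8_t ddrReg, uint8_t portReg) {
  return static_cast<uint8_t>((pinReg & ~ddrReg) | portReg);
}

// Example configuration (assumed for illustration):
//   P0 = OUTPUT, HIGH   -> DDR bit 0 = 1, PORT bit 0 = 1
//   P1 = INPUT_PULLUP   -> DDR bit 1 = 0, PORT bit 1 = 1
//   P2 = INPUT          -> DDR bit 2 = 0, PORT bit 2 = 0
constexpr uint8_t kDdr  = 0x01;
constexpr uint8_t kPort = 0x03;

// Last read: all pins high, so P2 is written back as 1.
constexpr uint8_t kAllHigh = pcfByteToWrite(0xFF, kDdr, kPort);  // 0xFF
// Last read: P2 low, so P2 is written back as 0.
constexpr uint8_t kP2Low   = pcfByteToWrite(0xFB, kDdr, kPort);  // 0xFB

static_assert(kAllHigh == 0xFF, "P2 read high is written back as 1");
static_assert(kP2Low == 0xFB, "P2 read low is written back as 0");
```

Assuming the PCF8574's quasi-bidirectional behaviour (a written 0 drives the pin low, a written 1 leaves it weakly pulled high and readable), an INPUT pin that last read LOW is written back as 0 by the next pinMode() or digitalWrite() call on any pin, and then stays low; INPUT_PULLUP pins always go out as 1.
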
MX682X commented 3 years ago

@mrWheel thank you for posting the oscilloscope pictures. From what I can see, either your SDA or your SCL voltage is outside the recommended specifications for the PCF8574 and the 3216. As far as I can see, the SDA voltage is 3.3V and SCL is 5V. If VCC is 5V, the line needs at least 3.6V to be in the recommended range. That might explain why it works only sometimes.

mrWheel commented 3 years ago

> @mrWheel thank you for posting the oscilloscope pictures. From what I can see, either your SDA or your SCL voltage is outside the recommended specifications for the PCF8574 and the 3216. As far as I can see, the SDA voltage is 3.3V and SCL is 5V. If VCC is 5V, the line needs at least 3.6V to be in the recommended range. That might explain why it works only sometimes.

Are you sure? The following three screenshots are all from the same capture. The vertical scale is 2 V/div, so SCL (blue) is about 5 V and SDA (yellow) about 4.2 V.

Time base 200 µs/div: [Screenshot 2021-09-28 at 18 04 28]

Time base 100 ms/div: [Screenshot 2021-09-28 at 18 03 27]

This one is zoomed in, so the screen is divided in two: [Screenshot 2021-09-28 at 18 03 07]

(or am I missing something?)

MX682X commented 3 years ago

> (or am I missing something?)

Yes, the previous picture had a smaller amplitude: [image]

Is it possible to zoom into these? I wonder why the oscilloscope says NACK...

[image]

mrWheel commented 3 years ago

> Yes, the previous picture had a smaller amplitude!

In the picture with the “lower amplitude”, the screen is divided horizontally in two, so the real amplitude is twice the size shown in the picture.

> Is it possible to zoom into these? I wonder why the oscilloscope says NACK...

The time base there is 100 ms, while the time base of the more readable data is 2 ms. I think the “spike” is just compressed data, which the decoder is not able to decode.
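
A rough check of that explanation, written as a compile-time C++ sketch. The 100 kHz SCL rate, the 10 horizontal divisions and the roughly 1000-pixel trace width are assumptions, not values read from the captures; the 200 µs/div and 100 ms/div time bases are the ones used above (10 × 200 µs gives the 2 ms mentioned here).

```cpp
// Assumptions (not stated in the thread): SCL at 100 kHz, 10 horizontal
// divisions on the scope screen, and a trace roughly 1000 pixels wide.
constexpr double kSclHz     = 100e3;
constexpr double kDivisions = 10.0;
constexpr double kPixels    = 1000.0;

// One single-byte read as done by readPin(): address byte + ACK (9 clocks)
// plus data byte + NACK (9 clocks). START/STOP overhead is ignored.
constexpr double kClocksPerRead = 18.0;

// Duration of that read in microseconds.
constexpr double readMicros() { return kClocksPerRead * 1e6 / kSclHz; }

// Screen pixels covered by one read at a given time base (microseconds per division).
constexpr double pixelsPerRead(double microsPerDiv) {
  return readMicros() * kPixels / (microsPerDiv * kDivisions);
}

static_assert(readMicros() == 180.0, "18 clocks at 100 kHz take 180 us");
```

Under these assumptions a read spans about 90 pixels at 200 µs/div but only about 0.18 pixels at 100 ms/div, which is consistent with the decoder seeing a compressed spike rather than decodable bits.
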

MX682X commented 3 years ago

> In the picture with the “lower amplitude”, the screen is divided horizontally in two, so the real amplitude is twice the size shown in the picture.

If that were the case, you'd have more than 6V on SDA. Also, the SCL line shows the 5V correctly.

SpenceKonde commented 3 years ago

@mrWheel Regardless of which line it is and what the specific voltage is, why are they not both going up to the same value? I am hard pressed to think of any situation where that is possible with correct wiring and good connections. One of the two, when idle, looks to be getting pulled up to the supply rail. The other, since its voltage is slightly different, is getting pulled up to... what? A different supply rail?!

The 1.8V SCL hang is weird too. With a 2.2k pullup to 5V (?) on the line, that's a 3.2V drop, so something was sinking about 1.45mA, which is weird. Or was the line only being pulled up to 1.8V and seen as LOW because the rail it was connected to wasn't at the correct voltage? And it persisted across all sorts of attempts to revive it, yet while you were repeatedly trying to re-upload, it suddenly came back to life? Loose wires? Cold solder joints? Bad cables? Do any of the scope traces jump around while you handle, poke, or tap the board? Different voltages at different points on the supply rail due to poor connections, or hair-thin wire used to supply power? If Dupont wires (particularly the kind made from ribbon cable; that stuff is bad news), connectors crimped at home with an inappropriate crimp tool, or a breadboard are involved, those are all things to look at. Dodgy sockets or connectors?
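
The Ohm's-law step above, written out as a small compile-time C++ helper. sinkMilliamps() is a hypothetical name, and the 2.2k pullup to 5V and the 1.8V stuck level are the values guessed at in the comment, not measurements.

```cpp
// Current (in mA) that something must sink to hold a bus line at vLine
// when it is pulled up to vRail through rPullOhms (Ohm's law: I = V / R).
constexpr double sinkMilliamps(double vRail, double vLine, double rPullOhms) {
  return (vRail - vLine) / rPullOhms * 1000.0;
}

// Values from the comment: 2.2 kOhm pullup to 5 V, SCL stuck at 1.8 V.
constexpr double kStuckScl = sinkMilliamps(5.0, 1.8, 2200.0);  // about 1.45 mA

static_assert(kStuckScl > 1.45 && kStuckScl < 1.46, "3.2 V across 2.2 kOhm");
```

Several boards each bringing their own pullups onto the bus would lower the effective resistance and raise this current proportionally.
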

In general, Wire.beginTransmission() doesn't have any while loops in it, so it shouldn't be able to hang.

@JDJelectronics: thanks for the further info. A 12V lead-acid battery bucked down to 5V: for long battery life at low cost, and/or out of concern about battery fires? The original reason I asked was to get an idea of your power budget, wondering whether BOD was the answer, but if a software reset triggers the failure, BOD isn't solving the right problem.

So you say that a currently working unit with a full battery failed when you triggered a software reset? That would seem to rule out the theory that it's simply triggered by low voltage, and instead imply that there's an issue that manifests whenever individual parts power on or reset in the wrong order.

When the failure is triggered by a software reset, are the data lines being held low? What makes it work again after the failure is triggered?

SpenceKonde commented 3 years ago

@JDJelectronics: to add to the above, I recommend the same trick of disconnecting I2C devices (with the power off) if any lines are being held low, to find out which device has malfunctioned.

In your case, if I had to guess, I'd say the order in which things power on matters, or something breaks when you try to initialize the second time.