SpenceKonde / DxCore

Arduino core for AVR DA, DB, DD, EA and future DU-series parts - Microchip's latest and greatest AVRs. Library maintainers: Porting help and advice is available.

i2c slave corruption in sleep modes other than SLEEP_MODE_IDLE #322

Closed ObviousInRetrospect closed 1 year ago

ObviousInRetrospect commented 2 years ago

I've noticed unreliable slave behavior in SLEEP_MODE_STANDBY (and SLEEP_MODE_POWER_DOWN). This seems to be more of an issue when transferring larger buffers. I thought guarding the sleep_cpu() call with !Wire.slaveTransactionOpen() was sufficient. The TWI supports wake from STANDBY and even POWER_DOWN, but every I2C slave I have written with DxCore and megaTinyCore has been flaky unless set to SLEEP_MODE_IDLE.

The code here: https://github.com/ObviousInRetrospect/DualMode/tree/main/DualModeExample

works as expected

but if SLEEP_MODE_IDLE is changed to SLEEP_MODE_STANDBY the slave starts corrupting data: https://github.com/ObviousInRetrospect/DualMode/blob/10dfca6b5bc6c8fedf1605d2d8c5981e2e4aca34/DualModeExample/DualModeExample.ino#L319

this is most easily seen (especially without an ina3221) by uncommenting the test pattern data fill (L184-188): https://github.com/ObviousInRetrospect/DualMode/blob/10dfca6b5bc6c8fedf1605d2d8c5981e2e4aca34/DualModeExample/DualModeExample.ino#L184

When I raised this in https://github.com/SpenceKonde/DxCore/discussions/316 @MX682X suggested opening a separate issue with more details.

The output on the master https://github.com/ObviousInRetrospect/DualMode/tree/main/DualModeExampleClient in SLEEP_MODE_IDLE is a long string of correct reads:

(the master does a 32-byte read followed by a 25-byte read. The last 4 bytes are a CRC32 covering the entire data)

0000000000000000000000000C000000
000033B9A8020000000000000C000000
000033B9000B01000C0000000C0033B9
010033B90078CF670F
ch1 bv:0 sv:0 ua:0 acc0:0/12=0.00mah acc1:0/47411=0.00mah
ch2 bv:680 sv:0 ua:0 acc0:0/12=0.00mah acc1:0/47411=0.00mah
ch3 bv:2816 sv:1 ua:400 acc0:12/12=0.00mah acc1:112947/47411=5.16mah

0000000000000000000000000C000000
00003FB9A0020000000000000C000000
00003FB9000B01000C0000000C003FB9
01003FB900713F2E0F
ch1 bv:0 sv:0 ua:0 acc0:0/12=0.00mah acc1:0/47423=0.00mah
ch2 bv:672 sv:0 ua:0 acc0:0/12=0.00mah acc1:0/47423=0.00mah
ch3 bv:2816 sv:1 ua:400 acc0:12/12=0.00mah acc1:112959/47423=5.16mah

omitting thousands of similar lines with very few errors. In an overnight run, 99.2% of transfers were clean; the 0.8% corruption was due to a race condition around the CRC calculation, where the data looks fine but changed while the CRC was being calculated.
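For readers following the logs, here is a minimal host-side sketch of the frame check described above: 57 bytes per transfer, with the last 4 bytes holding a CRC32 of the bytes before them. The function names (`crc32Bitwise`, `receivedCrc`, `frameIsValid`) are hypothetical, and the CRC variant (the common reflected polynomial 0xEDB88320) is an assumption, since the issue does not say which one the example sketches use. The little-endian byte order is taken from the logs, where trailing bytes `A5 C3 A2 44` are reported as `rcv_crc:44A2C3A5`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 with reflected polynomial 0xEDB88320, initial value and
// final XOR of 0xFFFFFFFF.
// ASSUMPTION: the issue does not state which CRC-32 variant is in use.
uint32_t crc32Bitwise(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // The mask is all ones when the low bit is set, otherwise zero.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Interpret the last 4 bytes of a frame as a little-endian 32-bit value,
// matching how the logs print rcv_crc.
uint32_t receivedCrc(const uint8_t *frame, size_t len) {
    assert(len >= 4);
    const uint8_t *p = frame + len - 4;
    return static_cast<uint32_t>(p[0]) |
           (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) |
           (static_cast<uint32_t>(p[3]) << 24);
}

// A frame is accepted when the CRC of everything before the trailer
// equals the trailer itself.
bool frameIsValid(const uint8_t *frame, size_t len) {
    if (len < 4) {
        return false;
    }
    return crc32Bitwise(frame, len - 4) == receivedCrc(frame, len);
}
```

A check like this only catches corruption on the wire or in the read pointer. It cannot catch the race mentioned above, where the slave's data changes between the moment the CRC is computed and the moment the bytes are sent.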

when the slave is changed to SLEEP_MODE_STANDBY, the output changes to:

[text in square brackets is commentary I added]

[this is the slave being UPDI programmed and not responding]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFF
ch1 bv:65535 sv:-1 ua:-400 acc0:-1/65535=-0.00mah acc1:-1/65535=-0.00mah
ch2 bv:65535 sv:-1 ua:-400 acc0:-1/65535=-0.00mah acc1:-1/65535=-0.00mah
ch3 bv:65535 sv:-1 ua:-400 acc0:-1/65535=-0.00mah acc1:-1/65535=-0.00mah

[the first read succeeds]
00000000000000000000000006000000
00000600F00200000000000006000000
00000600F80A01000600000006000600
0000060000CEC895DF
ch1 bv:0 sv:0 ua:0 acc0:0/6=0.00mah acc1:0/6=0.00mah
ch2 bv:752 sv:0 ua:0 acc0:0/6=0.00mah acc1:0/6=0.00mah
ch3 bv:2808 sv:1 ua:400 acc0:6/6=0.00mah acc1:6/6=0.00mah

[subsequent reads start failing. The second line is pretty clearly the fourth line of a frame, given the CRC-looking bytes in the middle followed by FFs, which are an underrun]
rcv_crc:38ACA57E calc:A3E7FCC4
bad crc, will retry 2 times
00001300F80A01001300000013001300
00001300007EA5AC38FFFFFFFFFFFFFF
00001300F80A01001300000013001300
00001300007EA5AC38
rcv_crc:38ACA57E calc:A3E7FCC4
bad crc, will retry 1 times
00001300F80A01001300000013001300
00001300007EA5AC38FFFFFFFFFFFFFF
00001300F80A01001300000013001300
00001300007EA5AC38

00001300F80A01001300000013001300
00001300007EA5AC38FFFFFFFFFFFFFF
00001300F80A01001300000013001300
00001300007EA5AC38
ch1 bv:2808 sv:1 ua:400 acc0:19/19=0.00mah acc1:19/19=0.00mah
ch2 bv:32256 sv:-21339 ua:-15920 acc0:-200/65535=-0.01mah acc1:65535/19=2.99mah
ch3 bv:2808 sv:1 ua:400 acc0:19/19=0.00mah acc1:19/19=0.00mah
rcv_crc:9B14924C calc:2ED9D8AC
bad crc, will retry 2 times
00001F00F80A01001F0000001F001F00
00001F00004C92149BFFFFFFFFFFFFFF
00001F00F80A01001F0000001F001F00
00001F00004C92149B
rcv_crc:9B14924C calc:2ED9D8AC
bad crc, will retry 1 times
00001F00F80A01001F0000001F001F00
00001F00004C92149BFFFFFFFFFFFFFF
00001F00F80A01001F0000001F001F00
00001F00004C92149B

00001F00F80A01001F0000001F001F00
00001F00004C92149BFFFFFFFFFFFFFF
00001F00F80A01001F0000001F001F00
00001F00004C92149B
ch1 bv:2808 sv:1 ua:400 acc0:31/31=0.00mah acc1:31/31=0.00mah
ch2 bv:19456 sv:5266 ua:9248 acc0:-101/65535=-0.00mah acc1:65535/31=2.99mah
ch3 bv:2808 sv:1 ua:400 acc0:31/31=0.00mah acc1:31/31=0.00mah
rcv_crc:E2A3E693 calc:64CCCBD4
bad crc, will retry 2 times
00002A00F80A01002A0000002A002A00
00002A000093E6A3E2FFFFFFFFFFFFFF
00002A00F80A01002A0000002A002A00
00002A000093E6A3E2
rcv_crc:E2A3E693 calc:64CCCBD4
bad crc, will retry 1 times
00002A00F80A01002A0000002A002A00
00002A000093E6A3E2FFFFFFFFFFFFFF
00002A00F80A01002A0000002A002A00
00002A000093E6A3E2

00002A00F80A01002A0000002A002A00
00002A000093E6A3E2FFFFFFFFFFFFFF
00002A00F80A01002A0000002A002A00
00002A000093E6A3E2
ch1 bv:2808 sv:1 ua:400 acc0:42/42=0.00mah acc1:42/42=0.00mah
ch2 bv:37632 sv:-23578 ua:5984 acc0:-30/65535=-0.00mah acc1:65535/42=2.99mah
ch3 bv:2808 sv:1 ua:400 acc0:42/42=0.00mah acc1:42/42=0.00mah
rcv_crc:36000000 calc:E4E62545
bad crc, will retry 2 times
0B0036000000360000AAD47928FFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
00000C00000000003600F00200000000
000036000000000036
rcv_crc:00000002 calc:CBBEEF8E
bad crc, will retry 1 times
FFFFFFFF00000000FFFF00000D000000
00003700F00200000000000037000000
FFFFFFFF00000000FFFF00000D000000
00003700F002000000

FFFFFFFF00000000FFFF00000D000000
00003700F00200000000000037000000
FFFFFFFF00000000FFFF00000D000000
00003700F002000000
ch1 bv:0 sv:0 ua:0 acc0:65535/13=2.99mah acc1:0/55=0.00mah
ch2 bv:752 sv:0 ua:0 acc0:0/55=0.00mah acc1:-65536/65535=-2.99mah
ch3 bv:0 sv:0 ua:0 acc0:65535/13=2.99mah acc1:0/55=0.00mah

FFFFFFFF00000000FFFF000018000000
00004200F00200000000000042000000
00004200F80A01001700000017004200
000042000060EC0D24
ch1 bv:0 sv:0 ua:0 acc0:65535/24=2.99mah acc1:0/66=0.00mah
ch2 bv:752 sv:0 ua:0 acc0:0/66=0.00mah acc1:0/66=0.00mah
ch3 bv:2808 sv:1 ua:400 acc0:23/23=0.00mah acc1:66/66=0.00mah
rcv_crc:149A5594 calc:969842F3
bad crc, will retry 2 times
FFFFFFFF00000000FFFF000024000000
00004E00F0020000000000004E000000
00004F00F80A01002400000024004F00
00004F000094559A14
rcv_crc:149A5594 calc:464D6721
bad crc, will retry 1 times
00004F00F80A01002400000024004F00
00004F000094559A14FFFFFFFFFFFFFF
00004F00F80A01002400000024004F00
00004F000094559A14

00004F00F80A01002400000024004F00
00004F000094559A14FFFFFFFFFFFFFF
00004F00F80A01002400000024004F00
00004F000094559A14
ch1 bv:2808 sv:1 ua:400 acc0:36/36=0.00mah acc1:79/79=0.00mah
ch2 bv:37888 sv:-26027 ua:9424 acc0:-236/65535=-0.01mah acc1:65535/79=2.99mah
ch3 bv:2808 sv:1 ua:400 acc0:36/36=0.00mah acc1:79/79=0.00mah

This is with test pattern data in which each byte contains its own address

[expected, device being programmed and not responding]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFF
ch1 bv:65535 sv:-1 ua:-400 acc0:-1/65535=-0.00mah acc1:-1/65535=-0.00mah
ch2 bv:65535 sv:-1 ua:-400 acc0:-1/65535=-0.00mah acc1:-1/65535=-0.00mah
ch3 bv:65535 sv:-1 ua:-400 acc0:-1/65535=-0.00mah acc1:-1/65535=-0.00mah

[3 correct transmissions, varies]
000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244
ch1 bv:1284 sv:1798 ua:-1696 acc0:185207048/3340=8457.79mah acc1:286265102/4882=13072.77mah
ch2 bv:5396 sv:5910 ua:4704 acc0:454695192/7452=20764.41mah acc1:555753246/8994=25379.40mah
ch3 bv:9508 sv:10022 ua:11104 acc0:724183336/11564=33071.04mah acc1:825241390/13106=37686.02mah

000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244
ch1 bv:1284 sv:1798 ua:-1696 acc0:185207048/3340=8457.79mah acc1:286265102/4882=13072.77mah
ch2 bv:5396 sv:5910 ua:4704 acc0:454695192/7452=20764.41mah acc1:555753246/8994=25379.40mah
ch3 bv:9508 sv:10022 ua:11104 acc0:724183336/11564=33071.04mah acc1:825241390/13106=37686.02mah

000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244
ch1 bv:1284 sv:1798 ua:-1696 acc0:185207048/3340=8457.79mah acc1:286265102/4882=13072.77mah
ch2 bv:5396 sv:5910 ua:4704 acc0:454695192/7452=20764.41mah acc1:555753246/8994=25379.40mah
ch3 bv:9508 sv:10022 ua:11104 acc0:724183336/11564=33071.04mah acc1:825241390/13106=37686.02mah

[corruption starts, this one isn't an obvious pattern]
rcv_crc:2221201F calc:C272FEB8
bad crc, will retry 2 times
2C2D2E2F3031323334A5C3A244FFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
0A0B0C0D0E0F10111213141516171819
1A1B1C1D1E1F202122

[]
rcv_crc:44A2C3A5 calc:3F00FCF8
bad crc, will retry 1 times
0A0B0C0D0E0F10111213141516171819
1A1B1C1D1E1F20212223242526272829
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244

0A0B0C0D0E0F10111213141516171819
1A1B1C1D1E1F20212223242526272829
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244
ch1 bv:3854 sv:4368 ua:-22272 acc0:353637138/5910=16149.43mah acc1:454695192/7452=20764.41mah
ch2 bv:7966 sv:8480 ua:-15872 acc0:623125282/10022=28456.06mah acc1:555755816/8994=25379.52mah
ch3 bv:9508 sv:10022 ua:11104 acc0:724183336/11564=33071.04mah acc1:825241390/13106=37686.02mah

[back to a couple correct ones]
000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244
ch1 bv:1284 sv:1798 ua:-1696 acc0:185207048/3340=8457.79mah acc1:286265102/4882=13072.77mah
ch2 bv:5396 sv:5910 ua:4704 acc0:454695192/7452=20764.41mah acc1:555753246/8994=25379.40mah
ch3 bv:9508 sv:10022 ua:11104 acc0:724183336/11564=33071.04mah acc1:825241390/13106=37686.02mah

000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244
ch1 bv:1284 sv:1798 ua:-1696 acc0:185207048/3340=8457.79mah acc1:286265102/4882=13072.77mah
ch2 bv:5396 sv:5910 ua:4704 acc0:454695192/7452=20764.41mah acc1:555753246/8994=25379.40mah
ch3 bv:9508 sv:10022 ua:11104 acc0:724183336/11564=33071.04mah acc1:825241390/13106=37686.02mah

000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244
ch1 bv:1284 sv:1798 ua:-1696 acc0:185207048/3340=8457.79mah acc1:286265102/4882=13072.77mah
ch2 bv:5396 sv:5910 ua:4704 acc0:454695192/7452=20764.41mah acc1:555753246/8994=25379.40mah
ch3 bv:9508 sv:10022 ua:11104 acc0:724183336/11564=33071.04mah acc1:825241390/13106=37686.02mah

[but now it's aliasing the first transmission]
rcv_crc:18171615 calc:E8257576
bad crc, will retry 2 times
000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
000102030405060708090A0B0C0D0E0F
101112131415161718

000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244
ch1 bv:1284 sv:1798 ua:-1696 acc0:185207048/3340=8457.79mah acc1:286265102/4882=13072.77mah
ch2 bv:5396 sv:5910 ua:4704 acc0:454695192/7452=20764.41mah acc1:555753246/8994=25379.40mah
ch3 bv:9508 sv:10022 ua:11104 acc0:724183336/11564=33071.04mah acc1:825241390/13106=37686.02mah

000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244
ch1 bv:1284 sv:1798 ua:-1696 acc0:185207048/3340=8457.79mah acc1:286265102/4882=13072.77mah
ch2 bv:5396 sv:5910 ua:4704 acc0:454695192/7452=20764.41mah acc1:555753246/8994=25379.40mah
ch3 bv:9508 sv:10022 ua:11104 acc0:724183336/11564=33071.04mah acc1:825241390/13106=37686.02mah

000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244
ch1 bv:1284 sv:1798 ua:-1696 acc0:185207048/3340=8457.79mah acc1:286265102/4882=13072.77mah
ch2 bv:5396 sv:5910 ua:4704 acc0:454695192/7452=20764.41mah acc1:555753246/8994=25379.40mah
ch3 bv:9508 sv:10022 ua:11104 acc0:724183336/11564=33071.04mah acc1:825241390/13106=37686.02mah

000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244
ch1 bv:1284 sv:1798 ua:-1696 acc0:185207048/3340=8457.79mah acc1:286265102/4882=13072.77mah
ch2 bv:5396 sv:5910 ua:4704 acc0:454695192/7452=20764.41mah acc1:555753246/8994=25379.40mah
ch3 bv:9508 sv:10022 ua:11104 acc0:724183336/11564=33071.04mah acc1:825241390/13106=37686.02mah

000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244
ch1 bv:1284 sv:1798 ua:-1696 acc0:185207048/3340=8457.79mah acc1:286265102/4882=13072.77mah
ch2 bv:5396 sv:5910 ua:4704 acc0:454695192/7452=20764.41mah acc1:555753246/8994=25379.40mah
ch3 bv:9508 sv:10022 ua:11104 acc0:724183336/11564=33071.04mah acc1:825241390/13106=37686.02mah
rcv_crc:18171615 calc:E8257576
bad crc, will retry 2 times
000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
000102030405060708090A0B0C0D0E0F
101112131415161718

000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F
202122232425262728292A2B2C2D2E2F
3031323334A5C3A244
ch1 bv:1284 sv:1798 ua:-1696 acc0:185207048/3340=8457.79mah acc1:286265102/4882=13072.77mah
ch2 bv:5396 sv:5910 ua:4704 acc0:454695192/7452=20764.41mah acc1:555753246/8994=25379.40mah
ch3 bv:9508 sv:10022 ua:11104 acc0:724183336/11564=33071.04mah acc1:825241390/13106=37686.02mah
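
Because each byte of the test pattern equals its own buffer offset, the first byte of a received chunk names the offset the slave actually started reading from. The following is a small host-side sketch of that diagnostic, not code from the example repository; `PatternRun` and `leadingRun` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Result of scanning one received chunk of test-pattern data.
struct PatternRun {
    uint8_t start;   // buffer offset the chunk appears to begin at
    size_t length;   // number of leading bytes that continue the pattern
};

// In the test pattern buffer[i] == i, so the chunk's first byte is taken as
// the starting offset, and the run is extended while each following byte is
// exactly one higher than the previous one.
// ASSUMPTION: the chunk starts with pattern data, not with CRC or 0xFF
// padding bytes; otherwise `start` is meaningless.
PatternRun leadingRun(const uint8_t *chunk, size_t len) {
    PatternRun run{0, 0};
    if (len == 0) {
        return run;
    }
    run.start = chunk[0];
    run.length = 1;
    while (run.length < len &&
           chunk[run.length] == static_cast<uint8_t>(run.start + run.length)) {
        ++run.length;
    }
    return run;
}
```

Applied to the logs above, the first corrupted 32-byte read (`2C2D2E2F3031323334A5C3A244FFFFFF`) gives start 0x2C and length 9, so the slave began 0x2C bytes into the buffer instead of at 0. In the aliasing case, the 25-byte read gives start 0x00 where 0x20 was expected.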

ObviousInRetrospect commented 2 years ago

OK, so in addition to that million, I ran another 6.3 million with mTC 2.6.1 + RC2 as the master:

CRC valid
fail: 0 pass: 6367884

SpenceKonde commented 1 year ago

Dear god. Okay, so I think I have your fixes in mTC for 2.6.2, and that will be ported to DxCore 1.5.0.