UWCubeSat / DubSat1

Main repository for all flight software (including all subsystems, core libraries, and hardware abstraction layers) for the DubSat-1 3U Cubesat.

Batt board gets into a weird I2C state #225

Closed nkwacker closed 5 years ago

nkwacker commented 5 years ago

Batt board reports only the same 2 or 3 hex values for readings over I2C. A restart sometimes fixes the issue, but it recurs after an indeterminate amount of time running.

nkwacker commented 5 years ago

Could not replicate the issue with a continuous one-week run, so switched to cycling the power every 8 seconds (the maximum that we could achieve on orbit).

nkwacker commented 5 years ago

Was able to observe the state both instantaneously

[graph: instant]

and over the course of a few restarts.

[graph: sustained]

Both graphs have one read per cycle.

nkwacker commented 5 years ago

When the batt board is in the bad state, there is no I2C activity.

nkwacker commented 5 years ago

After cycling from a bad state, the I2C bus goes through a series of NAKs and ACKs before finally resuming normal operation. This behavior is not present when cycling from a normal state to a normal state. [image]

nkwacker commented 5 years ago

I can consistently replicate the issue by setting the reset bit of UCBxCTLW0 high before a read or write, which leaves I2C in a "reset state". I suspect this was the cause: initializing the I2C library left this bit high, which I changed in 90d33a5e3f8834fd59290c174e1c4019464f60c4. Before the change, the library would enter i2cCoreRead() with the reset bit high, and the bit was only set low while initializing autostop. https://github.com/UWCubeSat/DubSat1/blob/90d33a5e3f8834fd59290c174e1c4019464f60c4/src/dsbase/core/i2c.c#L120-L125 Note that i2cEnable(bus); is the line where reset is set low.

I suspect this is also what caused i2cMasterCombinedWriteRead() to break, since it does not initialize autostop, though there is no time to test whether that is the case.
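To illustrate the reset-bit behavior described above, here is a minimal host-side sketch, not the library's actual code. It models UCBxCTLW0 as a plain variable and assumes the reset bit is UCSWRST (bit 0). The `SimI2CBus` type and all `sim*` functions are hypothetical names made up for this example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed: the software reset bit (UCSWRST) is bit 0 of UCBxCTLW0. */
#define SIM_UCSWRST 0x0001u

/* Hypothetical stand-in for one I2C module; ctlw0 models UCBxCTLW0. */
typedef struct {
    uint16_t ctlw0;
} SimI2CBus;

/* Models an init routine that configures the module but leaves it
 * held in reset (the behavior suspected of causing the hang). */
void simInitLeaveInReset(SimI2CBus *bus) {
    bus->ctlw0 |= SIM_UCSWRST;
}

/* Models the step that clears the reset bit, analogous in spirit to
 * the i2cEnable(bus) call mentioned above. */
void simEnable(SimI2CBus *bus) {
    bus->ctlw0 &= (uint16_t)~SIM_UCSWRST;
}

/* In this model, a read or write can only start when the module is
 * out of reset. */
bool simCanStartTransaction(const SimI2CBus *bus) {
    return (bus->ctlw0 & SIM_UCSWRST) == 0;
}
```

In this model, any code path that starts a transfer without first going through the enable step, as i2cMasterCombinedWriteRead() is suspected to, would see `simCanStartTransaction()` return false.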

I am running another continuous-cycling test (the same one I used above), and if there is no error in ~36 hours I will close this issue.

nkwacker commented 5 years ago

The failure is still happening, but hopefully I can use my earlier findings to diagnose and solve it more easily.

nkwacker commented 5 years ago

The bus-hang state seems to resolve itself after the SDA line is manually grabbed and pulled high and control is then returned to the USCI. I added this workaround before the registers are initialized in bspI2CInit(), and it seems to consistently recover the bus from a bad state (62bb41742adf6121998e332782f418570e0e68ec).
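The workaround can be sketched roughly as follows. This is a host-side model under stated assumptions, not the code from the commit: `SimPort`, `SIM_SDA_PIN`, and `simRecoverSda()` are hypothetical, and the three fields only loosely mirror a port's function-select, direction, and output registers.

```c
#include <stdint.h>

/* Hypothetical stand-in for the port registers that control SDA. */
typedef struct {
    uint8_t sel; /* 1 = pin routed to the I2C peripheral, 0 = GPIO */
    uint8_t dir; /* 1 = pin driven as an output */
    uint8_t out; /* output level when driven */
} SimPort;

/* Assumed bit position of the SDA pin within the port. */
#define SIM_SDA_PIN 0x01u

/* Grab SDA as a GPIO, drive it high, then hand it back to the
 * peripheral. */
void simRecoverSda(SimPort *port) {
    port->sel &= (uint8_t)~SIM_SDA_PIN; /* take the pin from the peripheral */
    port->dir |= SIM_SDA_PIN;           /* drive it as an output */
    port->out |= SIM_SDA_PIN;           /* pull SDA high */
    port->sel |= SIM_SDA_PIN;           /* return control to the peripheral */
}
```

On real hardware the equivalent sequence would run before the I2C registers are initialized, as the comment describes for bspI2CInit(); the exact registers and any delays are not shown here.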

As before, I will run this new code on the batt board overnight, then close this issue if there are no errors.

nkwacker commented 5 years ago

That seems to have done it.