Loop crash (current master)

hpc4fun commented 2 years ago

The other day I had a loop app that crashed when one tried to open it. The symptoms looked very much like the one described here: https://loopkit.github.io/loopdocs/troubleshooting/loop-crashing/ but from the loop issue report we see:

Version: Loop v2.2.6
profileExpiration: 2022-12-08 18:54:26 +0000
gitRevision: e81c5afe98d4f306a5ae13369a7d4ac3f3aca0c3
gitBranch: N/A
buildDateString: Ons 8 Dec 2021 19:57:47 CET
xcodeVersion: 13A1030d

so the issue is not due to an expired profiled. I managed to regain access to the loop app and the running pod by rebuilding the loop app:

Version: Loop v2.2.6
profileExpiration: 2023-01-12 19:13:14 +0000
gitRevision: e81c5afe98d4f306a5ae13369a7d4ac3f3aca0c3
gitBranch: N/A
buildDateString: Ons 19 Jan 2022 13:47:38 CET
xcodeVersion: 13A1030d

The running pod was not lost and I continued running with it. The pod has been changed since but the loop app has not crashed again.

Now, today (or rather during the night :) another T1DM running loop had the same kind of crash. The loop app was crashing when one tried to open it and there was no way to talk to the app nor the pod. I thought the fix was easy. Simply redo the process above but alas it turned out that there was more to it this time.

I did rebuild and reinstall again. This time from:

Version: Loop v2.2.6
profileExpiration: 2022-12-10 21:41:27 +0000
gitRevision: e81c5afe98d4f306a5ae13369a7d4ac3f3aca0c3
gitBranch: N/A
buildDateString: Fre 10 Dec 2021 22:49:46 CET
xcodeVersion: 13A1030d

to

Version: Loop v2.2.6
profileExpiration: 2023-01-12 19:13:14 +0000
gitRevision: e81c5afe98d4f306a5ae13369a7d4ac3f3aca0c3
gitBranch: N/A
buildDateString: Lør 22 Jan 2022 02:45:18 CET
xcodeVersion: 13A1030d

Again this trick allowed me to regain control of the app (I could now start it and get an issue report out). Unfortunately this time, the loop app would crashes on a regular basis and the predicted BG is NaN. Regular in the sense that it was easy to make it crash. I trust that the crash is due to the fact that there are NaNs in internal data structures (for some reason I don't understand) and when these are being used in computations it will crash, e.g. I saw one of the RL subs that was trying to convert this double to int and consequently crashed in trying to do this conversion.

In the issue report I see the NaNs too:

appGroupName: group.com.FT54B2L35J.loopkit.LoopGroup statusExtensionContext: Optional(["predictedGlucose": ["interval": 300.0, "unit": "mmol<180.1558800000541>/L", "values": [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan], "startDate": 2022-01-22 01:50:00 +0000], "reservoirCapacity": 200.0, "netBasal": ["percentage": 0.0, "start": 2022-01-22 01:05:42 +0000, "rate": 0.0], "lastLoopCompleted": 2022-01-22 01:18:27 +0000, "pumpManagerHUDViewsContext": ["pumpManagerHUDViewsRawValue": ["hudProviderViews": { alerts = ( ); lifetime = "259146.8027609587"; podActivatedAt = "2022-01-21 16:56:01 +0000"; validTime = "2022-01-22 01:49:31 +0000"; }, "managerIdentifier": Omnipod]], "version": 5])

If I try to enter any of the menus (blood glucose, active insulin, ...) it consistently crashes the loop app.

Fortunately, I can still bolus and the basal seems to run OK so I can get through the night in open loop mode but a proper fix is obviously needed in order to continue.

Phone in the first case is latest SE model whereas it's the older SE model in the second case. Both phones run ios 15.2 and there is sufficient storage on both phones.

ANY feedback or ideas on how to proceed from here is highly appreciated.

Loop Report 2022-01-22 02.49.41+01.00.md

PS! I should also mention that in the first case (the best case) the pod was on its third day whereas in the second case the pod was activated the day before, i.e. had less than 24 hours of operation when the Loop app crashed.

hpc4fun commented 2 years ago

More info: In the builds above I did clean the build folder before each rebuild and for the latter case the process was repeated a couple of time (clean build folder, rebuild, install).

I took the time to report the issue above and while doing that I wiped the clone and did a fresh clone (I did the same for the first phone above but not for the second one where I assumed that a clean build folder was sufficient). It seems that it wasn't sufficient as a rebuild from a fresh clone apparently made the NaNs disappear and the Loop crashes has now stopped (as least for now :)).

One may speculate if the NaNs simply disappeared due to the time that has went by since I started to debug this (as the arrays are finite one could imagine that they would be filled with sane values eventually) or was the true fix to re-clone and start from scratch. That I don't know.

The Loop app has been rock solid for years (several versions) so the main issue here is why do the apps suddenly crash out of the blue and maybe not so much the hassle to get the communication with the pod reestablished. It shall be stressed that this is not the dev version but the current master so the sudden crashes come as a true surprise to me. It should also be mentioned that I have issue reports from before (previous pods) and after the crashes on both the cases described above so just reach out if more details would be of interest.

marionbarker commented 2 years ago

Thanks for posting. I’ll try to get to your logs later this weekend. FYI, the logs include pod messages exchanged over the previous 48 hours.

hpc4fun commented 2 years ago

Now after some sleep, here are a few more observations from the issue report:

1) Normally I see configuredDevices being a string "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" but in both cases above, when I re-establish contact to the app the issue report shows that the configuredDevices is empty:

Loop Report 2022-01-19 13.48.16+01.00.md:configuredDevices: []
Loop Report 2022-01-22 02.36.31+01.00.md:configuredDevices: []
Loop Report 2022-01-22 02.49.41+01.00.md':configuredDevices: []  # the file attached above

And now when things (at least on the surface appear to work again), I see that the configuredDevices is suddenly set to a string of the usual format again (still on the same pod, that is!). Not sure if this missing string caused the initial crash of the app or if this is a result of the rebuild/reinstall exercise. I trust that I have see this one in the past too so could be totally unrelated to the crash.

2) The StatusExtensionDataManager usually contain a predictedGlucose array etc. but in this case this is missing completely and assumptions on the presence of this in the logic could explain the crashes. In the 02.36.31+01.00 report we see:

appGroupName: group.com.FT54B2L35J.loopkit.LoopGroup
statusExtensionContext: Optional(["netBasal": ["rate": 0.0, "percentage": 0.0, "start": 2022-01-22 01:05:42 +0000], "reservoirCapacity": 200.0, "lastLoopCompleted": 2022-01-22 01:18:27 +0000, "pumpManagerHUDViewsContext": ["pumpManagerHUDViewsRawValue": ["hudProviderViews": {
    alerts =     (
    );
    lifetime = "259148.7585179806";
    podActivatedAt = "2022-01-21 16:56:01 +0000";
    validTime = "2022-01-22 01:21:10 +0000";
}, "managerIdentifier": Omnipod]], "version": 5])

which in the 02.49 report (the one attached) has evolved into:

StatusExtensionDataManager
appGroupName: group.com.FT54B2L35J.loopkit.LoopGroup
statusExtensionContext: Optional(["predictedGlucose": ["interval": 300.0, "unit": "mmol<180.1558800000541>/L", "values": [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan], "startDate": 2022-01-22 01:50:00 +0000], "reservoirCapacity": 200.0, "netBasal": ["percentage": 0.0, "start": 2022-01-22 01:05:42 +0000, "rate": 0.0], "lastLoopCompleted": 2022-01-22 01:18:27 +0000, "pumpManagerHUDViewsContext": ["pumpManagerHUDViewsRawValue": ["hudProviderViews": {
    alerts =     (
    );
    lifetime = "259146.8027609587";
    podActivatedAt = "2022-01-21 16:56:01 +0000";
    validTime = "2022-01-22 01:49:31 +0000";
}, "managerIdentifier": Omnipod]], "version": 5])

In the LoopDataManager section I also see several empty arrays in the 02.36 report which potentially could cause issues to the logic:

insulinEffect: [
* GlucoseEffect(start, mg/dL)

] 
carbEffect: [
* GlucoseEffect(start, mg/dL)

] 
predictedGlucose: [
* PredictedGlucoseValue(start, mg/dL)

] 
retrospectiveGlucoseDiscrepancies: [
* GlucoseEffect(start, mg/dL)

] 
retrospectiveGlucoseDiscrepanciesSummed: [
* GlucoseChange(start, end, mg/dL)

]

These evolve into this (which could still explain crashes of course) in the 02.49 report:

carbEffect: [
* GlucoseEffect(start, mg/dL)
* 2022-01-22 01:15:00 +0000, inf
* 2022-01-22 01:20:00 +0000, inf
* 2022-01-22 01:25:00 +0000, inf
* 2022-01-22 01:30:00 +0000, inf
* 2022-01-22 01:35:00 +0000, inf
* 2022-01-22 01:40:00 +0000, inf
* 2022-01-22 01:45:00 +0000, inf
....

predictedGlucose: [
* PredictedGlucoseValue(start, mg/dL)
* 2022-01-22 01:46:00 +0000, 167.54496840005032
* 2022-01-22 01:50:00 +0000, nan
* 2022-01-22 01:55:00 +0000, nan
* 2022-01-22 02:00:00 +0000, nan
* 2022-01-22 02:05:00 +0000, nan
* 2022-01-22 02:10:00 +0000, nan
* 2022-01-22 02:15:00 +0000, nan
* 2022-01-22 02:20:00 +0000, nan
....

]
retrospectiveGlucoseDiscrepancies: [
* GlucoseEffect(start, mg/dL)
* 2022-01-22 01:15:10 +0000, nan

]
retrospectiveGlucoseDiscrepanciesSummed: [
* GlucoseChange(start, end, mg/dL)
* 2022-01-22 01:15:10 +0000, 2022-01-22 01:15:10 +0000, nan

]

In the communication log leading up to the crash, things look totally normal to me so I don't think the crash has to do with communication to neither the Pod nor the CGM. It seems more like a generic loop issue (or an issue further down the IOS stack, both phones are on IOS 15.2).

ps2 commented 2 years ago

If you can reproduce the crash while connected to the debugger, that would give us a stack trace, that may pinpoint the issue.

marionbarker commented 2 years ago

Pete - This might not be relevant, but it might help. I went searching for what is known in the FreeAPS world as the time zone bug. The user can select a "Carb Badge" option that tells them how many carbs they should eat to prevent a low. This is something Ivan added a long time ago. The problem is that if the timezone of the pump doesn't match the phone and the user modifies an ISF setting or a CR setting, then the app crashes. I identified the crash as being at this line when the ratio of those two settings is generated:

LoopWorkspace/LoopKit/LoopKit/DailyValueSchedule.swift: Lines 176-177

public static func zip<L, R>(_ lhs: DailyValueSchedule<L>, _ rhs: DailyValueSchedule<R>) -> DailyValueSchedule where T == (L, R) {
    precondition(lhs.timeZone == rhs.timeZone)

The precondition fails. (The entry gets converted from phone time to pump time somewhere later in the loop cycle - so at the call to the zip - the time zones don't match.)

That's as far as I got - I wasn't sure what to do next and got busy with another project about then.

I'm wondering if for this situation, there may be a low-level call to divide two arrays that needs to have zero values checked prior to the call - leading to the inf and nan results.

ps2 commented 2 years ago

A stack trace would probably shine some good light on this. I've heard of the FreeAPS TimeZone thing; I think that was someone copying code into FreeAPS or someone working on FreeAPS not understanding TimeZone handling in Loop. Loop (main) has very well thought out Timezone management and has largely been free of the sort of timezone issues that seem to plague OpenAPS and other systems.

ps2 commented 1 year ago

If this is still reproducible, please attach a crash report, found in Settings -> Privacy & Security -> Analytics & Improvements -> Analytics Data -> Loop-YYYY-MM-DD-HHMMSS.ips

LoopKit / Loop

Loop crash (current master) #1590