corona-warn-app / cwa-app-ios

Native iOS app using the exposure notification framework from Apple. The CWA development ends on May 31, 2023. You still can warn other users until April 30, 2023. More information:
https://coronawarn.app/en/faq/#ramp_down
Apache License 2.0
1.68k stars 285 forks source link

No exposure check for 36 hours #3352

Closed ndegendogo closed 2 years ago

ndegendogo commented 3 years ago

Avoid duplicates

Technical details

Describe the bug

Yesterday morning I played a bit with switching WLAN on and off from the "Exposure Check Active" dialog - I tried if I can reproduce #3188 . First I could not reproduce it, but later at 10:17 I suddenly saw the error message when I opened cwa to add entries to my contact diary. See this comment and related screenshot. And then today evening I got the "dead man notification" - and, indeed, my ENF log confirms this; it shows no risk check in the last 36 hours, since my experiments.

The ENF settings indicate contact tracing active and linked to cwa Germany.

Steps to reproduce the issue

Expected behaviour

Regular and robust risk checks from background as well as during app usage, and from WLAN as well as from mobile connection.

Possible Fix

Additional context

At least we have evidence that the dead man notification is working well ... πŸ™ˆ

IMG_1192

IMG_1194

IMG_1193


Internal Tracking-ID: EXPOSUREAPP-9086

ndegendogo commented 3 years ago

More screenshots: when I open the app now, I get again / still the error message. The in-app settings show everything is active.

IMG_1195

IMG_1196

ndegendogo commented 3 years ago

A few lines from my log file: has a risk check started but never finished, and this blocks now more risk checks?

Info 2021-08-11T20:02:52Z [ENA/RiskProvider.swift:98] [requestRisk(userInitiated:timeoutInterval:)] RiskProvider: Request risk was called. UserInitiated: false

Info 2021-08-11T20:02:52Z [ENA/RiskProvider.swift:101] [requestRisk(userInitiated:timeoutInterval:)] RiskProvider: Risk detection is already running. Don't start new risk detection.

Info 2021-08-11T20:02:52Z [ENA/RiskProvider.swift:479] [failOnTargetQueue(error:updateState:)] RiskProvider: Failed with error: riskProviderIsRunning

Info 2021-08-11T20:02:52Z [ENA/RiskProvider.swift:98] [requestRisk(userInitiated:timeoutInterval:)] RiskProvider: Request risk was called. UserInitiated: false

Info 2021-08-11T20:02:52Z [ENA/RiskProvider.swift:101] [requestRisk(userInitiated:timeoutInterval:)] RiskProvider: Risk detection is already running. Don't start new risk detection.

Info 2021-08-11T20:02:52Z [ENA/RiskProvider.swift:479] [failOnTargetQueue(error:updateState:)] RiskProvider: Failed with error: riskProviderIsRunning

ndegendogo commented 3 years ago

it looks as if the HTTP session was crashed and is now blocking everything...?

Info 2021-08-11T20:02:52Z [ENA/PPAnalyticsSubmitter.swift:61] [triggerSubmitData(ppacToken:completion:)] Analytics submission was triggered. Checking now if we can submit...

Warning 2021-08-11T20:02:52Z [ENA/PPAnalyticsSubmitter.swift:65] [triggerSubmitData(ppacToken:completion:)] Analytics submission abord due to submission is already in progress

Debug 2021-08-11T20:02:52Z [ENA/CachedAppConfiguration.swift:223] [shouldFetch()] timestamp >= 300s? 132255.60344099998 >= 300)

Debug 2021-08-11T20:02:52Z [ENA/CachedAppConfiguration.swift:189] [appConfiguration(forceFetch:)] fetching fresh app configuration. forceFetch: false, force: true

Debug 2021-08-11T20:02:52Z [ENA/CachedAppConfiguration.swift:76] [getAppConfig(with:)] Return immediately because request allready running.

Debug 2021-08-11T20:02:52Z [ENA/CachedAppConfiguration.swift:77] [getAppConfig(with:)] Append promise.

ndegendogo commented 3 years ago

I try to press the button but nothing happens (note: in the log file I see that it refused to start the risk check because it thinks the check is already running)

ndegendogo commented 3 years ago

I exit the app and kick it out of memory (swipe), then I reboot the phone. This seems to do the trick - risk check is now working again πŸŽ‰πŸŽ‰πŸŽ‰πŸŽ‰πŸŽ‰πŸŽ‰

Ein-Tim commented 3 years ago

Okay, let me summarize to make sure I understood everything correctly.

You tried to reproduce #3188 but failed. Afterwards you opened the CWA and saw this error message on the risk card. At this time, did you have an active internet connection? After you added the entry to the contact journal you left the app. Did you hard close the app or just press on the home button? Then you did not open the app anymore until you received the dead man notification today - after 36h without a successful risk check. When you opened the app today, did you have an active internet connection? If yes, did the app showed the typical "Downloading" card? Or did it just stick to the message "Exposure check failed"? Pressing on "Erneut starten" did not have any influence. Have you tried going to a submenu in the app and returning back to the Home Screen (as I did in the video in the OP of #3188). Does the ENF log show any new check for this time before the reboot?


I interpret the log that way that the data donation submission failed and this blocked everything? Do you agree with me here? Interesting would be if you can reproduce this if you turn off data donation. Then nothing should be submitted anymore.


Lemme know in case I have mixed something up or misunderstood something. Thanks and have a good night!

ndegendogo commented 3 years ago

@Ein-Tim thanks for your interest to dig into this ...

You tried to reproduce #3188 but failed.

Yes. Let me add that I was on public transport during my experiments. So no WLAN coverage all the time, and possibly bad or instable network coverage.

Afterwards you opened the CWA and saw this error message on the risk card. At this time, did you have an active internet connection?

Yes, I opened cwa after arriving at work. No WLAN (I did not yet enter their new password since they changed it). My screenshot confirms this, and shows also a bit of mobile coverage (one bar LTE, usually this is enough for surfing through the internet).

After you added the entry to the contact journal you left the app. Did you hard close the app or just press on the home button?

Yes. I just pressed the home button. I never hard-close it, except in emergencies (like yesterday evening).

Then you did not open the app anymore until you received the dead man notification today - after 36h without a successful risk check.

No. I opened it several times over the day and added more collegues to the diary as I met them. In the evening I opened it again and checked the statistics. And the following day again several times for the diary. So, definitely, the app was up and running. Only I did not notice that the risk check was outdated. So I am very thankful for the notification.

When you opened the app today, did you have an active internet connection?

"Today" means before or after the deadman notification? Opened for diary: no WLAN, but mobile coverage. Opened for statistics in the evening: WLAN yes; mobile coverage: don't remember, possibly weak. My understanding of intended behaviour is: daily risk check, even without WLAN, and additionally checks with a 4-hour period in case of WLAN. And this is the pattern I see for the other days.

If yes, did the app showed the typical "Downloading" card? Or did it just stick to the message "Exposure check failed"?

Tbh, I don't remember. I was focussed on the diary resp the statistics numbers, and did not look at the risk. (I didn't expext anything interesting there, luckily our local incidence is still low 😁)

Pressing on "Erneut starten" did not have any influence.

Well - not any visible influence. In the log file I see very clearly that it indeed reacted to my pressing the button - only it stopped very fast.

Have you tried going to a submenu in the app and returning back to the Home Screen (as I did in the video in the OP of #3188).

Any submenu: yes, I activated logging 😎 This specific submenu: I don't remember, so I guess: No.

Does the ENF log show any new check for this time before the reboot?

No. After reboot I got a check, but before the ENF log shows no check since 36 hours. And cwa log confirms that every attempt for risk check is aborted early because it thinks there is already a risk check running.

ndegendogo commented 3 years ago

@Ein-Tim

I interpret the log that way that the data donation submission failed and this blocked everything? Do you agree with me here?

Actually, my interpretation of the log file is: "something" crashed and left the risk management object in an inconsistent state. My main suspect for "something" is a bad or instable internet connection, and a missing or incomplete exception handling or timeout handling on cwa side in this case. Maybe indeed the data donation. Maybe the fact that I played with this "Exposure logging active" dialog. Maybe just the fact that I played with cwa at all while my internet was not stable. And maybe I am even wrong with my suspicion, and the true root cause is something different...

Interesting would be if you can reproduce this if you turn off data donation. Then nothing should be submitted anymore.

Lets see ... I don't believe this is related to any background mode. So maybe it is even reproducible in the simulator ...

ndegendogo commented 3 years ago

Before I close this ticket (because for me the risk check works again), I would like two action items:

1) a Jira ticket to make the devs aware that there is still a path that can lead to inconsistent state in the risk provider and this can block the risk check @dsarkar @heinezen

2) a discussion / assessment if the suggestions how to act in case of dead man notification could/should be extended. In my case the suggestion on the notification was to open the app; and I didn't find any related FAQ entry (but maybe I just didn't search hard enough). The suggestion could be (a) to open cwa, and (b) in case this does not help, to close and hard-kill it. @Ein-Tim @dsarkar @heinezen

Ein-Tim commented 3 years ago

@ndegendogo

Thanks for your long and detailed comments, I will try to reproduce this (together with #3188) on my second device as soon as I have time. (:

Ein-Tim commented 3 years ago

I tried hard to reproduce this but wasn't able to, sorry πŸ˜•

ndegendogo commented 3 years ago

@Ein-Tim

I tried hard to reproduce this

I don't think this is easy / straightforward to reproduce. Else I would expect more tickets. 🀭

ndegendogo commented 3 years ago

I reproduced "something similar" on the simulator. I am uploading my log file.

Details:

ndegendogo commented 3 years ago

simu_log2.txt

heinezen commented 3 years ago

@Ein-Tim @ndegendogo

Sorry for coming back to this so late. The developers are trying to find the cause and a solution for this problem. I've forwarded the log file. Let's see what they can find out.


Corona-Warn-App Open Source Team

dsarkar commented 3 years ago

@ndegendogo FYI: Dev team became aware of a bug, exposure checks was heavily tested and successfully verified that it is working (now).

Some changes were introduced, see https://github.com/corona-warn-app/cwa-app-ios/pull/3362

heinezen commented 3 years ago

We are still looking into this issue, but there might be multiple problems that influence the surfacing of this bug.


Corona-Warn-App Open Source Team

thomasaugsten commented 3 years ago

Out investigation at the moment show that this white card is correct.

If you are checked into an event the app the app needs to check if the event was infectious or not. For this check the app needs to fetch the latest published event ids from the CDN. If this fetch is not possible no exposure check for this event is possible. The white card informs the user that the risk check was not successful

Ein-Tim commented 3 years ago

@thomasaugsten

Did you want to comment on #3188? Because here the issue was that the app doesn't carry out risk checks in the background. This is definitely not intended, or?

ndegendogo commented 3 years ago

Because here the issue was that the app doesn't carry out risk checks in the background.

Exactly. I didn't have any event checkin codes. I played with the ENF activation/ deactivation button while I was in unstable network coverage

Ein-Tim commented 2 years ago

Is this still reproducible on your device with version 2.16 and the latest iOS version @ndegendogo?

ndegendogo commented 2 years ago

What a coincidence! Today I come home, open cwa and expect a risk check like every day - and nothing happens! Just a green card, and the last check was in the morning. And, yes, during the day I had no WLAN coverage, but now I definitely have WLAN. So I waited another hour and tried again ... and now I have the white card with the error message. See screenshot. And the button has no effect.

Note: I am currently still on cwa 2.15.1 (2) / iOS 15.1 / iPhone 13 Pro

Next action will be to upgrade cwa to newest and try again.

879A2077-D65B-4E45-B797-44C49CD804D6

ndegendogo commented 2 years ago

Looks good ...

CBFC1CAA-33E9-4B90-8CC4-8379B4F85971

Ein-Tim commented 2 years ago

@ndegendogo Do you still need this issue open?

ndegendogo commented 2 years ago

@Ein-Tim thanks for the reminder. In last time my risk checks are absolutely reliably, and very fast; especially since the optimization (show only the new key files to ENF).

Closing as resolved.