corona-warn-app / cwa-app-android

Native Android app using the Apple/Google exposure notification API. The CWA development ends on May 31, 2023. You still can warn other users until April 30, 2023. More information:
https://coronawarn.app/en/faq/#ramp_down
Apache License 2.0
2.44k stars 495 forks source link

Result "status" disappeared in the app after restarting it #1421

Closed mmsh85 closed 3 years ago

mmsh85 commented 4 years ago

Good Morning,

I first liked the issue to the QR code scanning and response problem; but I consider this issue is new and not the same.

Describe the bug

Result status disappeared from the app after restarting it. My son got tested on Thursday 15.10 We installed the app on his phone and scanned the QR code as soon as we returned.

OK -> After the scan; the first time the app showed that the "result pending" NOK ->After restarting; the scanned QR disappeared (app shows "Have you been tested?" screen)

Expected behaviour

After restarting; the scanned tan still should be appearing in the app showing the test status

Steps to reproduce the issue

Cannot reproduce as scanning only available once, but the procedure was:

Technical details

Additional context


Internal Tracking ID: EXPOSUREAPP-2126

thomasaugsten commented 4 years ago

Can you give some more background information in German would be also ok. Because we cannot reproduce this. Was the Risk Status Green or grey after the restart. What kind of restart you did. Can you rescan the QR code or is the code now invalid. Did you scan multiple QR code or remove tests?

mmsh85 commented 4 years ago

Was the Risk Status Green or grey after the restart. After restart the default screen was back, which is appearing now as well. See attachment. IMG_20201017_094705 When the scan was done though was just after downloading and before activating BT so I assume it was still grey as I activated BT afterwards.

What kind of restart you did. Via showing the running apps then swipe up CWA.

Can you rescan the QR code or is the code now invalid. Code is now invalid.

Did you scan multiple QR code or remove tests? No, only this one and cannot rescan it as stated above. Issue is that neither a test result is available nor rescanning is available as the QR code is used. It should be in the phone somehow, but not sure why it got disappeared

mmsh85 commented 4 years ago

One other note: Phone had no sim card as clear in the screenshot

thomasaugsten commented 4 years ago

An you please explain in detail the Flow of Scanning and activating BT and what Color the Risk Score Card has

mmsh85 commented 4 years ago

Hi Thomas,

I tried to replicate the process also on a second phone (surely the QR code scan part will not be possible):

Pre-Conditions:

Procedure in Details (as much as possible):

Hope this explains it all.

thomasaugsten commented 4 years ago

You can also try to reproduce this with a test qr working

frame

Test2: frame (1)

mmsh85 commented 4 years ago

Wasn't able to reproduce unfortunately, but at least I can recall the next steps now.

Note: Each QR code was scanned on a different phone, the second was scanned on the phone that reported the issue.

I can understand if not reproduced the case would be closed accordingly though.

vaubaehn commented 4 years ago

Hi @mmsh85 ,

I assume that the swipe up to close the app could be the critical point, and that it might not be reproducable always. Could you please re-open the issue that we can track it further from here, please? Thanks for your great contribution!

@d4rken FYI: editor.apply vs. editor.commit?

mmsh85 commented 4 years ago

@vaubaehn, @d4rken Ropened upon request.

If needed to support differently let me know too. I assume that unfortunately the first "correct" TAN has been sent, but now lost, if there is a way to confirm this at least then this would help, I assume :)

vaubaehn commented 4 years ago

@mmsh85 Thanks for re-opening. To give a more detailed insight to what I assume what could have happened:

Let's see, what the devs will find out :)

mmsh85 commented 4 years ago

.. and with a lost token; when the app restarts it detects no result. I would second that logical sequence, but why did it work with the test QR codes - timings then?

Issue is that irrespective to this, now I will wait till Monday to see the result status of my son :( Will be willing to support anyhow and thanks for the explanation!

vaubaehn commented 4 years ago

My assumption is, that the swipe up won't always kill the app before data are written. It would depend on how fast you are (cowboy 😉), and how fast is Android (kind of a shoot out...) If the device is slow, or Android was particularly busy with other tasks at that moment, chances are better, that task is killed before data could be written. This will make reproducing/debugging quite hard. There were 2 other users (https://github.com/corona-warn-app/cwa-app-android/issues/642#issuecomment-691938723 https://github.com/corona-warn-app/cwa-app-android/issues/579#issuecomment-674405478) who were in a similar logic, but different from you completely crashing the encrypted storage.

I hope everything will turn out fine with your son! Stay safe and healthy, all the best!

mmsh85 commented 4 years ago

@vaubaehn; thanks a-lot 😊

kolyaopahle commented 4 years ago

Hello @vaubaehn, thanks a ton for all the work into finding the root cause for this bug!! I personally have not heard of these kind of race conditions happening when calling commit() on shared preferences which is what we do on all of our interactions with LocalData. Can you confirm that you have seen this behaviour in those cases as well? We are currently working on refactoring big parts of the submission process and hope that if the issue is caused by code on our side these refactorings will fix the underlying issue. But if not, or if you can confirm this also happens with commit() calls we can investigate the shared prefs again.

vaubaehn commented 4 years ago

Hi @kolyaopahle ,

thanks a ton for all the work into finding the root cause for this bug!!

Sure, happy if I can help :heart:

I personally have not heard of these kind of race conditions happening when calling commit() on shared preferences

I would agree, that a race condition is quite unlikely when editing (Encrypted)SharedPreferences with edit.commit(). However, edit.apply() is recommended as best practice, leaving it to Android to store the data when it thinks it's a good time to do. I assume edit.apply() to be prone to a race condition, if the child task is killed before the data could have been written.

which is what we do on all of our interactions with LocalData.

I'm not sure if this is quite correct, that you use edit.commit() on all interactions with EncryptedSharedPrefs. One (time) critical example is how you store the sqlite-db-password: https://github.com/corona-warn-app/cwa-app-android/blob/0f12f4ddbd814ac368901858363f7835bb30b9a8/Corona-Warn-App/src/main/java/de/rki/coronawarnapp/util/security/SecurityHelper.kt#L90-L96 I must admit that I don't know how data for all other interactions are written. For me it looks like, with the declaration of https://github.com/corona-warn-app/cwa-app-android/blob/0f12f4ddbd814ac368901858363f7835bb30b9a8/Corona-Warn-App/src/main/java/de/rki/coronawarnapp/util/security/EncryptedPreferencesFactory.kt#L22-L28 it's up to tink on how to store data at any time? I couldn't find any hint, that write interactions with EncryptedSharedPrefs are using edit.commit(). If it's actually left to tink which method to use for storing, I assume it might be edit.apply(), as this is generally recommended as best practice (but might be too slow in our case). The only code fragment I found, that explicitly uses edit.commit(), is clearing the SharedPrefs: https://github.com/corona-warn-app/cwa-app-android/blob/0f12f4ddbd814ac368901858363f7835bb30b9a8/Corona-Warn-App/src/main/java/de/rki/coronawarnapp/util/security/SecurityHelper.kt#L82-L84 If there is any declaration in the code to use edit.commit() in general for interactions with EncryptedSharedPreferences, I'd be happy, if you could point me to there.

Any supporting information from you on this subject would be highly appreciated for my own understanding and learning.

Can you confirm that you have seen this behaviour in those cases as well?

Unfortunately I don't have any development environment or any other testing setup, I can just read user reports, investigate in the code, and try to connect them logically.

We are currently working on refactoring big parts of the submission process and hope that if the issue is caused by code on our side these refactorings will fix the underlying issue.

This sounds great :)

But if not, or if you can confirm this also happens with commit() calls we can investigate the shared prefs again.

Let's see, just be sure that the usage of the globalEncryptedSharedPrefs-instance allows for submitting commits() by default. Fingers crossed for the refactoring :crossed_fingers: , and highest respect to the work of you all :heart:

Slightly different subject: In a similar way I am wondering, how this works with the sqlite-db: Usually changes to the db (like updates to stored values) are "cached" or journalled, until the sqlite transaction receives a commit. Just for my own understanding, is a commit being sent, when changes/updates of CWA's db are 'applied', e. g. here: https://github.com/corona-warn-app/cwa-app-android/blob/0f12f4ddbd814ac368901858363f7835bb30b9a8/Corona-Warn-App/src/main/java/de/rki/coronawarnapp/storage/tracing/TracingIntervalRepository.kt#L54-L61 Or how is generally ensured for CWA that any sqlite transaction completes fully (updates are comitted) to avoid any race condition?

Thank you, and cheers, V.

kolyaopahle commented 4 years ago

Hi @vaubaehn

If there is any declaration in the code to use edit.commit() in general for interactions with EncryptedSharedPreferences, I'd be happy, if you could point me to there.

we use the LocalData Object for most of our in-app persistence, https://github.com/corona-warn-app/cwa-app-android/blob/0f12f4ddbd814ac368901858363f7835bb30b9a8/Corona-Warn-App/src/main/java/de/rki/coronawarnapp/storage/LocalData.kt#L1-L733

in it we use the androidx.core.content.edit extension function to SharedPreferences and pass true to enable commit edit mode.

I'm not sure if this is quite correct, that you use edit.commit() on all interactions with EncryptedSharedPrefs. One (time) critical example is how you store the sqlite-db-password:

This actually does look quite interesting in relation to the known bug with sql database corruption, ill bring this up in our team meeting tomorrow and will update here if anyone has checked this already.

In a similar way I am wondering, how this works with the sqlite-db: Usually changes to the db (like updates to stored values) are "cached" or journalled, until the sqlite transaction receives a commit. Just for my own understanding, is a commit being sent, when changes/updates of CWA's db are 'applied', e. g. here:

I'm actually not quite sure about this one as i have not touched the sql part of the data persistency layer, from a quick glance on my part id assume that this is handled by the androidx room library. Running a quick google search i could not actually find any information on the persistence behaviour of room but i will also update after asking around in the team tomorrow.

vaubaehn commented 4 years ago

Hi @kolyaopahle ,

in it we use the androidx.core.content.edit extension function to SharedPreferences and pass true to enable commit edit mode.

thanks a lot, I got it now! In this case, should https://github.com/corona-warn-app/cwa-app-android/blob/0f12f4ddbd814ac368901858363f7835bb30b9a8/Corona-Warn-App/src/main/java/de/rki/coronawarnapp/storage/LocalData.kt#L717-L733 line 729 be getSharedPreferenceInstance().edit(true) {...} for consistency?

Anyway, if everything is persisted by commit(), then the likelihood for a race condition should be minimized here. But still it's not impossible and maybe most plausible, if the device was slow due to background work (cleaning up and releasing ressources after scanning QR?) and/or because of a slow androidx.security.crypto/tink?

This actually does look quite interesting in relation to the known bug with sql database corruption, ill bring this up in our team meeting tomorrow and will update here if anyone has checked this already.

Looking forward for your update here.

I'm actually not quite sure about this one as i have not touched the sql part of the data persistency layer, from a quick glance on my part id assume that this is handled by the androidx room library. Running a quick google search i could not actually find any information on the persistence behaviour of room but i will also update after asking around in the team tomorrow.

Also looking forward for the update here. I wonder, if the sqlite-db is more at risk to be corrupted in general, for me it looks like #642 (comment) and #579 (comment) point into that direction.

kolyaopahle commented 4 years ago

Hey @vaubaehn,

line 729 be getSharedPreferenceInstance().edit(true) {...} for consistency?

absolutely, will change this, or if you want you can open a pr for it 👍

Anyway, if everything is persisted by commit(), then the likelihood for a race condition should be minimized here. But still it's not impossible and maybe most plausible, if the device was slow due to background work (cleaning up and releasing ressources after scanning QR?) and/or because of a slow androidx.security.crypto/tink?

From my investigation into the code there is no way to have seen the test result screen without https://github.com/corona-warn-app/cwa-app-android/blob/0f12f4ddbd814ac368901858363f7835bb30b9a8/Corona-Warn-App/src/main/java/de/rki/coronawarnapp/storage/LocalData.kt#L559-L569 having been called. But as the code responsible for this logic is also targeted in the refactoring we can hopefully fix any race conditions appearing there.

Looking forward for your update here. Also looking forward for the update here. I wonder, if the sqlite-db is more at risk to be corrupted in general, for me it looks like #642 (comment) and #579 (comment) point into that direction.

we did some talking and identified a few more spots that might cause a race condition on App initialisation that could harm db password generation and persistence. We will continue to investigate those and i will update here and in the related tickets when anything of note comes up.

vaubaehn commented 4 years ago

Hi @kolyaopahle ,

line 729 be getSharedPreferenceInstance().edit(true) {...} for consistency?

absolutely, will change this, or if you want you can open a pr for it +1

Sure I want to open a PR for it :) Can I base on release/1.5.x?

Anyway, if everything is persisted by commit(), then the likelihood for a race condition should be minimized here. [...]

From my investigation into the code there is no way to have seen the test result screen without [fun registrationToken()] having been called. But as the code responsible for this logic is also targeted in the refactoring we can hopefully fix any race conditions appearing there.

Then it's quite mysterious what happened here... :ghost: Let's see then.

Looking forward for your update here.[...] we did some talking and identified a few more spots that might cause a race condition on App initialisation that could harm db password generation and persistence. We will continue to investigate those and i will update here and in the related tickets when anything of note comes up.

Sounds good. We'll all be happy to see increased stability in the future :) And getting us updated would be perfect! Thanks a lot!

svengabr commented 4 years ago

Thank you, everyone, for your contribution here.

Looking at the comments, I would assume that the issue is resolved by the pull requests targetted to 1.6. However, I am not sure if we need a separate Jira Issue or if there are still open points in this Issue.

@d4rken what do you suggest?

d4rken commented 4 years ago

@svengabr I don't think #1452 fixes this issue, it was only a tangentially related bug that was discovered while researching this.

Based on the discussion between @kolyaopahle and @vaubaehn we also spawned EXPOSUREAPP-3385 to investigate a possible racecondition on SharedPreferences and Database initialization, but this also does not scream "hey i'm the bug cause" to me, so I would keep this ticket open until we have a plausible explanation for what's happening here.

mmsh85 commented 4 years ago

If it would help to test again as a test user with a couple of trials, I am willing to help as well. I would need a way to generate (or be provided) test QR codes though

daimpi commented 4 years ago

@mmsh85

I would need a way to generate (or be provided) test QR codes though

Here you go:

Test 1 frame (2)

Test 2 frame (3)

If you need more, just generate QR codes with the following content: https://localhost/?133785-12245668-1234-4DA7-B166-B86D8737xxxx where you can replace the xxxx by any 4 digit number you like… (you can actually change more than that but I haven't really tested everything and this should anyway give you enough codes you can use for testing).

dsarkar commented 3 years ago

Internal Tracking ID: EXPOSUREAPP-2126

MikeMcC399 commented 3 years ago

Version 1.12.0, which has now been released, includes the fix "Small LocalData cleanup (DEV, EXPOSUREAPP-2126)" #2183 which has been linked to this issue.

It seems it was not easy to reproduce the issue, so it is probably also not easy to test if the fix has resolved the original issue.

d4rken commented 3 years ago

Version 1.12.0, which has now been released, includes the fix "Small LocalData cleanup (DEV, EXPOSUREAPP-2126)" #2183 which has been linked to this issue.

Unfortunately, I don't expect the change to have fixed this. It was just something I noticed while searching for a cause.

MikeMcC399 commented 3 years ago

@d4rken

Unfortunately, I don't expect the change to have fixed this. It was just something I noticed while searching for a cause.

Thanks for clarifying! I was not assuming that the root cause had been fixed.

It seems only @mmsh85 has reported this problem here and that was four months ago. Have there been any other reports since then via other channels?

mmsh85 commented 3 years ago

It seems only @mmsh85 has reported this problem here and that was four months ago. Have there been any other reports since then via other channels?

Issue was reported in version 1.3.1 actually. I tried to reproduce the issue starting 1.5 with the dummy QR codes provided, but wasn't successful since then. Hope this helps!

mmsh85 commented 3 years ago

I will close the topic unless commented that it should still be kept (for other tracking reasons).

dsarkar commented 3 years ago

@mmsh85 Many thanks for contributing here. Should any further problem occur please re-open or create a new issue, as appropriate. Thanks, best wishes, DS


Corona-Warn-App Open Source Team