e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
15 stars 34 forks source link

usercache local storage was deleted and does not stick #750

Open shankari opened 2 years ago

shankari commented 2 years ago

Reported by a user:

Use 1 Use 2 Use 3
image image image

Note that, unlike https://github.com/e-mission/e-mission-phone/commit/a09385b03cbcc994c0bd3c33d7317a242132dc65 (part of https://github.com/e-mission/e-mission-phone/pull/470), this time the local storage was not empty and the usercache was empty.

shankari commented 2 years ago

Looking at the user's logs, this has been going on since June 6th. It might have even occured earlier, but previous logs were truncated.

19,1654546646.61791,2022-06-06T13:17:26.617910-07:00,"uc_stored_val = null ls_stored_val = {""intro_done"":""2021-10-12T07:32:35-06:00""} copying local intro_done to native..."

Every single time, we say that we will copy the local to the native, but the copy never seems to "stick"

18,1654546646.61722,2022-06-06T13:17:26.617220-07:00,"uc_stored_val = null ls_stored_val = {""intro_done"":""2021-10-12T07:32:35-06:00""}"
19,1654546646.61791,2022-06-06T13:17:26.617910-07:00,"uc_stored_val = null ls_stored_val = {""intro_done"":""2021-10-12T07:32:35-06:00""} copying local intro_done to native..."

142,1654549916.2284,2022-06-06T14:11:56.228400-07:00,"uc_stored_val = null ls_stored_val = {""intro_done"":""2021-10-12T07:32:35-06:00""}"
143,1654549916.2290401,2022-06-06T14:11:56.229040-07:00,"uc_stored_val = null ls_stored_val = {""intro_done"":""2021-10-12T07:32:35-06:00""} copying local intro_done to native..."

270,1654556446.13521,2022-06-06T16:00:46.135210-07:00,"uc_stored_val = null ls_stored_val = {""intro_done"":""2021-10-12T07:32:35-06:00""}"
271,1654556446.13664,2022-06-06T16:00:46.136640-07:00,"uc_stored_val = null ls_stored_val = {""intro_done"":""2021-10-12T07:32:35-06:00""} copying local intro_done to native..."

1166,1654557878.03728,2022-06-06T16:24:38.037280-07:00,"uc_stored_val = null ls_stored_val = {""intro_done"":""2021-10-12T07:32:35-06:00""}"
1167,1654557878.03806,2022-06-06T16:24:38.038060-07:00,"uc_stored_val = null ls_stored_val = {""intro_done"":""2021-10-12T07:32:35-06:00""} copying local intro_done to native..."

1715,1654559319.7890599,2022-06-06T16:48:39.789060-07:00,"uc_stored_val = null ls_stored_val = {""intro_done"":""2021-10-12T07:32:35-06:00""}"
1716,1654559319.7897198,2022-06-06T16:48:39.789720-07:00,"uc_stored_val = null ls_stored_val = {""intro_done"":""2021-10-12T07:32:35-06:00""} copying local intro_done to native..."

...

287533,1657134282.70151,2022-07-06T12:04:42.701510-07:00,"uc_stored_val = null ls_stored_val = {""intro_done"":""2021-10-12T07:32:35-06:00""}"
287534,1657134282.70227,2022-07-06T12:04:42.702270-07:00,"uc_stored_val = null ls_stored_val = {""intro_done"":""2021-10-12T07:32:35-06:00""} copying local intro_done to native..."
shankari commented 2 years ago

My first thought was that this may be an issue with accessing the database from javascript. But no, the native code in [ConfigManager isConsented]

    ConsentConfig* currConfig = (ConsentConfig*)[[BuiltinUserCache database] getDocument:CONSENT_CONFIG_KEY wrapperClass:[ConsentConfig class]];
        [LocalNotificationManager addNotification:[NSString stringWithFormat:@"reqConsent = %@, currConfig.approval_date = %@",
            reqConsent, currConfig.approval_date]];

    if ([reqConsent isEqualToString:currConfig.approval_date]) {
            [LocalNotificationManager addNotification:@"isConsented = YES"];
        return YES;
    } else {
            [LocalNotificationManager addNotification:@"isConsented = NO"];
        return NO;
    }

also fails

4681,1654643846.38509,2022-06-07T16:17:26.385090-07:00,"reqConsent = 2016-07-14, currConfig.approval_date = (null)"
4682,1654643846.38593,2022-06-07T16:17:26.385930-07:00,isConsented = NO
shankari commented 2 years ago

Further, the last trip recorded by the user is from 2022-06-03. So it looks like this issue with the database started back then. Unfortunately, the logs start from June 6, so we don't have logs of the time that the database queries broke.

0,1654546645.9126,2022-06-06T13:17:25.912600-07:00,BEMJWTAuth:pluginInitialize singleton -> initialize completion handler
shankari commented 2 years ago

Have asked the user to uninstall + reinstall, but to send me the userCacheDB before doing so. Let's see if the DB is actually empty, or is merely corrupted.

shankari commented 2 years ago

Note also that the loggerDB was apparently not affected, so it is not true that iOS cleared out ALL the databases. It appears to ONLY be true for the usercacheDB. Not sure what (if anything) is different between the two databases. Would be great to see if the usercacheDB is corrupted, and if that is the main reason.

shankari commented 2 years ago

the user sent the userCacheDB, and it is 258M so definitely not empty.

$ sqlite3 -csv /tmp/userCacheDB
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
sqlite> .tables
userCache       userCacheError
sqlite> select count(*) from userCache;
Error: database disk image is malformed
sqlite> select * from userCache;
w1,,America/Denver,local-storage,CURR_GEOFENCE_LOCATION,,"{""type"":""Point"",""coordinates"":[...]}"
w2,,America/Denver,local-storage,CURR_GEOFENCE_LOCATION,,"{""type"":""Point"",""coordinates"":[...]}"
w3,,America/Denver,local-storage,CURR_GEOFENCE_LOCATION,,"{""type"":""Point"",""coordinates"":[...]}"
w4,,America/Denver,local-storage,CURR_GEOFENCE_LOCATION,,"{""type"":""Point"",""coordinates"":[...]}"
w5,,America/Denver,local-storage,CURR_GEOFENCE_LOCATION,,"{""type"":""Point"",""coordinates"":[...]}"
w6,,document,diary/trips-2022-05-27,,"[{""properties"":{""end_loc"":{""type"":""Point"",...
Error: database disk image is malformed

So the database was in fact corrupted.

shankari commented 2 years ago

Reasons why sqlite could be corrupted: https://www.sqlite.org/howtocorrupt.html

I think we can rule out 4. Disk Drive and Flash Memory Failures since the related logger was not affected. We can also rule 1.4. Mispairing database files and hot journals We can also probably rule out 2. File locking problems, except 2.1

But a lot of the others relate to various file system operations that we have no control over and are opaque on iOS For example, 1.2. Backup or restore while a transaction is active could interact with iOS backup/restore processes. We do make sure that the DB file is marked as no backup

            NSURL *nosyncURL = [ NSURL fileURLWithPath: nosync];
            if (![nosyncURL setResourceValue: [NSNumber numberWithBool: YES] 
                                      forKey: NSURLIsExcludedFromBackupKey error: &err]) {  
                NSLog(@"IGNORED: error setting nobackup flag in LocalDatabase directory: %@", err);
            }

but not sure if the IGNORED error was triggered for this user, and whether iOS performs any local backups/file copies.

Similarly, 1.1. Continuing to use a file descriptor after it has been closed can interact with iOS background processes in unexpected ways.

the SQLite page also links to a similar error that facebook found and fixed https://engineering.fb.com/2014/08/12/ios/debugging-file-corruption-on-ios/

I hope it doesn't come to that level of debugging - we don't have a ton of people capable of adding hooks into native code libraries, and we have a lot of priorities to handle - but this is not going to be an easy fix for sure 😦

shankari commented 2 years ago

It also looks like this is not unheard-of https://stackoverflow.com/questions/9411689/offline-cache-on-ios-application-sqlite3-becomes-corrupted and the way to check for it is to run

sqlite> PRAGMA integrity_check
   ...> ;
"*** in database main ***
Main freelist: size is 1 but should be 0
On tree page 15419 cell 1: 2nd reference to page 19657
On tree page 887 cell 53: 2nd reference to page 4355
On tree page 16948 cell 2: 2nd reference to page 17066
On tree page 9741 cell 0: 2nd reference to page 15805
Page 15700 is never used
...
Page 17338 is never used"
Error: database disk image is malformed

Some concrete fixes (https://sqlite.org/forum/forumpost/5a46109077):

shankari commented 2 years ago

Found a similar issue in the more standard SQLite plugin, fixed using the EXTRA DURABLE build setting. I used to use that plugin but ran into multiple issues with incorrect reads, so ended up rolling my own. Maybe it is time to go back to it?

shankari commented 2 years ago

Worried about making a major change to storage at this point; tempted to just port over all their changes to my plugin instead. These appear to be:

Note also that we include the sqlite3 library directly

    <framework src="libsqlite3.0.tbd"/>

while they include the sqlite source directly, which allows them to control how it is compiled

        <header-file src="node_modules/cordova-sqlite-storage-dependencies/sqlite3.h" />
        <source-file src="node_modules/cordova-sqlite-storage-dependencies/sqlite3.c"
                     compiler-flags="-w -DSQLITE_THREADSAFE=1 -DSQLITE_DEFAULT_SYNCHRONOUS=3 -DSQLITE_DEFAULT_MEMSTATUS=0 -DSQLITE_OMIT_DECLTYPE -DSQLITE_OMIT_DEPRECATED -DSQLITE_OMIT_PROGRESS_CALLBACK -DSQLITE_OMIT_SHARED_CACHE -DSQLITE_TEMP_STORE=2 -DSQLITE_OMIT_LOAD_EXTENSION -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_RTREE -DSQLITE_DEFAULT_PAGE_SIZE=4096" />

which seems like a fairly major change to make outside of the migration cycle.

shankari commented 2 years ago

wrt

run PRAGMA integrity_check and prompt the user to reset the database if it is corrupted (we will copy over from local storage, so this can be transparent).

None of the standard deletion commands work. We would have to copy over the blank database.

sqlite> DELETE FROM userCache;
Error: database disk image is malformed
sqlite> DROP TABLE userCache;
Error: database disk image is malformed

The more I think about this, the more I think it is likely that the error is due to a power loss before the data was saved, which can be fixed by PRAGMA schema.synchronous = EXTRA

That is also the final conclusion in the SQLite store issue (number 34)

At this point I hope the "EXTRA DURABLE" build setting I made as discussed in https://github.com/storesafe/cordova-sqlite-storage/issues/736 proves to be sufficiently robust against the common "consumer-grade" hardware issues discussed on the mailing list. I guess only time will tell here.

which also indicates that the multithreading fix is not a real fix.

INCORRECTLY resolved by compile-time DSQLITE_THREADSAFE=2 option (see https://github.com/storesafe/cordova-sqlite-storage/issues/754)