Crashes & out of memory

crspybits commented 3 years ago

I'm trying to debug crashes and out memory indications.

1) The crash reports from Apple are around the time of database access. E.g., see https://github.com/stephencelis/SQLite.swift/issues/1042

2) Crash reports from BugSnag are indicating out of memory. (WIth no traces).

crspybits commented 3 years ago

Dany gave me logs. "No errors or pop ups,but after returning from a chrome rendered link, Neebla looked like it restarted,landed me back to album list."

He was wondering about memory usage in the app. I'm wondering if this is when BugSnag is reporting out of memory crashes.

crspybits commented 3 years ago

Running connected to the Xcode debugger, I tapped on a URL. Coming back, the app restarted. The debugger said "Message from debugger: Terminated due to signal 9".

https://developer.apple.com/forums/thread/124389

https://developer.apple.com/forums/thread/655225 This reference specifically talks about holding file locks when the app goes into the background and has some code to try to help debug this.

https://stackoverflow.com/questions/42331852/quitting-app-causes-error-message-from-debugger-terminated-due-to-signal-9

crspybits commented 3 years ago

Results of showing open files when going into the background-- and get signal 9 at times in this case too:

openFilePaths: ["/dev/null", "/dev/ttys000", "/dev/ttys000", "/private/var/mobile/Containers/Data/Application/D3778A59-9DAB-4B53-9D3E-8100F40E7DAC/Library/Application Support/com.bugsnag.Bugsnag/biz.SpasticMuffin.SharedImages/v1/kvstore", "?", "?", "/private/var/mobile/Containers/Shared/AppGroup/2A899802-F51E-4921-BD14-71E7F3CE6582/Documents/Neebla.SQLite.db", "/private/var/mobile/Containers/Shared/AppGroup/2A899802-F51E-4921-BD14-71E7F3CE6582/Documents/SyncServer.SQLite.db", "/private/var/mobile/Containers/Shared/AppGroup/2A899802-F51E-4921-BD14-71E7F3CE6582/Documents/logging/LogFile.txt", "/private/var/mobile/Containers/Data/Application/D3778A59-9DAB-4B53-9D3E-8100F40E7DAC/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db", "/private/var/mobile/Containers/Data/Application/D3778A59-9DAB-4B53-9D3E-8100F40E7DAC/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db", "/private/var/mobile/Containers/Data/Application/D3778A59-9DAB-4B53-9D3E-8100F40E7DAC/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db-wal", "/private/var/mobile/Containers/Data/Application/D3778A59-9DAB-4B53-9D3E-8100F40E7DAC/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db-shm", "/private/var/containers/Shared/SystemGroup/systemgroup.com.apple.nsurlstoragedresources/Library/dafsaData.bin", "/private/var/mobile/Containers/Data/Application/D3778A59-9DAB-4B53-9D3E-8100F40E7DAC/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db-wal"]

crspybits commented 3 years ago

This was one just before going into the background and getting the signal 9:

openFilePaths: ["/dev/null", "/dev/ttys000", "/dev/ttys000", "/private/var/mobile/Containers/Data/Application/F4FA2CCC-447D-41B7-866A-4BB047D3FDBF/Library/Application Support/com.bugsnag.Bugsnag/biz.SpasticMuffin.SharedImages/v1/kvstore", "?", "?", "/private/var/mobile/Containers/Shared/AppGroup/2A899802-F51E-4921-BD14-71E7F3CE6582/Documents/Neebla.SQLite.db", "/private/var/mobile/Containers/Shared/AppGroup/2A899802-F51E-4921-BD14-71E7F3CE6582/Documents/SyncServer.SQLite.db", "/private/var/mobile/Containers/Shared/AppGroup/2A899802-F51E-4921-BD14-71E7F3CE6582/Documents/logging/LogFile.txt", "/private/var/mobile/Containers/Data/Application/F4FA2CCC-447D-41B7-866A-4BB047D3FDBF/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db", "/private/var/mobile/Containers/Data/Application/F4FA2CCC-447D-41B7-866A-4BB047D3FDBF/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db-wal", "/private/var/mobile/Containers/Data/Application/F4FA2CCC-447D-41B7-866A-4BB047D3FDBF/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db-shm", "/private/var/mobile/Containers/Data/Application/F4FA2CCC-447D-41B7-866A-4BB047D3FDBF/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db", "/private/var/mobile/Containers/Data/Application/F4FA2CCC-447D-41B7-866A-4BB047D3FDBF/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db-wal", "/private/var/containers/Shared/SystemGroup/systemgroup.com.apple.nsurlstoragedresources/Library/dafsaData.bin"]

crspybits commented 3 years ago

Here's what I'm using to get this:

    // Modified from https://developer.apple.com/forums/thread/655225
    func openFilePaths() -> [String] {
        let result = (0..<getdtablesize()).map { fd -> (String?) in
            var flags: CInt = 0
            guard fcntl(fd, F_GETFL, &flags) >= 0 else {
                return nil
            }
            // Return "?" for file descriptors not associated with a path, for
            // example, a socket.
            var path = [CChar](repeating: 0, count: Int(MAXPATHLEN))
            guard fcntl(fd, F_GETPATH, &path) >= 0 else {
                return "?"
            }

            // https://stackoverflow.com/questions/11737819
            var flockValue:flock = flock()
            if fcntl(fd, F_GETLK, &flockValue) >= 0 &&
                flockValue.l_type == F_WRLCK {
                return String(cString: path) + "; flock.l_type: F_WRLCK (exclusive or write lock)"
            }

            return String(cString: path)
        }

        return result.compactMap {$0}
    }

So it seems this specific issue-- of tapping on a link and coming back to Neebla is not a file lock issue.

This is also evidence that SQLite is not holding a file lock when the app goes into the background.

crspybits commented 3 years ago

A possibility for the overt crashes coming from Apple is that the SQLite db is in the shared container.

https://inessential.com/2020/02/13/how_we_fixed_the_dreaded_0xdead10cc_cras “The moral of the story ended up being: just don’t keep your SQLite database in the shared group container.”

These references don't raise any issue for using SQLIte in a shared container: https://stackoverflow.com/questions/30156107/share-sqlite-database-between-app-and-extension https://rbranson.medium.com/sharing-sqlite-databases-across-containers-is-surprisingly-brilliant-bacb8d753054 https://www.avanderlee.com/swift/core-data-app-extension-data-sharing/ (focuses on Core Data)

https://github.com/andpor/react-native-sqlite-storage/issues/308 "This is necessary due to an obscure iOS behavior where processes will be terminated if they are backgrounded while holding a lock on a file in a shared container, unless the file is a SQLite database in WAL mode (presumably Apple is using such databases themselves, hence the exception, which appears to be tied to reading the header of the SQLite database to check for WAL mode). Without WAL mode, you'll see crashes in the wild if the app is backgrounded while the database is being modified ..."

File lock issue: https://developer.apple.com/forums/thread/655225 https://github.com/sqlcipher/sqlcipher/issues/255 https://developer.apple.com/forums/thread/126438

https://developer.apple.com/library/archive/technotes/tn2408/_index.html “When you create a shared container for use by an app extension and its containing app in iOS 8.0 or later, you are obliged to write to that container in a coordinated manner to avoid data corruption.”

"Solutions You can use CFPreferences, atomic safe save operations on flat files, or SQLite or Core Data to share data in a group container between multiple processes even if one is suspended mid transaction. For SQLite/Core Data, processes using databases in DELETE journal mode will be instantly killed instead of suspended. Processes using database in WAL journal mode will only be jetsam'd if they attempt to hold a write transaction open at the time of suspension. Posix file locks also work, and behave similarly to SQLite database in DELETE journal mode, so open(O_EXLOCK) or flock() could also be used. Regardless of this issue, the containing app (and all applications) should properly use background task assertions around file operations they require completed in a shared container (with or without extensions). This includes all writes or deletions. Such a process might still be killed by jetsam but at a much lower frequency."

Issues with file protection flags https://github.com/sqlcipher/sqlcipher/issues/156 https://stackoverflow.com/questions/41599261/crash-when-opening-fmdb-in-background-mode https://github.com/ccgus/fmdb/issues/262

General docs https://sqlite.org/docs.html https://developer.apple.com/documentation/uikit/uiapplication/1623031-beginbackgroundtask https://sqlite.org/wal.html https://sqlite.org/tempfiles.html https://www.sqlite.org/walformat.html

Given that I'm seeing wal and shm files it looks like my SQLite db is open in WAL mode.

crspybits commented 3 years ago

https://developer.apple.com/documentation/uikit/uiapplication/1623031-beginbackgroundtask

You can call this method at any point in your app’s execution. You may also call this method multiple times to mark the beginning of several background tasks that run in parallel. However, each task must be ended separately. You identify a given task using the value returned by this method. To assist with debugging, this method generates a name for the task, based on the name of the calling method or function. If you want to specify a custom name, use the beginBackgroundTask(withName:expirationHandler:) method instead. This method can be safely called on a non-main thread. To extend the execution time of an app extension, use the performExpiringActivity(withReason:using:) method of ProcessInfo instead.

crspybits commented 3 years ago

My next planned action is to do this:

"the containing app (and all applications) should properly use background task assertions around file operations they require completed in a shared container (with or without extensions). This includes all writes or deletions. Such a process might still be killed by jetsam but at a much lower frequency." (from https://developer.apple.com/library/archive/technotes/tn2408/_index.html)

This should be relatively straight forward for iOSBasics as there are pretty well-defined entry points to this API.

It will be tricker in the Neebla app itself, which uses a separate SQLite db.

crspybits commented 3 years ago

We'll, I've wrapped the main entry points for iOSBasics in a new BackgroundAsssertable from iOSShared. And that doesn't appear to be changing the crash rate for Neebla. At least now I've got a "good" repro path: If you remove the app, get an album downloading, put app in the background and turn of the screen. That gets me lots 'o crashes.

crspybits commented 3 years ago

https://developer.apple.com/documentation/xcode/diagnosing_issues_using_crash_reports_and_device_logs/identifying_high-memory_use_with_jetsam_event_reports "Discover why the operating system terminated your app when available memory was low."

https://developer.apple.com/forums/thread/73192 https://stackoverflow.com/questions/15122986/what-does-jetsam-really-do https://programmer.ink/think/learn-more-about-oom-low-memory-crash-in-ios.html

crspybits commented 3 years ago

In bug reports from Apple, where do the crashes originate? Am I on the right track with the background assertions?

20120317.0:

1.1. Screen Shot 2021-03-19 at 3 19 57 PM

1.2. Screen Shot 2021-03-19 at 3 20 53 PM

1.3. Screen Shot 2021-03-19 at 3 21 23 PM

1.4. Screen Shot 2021-03-19 at 3 22 54 PM

1.5. Screen Shot 2021-03-19 at 3 22 07 PM

1.6. Screen Shot 2021-03-19 at 3 23 23 PM

1.7. Screen Shot 2021-03-19 at 3 24 00 PM

20120319.0:

2.1. Screen Shot 2021-03-19 at 3 26 25 PM

2.2. Screen Shot 2021-03-19 at 3 27 09 PM

2.3. Screen Shot 2021-03-19 at 3 27 59 PM

2.4. Screen Shot 2021-03-19 at 3 28 30 PM

crspybits commented 3 years ago

1) Replacing Bugsnag with Firebase Crashlytics: Bugsnag doesn't catch lots of crashes. And Apple crash reporting only seems to work for TestFlight builds/builds you upload to them. And has some big latencies. https://github.com/firebase/firebase-ios-sdk/blob/master/SwiftPackageManager.md

2) Wrap Timer usage in the new background assertions.

crspybits commented 3 years ago

In TestFlight build 20210319.1 I've added some more usage of backgroundAsssertable in iOSBasics-- this time in the network delegate calls. The crashes above (https://github.com/SyncServerII/Neebla/issues/7#issuecomment-802978539) I was getting stem from network delegate callbacks, but I'm not sure this change will quite enough. I'd like to proceed gradually, adding more code to deal with crashes only as needed. When I'm certain it's needed.

crspybits commented 3 years ago

Result: Still crashes. In same manner. Get an album started downloading, put into the background, and fairly reliably get a crash.

crspybits commented 3 years ago

May want to add more backgroundAssertable's in NotificationCenter.default.addObserver's.

Still getting: 2021-03-19T23:06:13-0600 debug : Image status: .loaded (from scaling) while the app in in the background-- so still updating UI.

crspybits commented 3 years ago

I have moved to using a single SQLite db (from one in the main app in one in iOSBasics). This is to:

a) simplify things

b) because it seemed like I wasn't seeing 2x WAL and SHM files open (which I would expect), and

c) Admittedly not very extensive looking on stackoverflow seemed to indicate that multiple SQLIte dbs was inadvisable.

The main possible problem with this is conflicting names of databases tables across Neebla and iOSBasics.

crspybits commented 3 years ago

That db change didn't help the crash issue.

My next change is to not send local notifications in Neebla when the app is in the background. These drive the code in https://github.com/SyncServerII/Neebla/issues/7#issuecomment-802978539. My hypothesis is that when the network delegate methods get called in the backgound, the amount of work that can be done by the app is limited. And if you go over that, the system kills the app.

crspybits commented 3 years ago

Testing criteria: Getting all albums downloaded without a single crash reported from Apple-- using a TestFlight build. (Yes, this is a moving criteria, as it depends on the number of albums and their specific size and data parameters). I'm doing this by starting the download of an album, putting the app into the background, and (usually) turning off the screen. Wash, rinse, repeat.

1) crash: 11:11am 2) crash: 11:13am 3) crash: 11:13am 4) crash: 11:14am

Not sure if it's notable or not but the bulk (or all?) of these are coming while I'm downloading the computer album.

Now starting to download the cats album.

5) Crash 11:17am. 6) Crash 11:18am 7) Crash 11:19am. 8) Crash 11:20am 9) Crash 11:21am

crspybits commented 3 years ago

All of these crashes I see so far in Apple crash reporting are still coming from unreadCountFor-- haven't dealt with all of these yet.

crspybits commented 3 years ago

TestFlight 20210320.1 testing: Got through a full round of downloads of all albums from scratch with no crashes!

SyncServerII / Neebla

Crashes & out of memory #7