Open crspybits opened 3 years ago
Dany gave me logs. "No errors or pop ups,but after returning from a chrome rendered link, Neebla looked like it restarted,landed me back to album list."
He was wondering about memory usage in the app. I'm wondering if this is when BugSnag is reporting out of memory crashes.
Running connected to the Xcode debugger, I tapped on a URL. Coming back, the app restarted. The debugger said "Message from debugger: Terminated due to signal 9".
https://developer.apple.com/forums/thread/124389
https://developer.apple.com/forums/thread/655225 This reference specifically talks about holding file locks when the app goes into the background and has some code to try to help debug this.
Results of showing open files when going into the background-- and get signal 9 at times in this case too:
openFilePaths: ["/dev/null", "/dev/ttys000", "/dev/ttys000", "/private/var/mobile/Containers/Data/Application/D3778A59-9DAB-4B53-9D3E-8100F40E7DAC/Library/Application Support/com.bugsnag.Bugsnag/biz.SpasticMuffin.SharedImages/v1/kvstore", "?", "?", "/private/var/mobile/Containers/Shared/AppGroup/2A899802-F51E-4921-BD14-71E7F3CE6582/Documents/Neebla.SQLite.db", "/private/var/mobile/Containers/Shared/AppGroup/2A899802-F51E-4921-BD14-71E7F3CE6582/Documents/SyncServer.SQLite.db", "/private/var/mobile/Containers/Shared/AppGroup/2A899802-F51E-4921-BD14-71E7F3CE6582/Documents/logging/LogFile.txt", "/private/var/mobile/Containers/Data/Application/D3778A59-9DAB-4B53-9D3E-8100F40E7DAC/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db", "/private/var/mobile/Containers/Data/Application/D3778A59-9DAB-4B53-9D3E-8100F40E7DAC/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db", "/private/var/mobile/Containers/Data/Application/D3778A59-9DAB-4B53-9D3E-8100F40E7DAC/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db-wal", "/private/var/mobile/Containers/Data/Application/D3778A59-9DAB-4B53-9D3E-8100F40E7DAC/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db-shm", "/private/var/containers/Shared/SystemGroup/systemgroup.com.apple.nsurlstoragedresources/Library/dafsaData.bin", "/private/var/mobile/Containers/Data/Application/D3778A59-9DAB-4B53-9D3E-8100F40E7DAC/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db-wal"]
This was one just before going into the background and getting the signal 9:
openFilePaths: ["/dev/null", "/dev/ttys000", "/dev/ttys000", "/private/var/mobile/Containers/Data/Application/F4FA2CCC-447D-41B7-866A-4BB047D3FDBF/Library/Application Support/com.bugsnag.Bugsnag/biz.SpasticMuffin.SharedImages/v1/kvstore", "?", "?", "/private/var/mobile/Containers/Shared/AppGroup/2A899802-F51E-4921-BD14-71E7F3CE6582/Documents/Neebla.SQLite.db", "/private/var/mobile/Containers/Shared/AppGroup/2A899802-F51E-4921-BD14-71E7F3CE6582/Documents/SyncServer.SQLite.db", "/private/var/mobile/Containers/Shared/AppGroup/2A899802-F51E-4921-BD14-71E7F3CE6582/Documents/logging/LogFile.txt", "/private/var/mobile/Containers/Data/Application/F4FA2CCC-447D-41B7-866A-4BB047D3FDBF/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db", "/private/var/mobile/Containers/Data/Application/F4FA2CCC-447D-41B7-866A-4BB047D3FDBF/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db-wal", "/private/var/mobile/Containers/Data/Application/F4FA2CCC-447D-41B7-866A-4BB047D3FDBF/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db-shm", "/private/var/mobile/Containers/Data/Application/F4FA2CCC-447D-41B7-866A-4BB047D3FDBF/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db", "/private/var/mobile/Containers/Data/Application/F4FA2CCC-447D-41B7-866A-4BB047D3FDBF/Library/Caches/biz.SpasticMuffin.SharedImages/Cache.db-wal", "/private/var/containers/Shared/SystemGroup/systemgroup.com.apple.nsurlstoragedresources/Library/dafsaData.bin"]
Here's what I'm using to get this:
// Modified from https://developer.apple.com/forums/thread/655225
func openFilePaths() -> [String] {
let result = (0..<getdtablesize()).map { fd -> (String?) in
var flags: CInt = 0
guard fcntl(fd, F_GETFL, &flags) >= 0 else {
return nil
}
// Return "?" for file descriptors not associated with a path, for
// example, a socket.
var path = [CChar](repeating: 0, count: Int(MAXPATHLEN))
guard fcntl(fd, F_GETPATH, &path) >= 0 else {
return "?"
}
// https://stackoverflow.com/questions/11737819
var flockValue:flock = flock()
if fcntl(fd, F_GETLK, &flockValue) >= 0 &&
flockValue.l_type == F_WRLCK {
return String(cString: path) + "; flock.l_type: F_WRLCK (exclusive or write lock)"
}
return String(cString: path)
}
return result.compactMap {$0}
}
So it seems this specific issue-- of tapping on a link and coming back to Neebla is not a file lock issue.
This is also evidence that SQLite is not holding a file lock when the app goes into the background.
A possibility for the overt crashes coming from Apple is that the SQLite db is in the shared container.
https://inessential.com/2020/02/13/how_we_fixed_the_dreaded_0xdead10cc_cras “The moral of the story ended up being: just don’t keep your SQLite database in the shared group container.”
These references don't raise any issue for using SQLIte in a shared container: https://stackoverflow.com/questions/30156107/share-sqlite-database-between-app-and-extension https://rbranson.medium.com/sharing-sqlite-databases-across-containers-is-surprisingly-brilliant-bacb8d753054 https://www.avanderlee.com/swift/core-data-app-extension-data-sharing/ (focuses on Core Data)
https://github.com/andpor/react-native-sqlite-storage/issues/308 "This is necessary due to an obscure iOS behavior where processes will be terminated if they are backgrounded while holding a lock on a file in a shared container, unless the file is a SQLite database in WAL mode (presumably Apple is using such databases themselves, hence the exception, which appears to be tied to reading the header of the SQLite database to check for WAL mode). Without WAL mode, you'll see crashes in the wild if the app is backgrounded while the database is being modified ..."
File lock issue: https://developer.apple.com/forums/thread/655225 https://github.com/sqlcipher/sqlcipher/issues/255 https://developer.apple.com/forums/thread/126438
https://developer.apple.com/library/archive/technotes/tn2408/_index.html “When you create a shared container for use by an app extension and its containing app in iOS 8.0 or later, you are obliged to write to that container in a coordinated manner to avoid data corruption.”
"Solutions You can use CFPreferences, atomic safe save operations on flat files, or SQLite or Core Data to share data in a group container between multiple processes even if one is suspended mid transaction. For SQLite/Core Data, processes using databases in DELETE journal mode will be instantly killed instead of suspended. Processes using database in WAL journal mode will only be jetsam'd if they attempt to hold a write transaction open at the time of suspension. Posix file locks also work, and behave similarly to SQLite database in DELETE journal mode, so open(O_EXLOCK) or flock() could also be used. Regardless of this issue, the containing app (and all applications) should properly use background task assertions around file operations they require completed in a shared container (with or without extensions). This includes all writes or deletions. Such a process might still be killed by jetsam but at a much lower frequency."
Issues with file protection flags https://github.com/sqlcipher/sqlcipher/issues/156 https://stackoverflow.com/questions/41599261/crash-when-opening-fmdb-in-background-mode https://github.com/ccgus/fmdb/issues/262
General docs https://sqlite.org/docs.html https://developer.apple.com/documentation/uikit/uiapplication/1623031-beginbackgroundtask https://sqlite.org/wal.html https://sqlite.org/tempfiles.html https://www.sqlite.org/walformat.html
Given that I'm seeing wal and shm files it looks like my SQLite db is open in WAL mode.
https://developer.apple.com/documentation/uikit/uiapplication/1623031-beginbackgroundtask
You can call this method at any point in your app’s execution. You may also call this method multiple times to mark the beginning of several background tasks that run in parallel. However, each task must be ended separately. You identify a given task using the value returned by this method. To assist with debugging, this method generates a name for the task, based on the name of the calling method or function. If you want to specify a custom name, use the beginBackgroundTask(withName:expirationHandler:) method instead. This method can be safely called on a non-main thread. To extend the execution time of an app extension, use the performExpiringActivity(withReason:using:) method of ProcessInfo instead.
See also: https://www.andyibanez.com/posts/background-execution-in-ios/ https://stackoverflow.com/questions/10319643/proper-use-of-beginbackgroundtaskwithexpirationhandler https://medium.com/swlh/handling-background-tasks-in-ios-13-67f717d94b3d
My next planned action is to do this:
"the containing app (and all applications) should properly use background task assertions around file operations they require completed in a shared container (with or without extensions). This includes all writes or deletions. Such a process might still be killed by jetsam but at a much lower frequency." (from https://developer.apple.com/library/archive/technotes/tn2408/_index.html)
This should be relatively straight forward for iOSBasics as there are pretty well-defined entry points to this API.
It will be tricker in the Neebla app itself, which uses a separate SQLite db.
We'll, I've wrapped the main entry points for iOSBasics in a new BackgroundAsssertable
from iOSShared. And that doesn't appear to be changing the crash rate for Neebla. At least now I've got a "good" repro path: If you remove the app, get an album downloading, put app in the background and turn of the screen. That gets me lots 'o crashes.
https://developer.apple.com/documentation/xcode/diagnosing_issues_using_crash_reports_and_device_logs/identifying_high-memory_use_with_jetsam_event_reports "Discover why the operating system terminated your app when available memory was low."
https://developer.apple.com/forums/thread/73192 https://stackoverflow.com/questions/15122986/what-does-jetsam-really-do https://programmer.ink/think/learn-more-about-oom-low-memory-crash-in-ios.html
In bug reports from Apple, where do the crashes originate? Am I on the right track with the background assertions?
1.1.
1.2.
1.3.
1.4.
1.5.
1.6.
1.7.
2.1.
2.2.
2.3.
2.4.
Next:
1) Replacing Bugsnag with Firebase Crashlytics: Bugsnag doesn't catch lots of crashes. And Apple crash reporting only seems to work for TestFlight builds/builds you upload to them. And has some big latencies. https://github.com/firebase/firebase-ios-sdk/blob/master/SwiftPackageManager.md
2) Wrap Timer usage in the new background assertions.
In TestFlight build 20210319.1 I've added some more usage of backgroundAsssertable
in iOSBasics-- this time in the network delegate calls. The crashes above (https://github.com/SyncServerII/Neebla/issues/7#issuecomment-802978539) I was getting stem from network delegate callbacks, but I'm not sure this change will quite enough. I'd like to proceed gradually, adding more code to deal with crashes only as needed. When I'm certain it's needed.
Result: Still crashes. In same manner. Get an album started downloading, put into the background, and fairly reliably get a crash.
May want to add more backgroundAssertable's in NotificationCenter.default.addObserver's.
Still getting: 2021-03-19T23:06:13-0600 debug : Image status: .loaded (from scaling) while the app in in the background-- so still updating UI.
I have moved to using a single SQLite db (from one in the main app in one in iOSBasics). This is to:
a) simplify things
b) because it seemed like I wasn't seeing 2x WAL and SHM files open (which I would expect), and
c) Admittedly not very extensive looking on stackoverflow seemed to indicate that multiple SQLIte dbs was inadvisable.
The main possible problem with this is conflicting names of databases tables across Neebla and iOSBasics.
That db change didn't help the crash issue.
My next change is to not send local notifications in Neebla when the app is in the background. These drive the code in https://github.com/SyncServerII/Neebla/issues/7#issuecomment-802978539. My hypothesis is that when the network delegate methods get called in the backgound, the amount of work that can be done by the app is limited. And if you go over that, the system kills the app.
Testing criteria: Getting all albums downloaded without a single crash reported from Apple-- using a TestFlight build. (Yes, this is a moving criteria, as it depends on the number of albums and their specific size and data parameters). I'm doing this by starting the download of an album, putting the app into the background, and (usually) turning off the screen. Wash, rinse, repeat.
1) crash: 11:11am 2) crash: 11:13am 3) crash: 11:13am 4) crash: 11:14am
Not sure if it's notable or not but the bulk (or all?) of these are coming while I'm downloading the computer album.
Now starting to download the cats album.
5) Crash 11:17am. 6) Crash 11:18am 7) Crash 11:19am. 8) Crash 11:20am 9) Crash 11:21am
All of these crashes I see so far in Apple crash reporting are still coming from unreadCountFor
-- haven't dealt with all of these yet.
TestFlight 20210320.1 testing: Got through a full round of downloads of all albums from scratch with no crashes!
I'm trying to debug crashes and out memory indications.
1) The crash reports from Apple are around the time of database access. E.g., see https://github.com/stephencelis/SQLite.swift/issues/1042
2) Crash reports from BugSnag are indicating out of memory. (WIth no traces).