Firestore crash / assertion failed / failed to commit transaction / collect garbage

mikehardy commented 5 years ago

[READ] Step 1: Are you in the right place?

For issues or feature requests related to the code in this repository file a Github issue.
- If this is a feature request make sure the issue title starts with "FR:".
For general technical questions, post a question on StackOverflow with the firebase tag.
For general Firebase discussion, use the firebase-talk google group.
For help troubleshooting your application that does not fall under one of the above categories, reach out to the personalized Firebase support channel.

[REQUIRED] Step 2: Describe your environment

Xcode version: 12.1
Firebase SDK version: 5.20.1
Firebase Component: _Firestore__ (Auth, Core, Database, Firestore, Messaging, Storage, etc)
Component version: _1.2.1__

[REQUIRED] Step 3: Describe the problem

App crashes (as detected by Crashlytics). Apparently while in garbage collection. I have a stack trace but it doesn't make much sense as it appears to be collection 0 garbage, and fails as operation not permitted on a file. The error and stack trace are likely the most useful items:

error:

Fatal Exception: NSInternalInconsistencyException
FIRESTORE INTERNAL ASSERTION FAILED: Failed to commit transaction: <LevelDbTransaction Collect garbage: 0 changes (0 bytes):> Failed: IO error: /var/mobile/Containers/Data/Application/919A1947-23CF-45C9-A46F-EAEAA0787A06/Documents/firestore/__FIRAPP_DEFAULT/kscoreapp/main/000015.log: Operation not permitted (expected status.ok())

stack:

Fatal Exception: NSInternalInconsistencyException
0  CoreFoundation                 0x191990518 __exceptionPreprocess
1  libobjc.A.dylib                0x190b6b9f8 objc_exception_throw
2  CoreFoundation                 0x1918aa148 +[_CFXNotificationTokenRegistration keyCallbacks]
3  Foundation                     0x1923b9f5c -[NSAssertionHandler handleFailureInFunction:file:lineNumber:description:]
4  KScoreApp                      0x100bfcfcc firebase::firestore::util::internal::Fail(char const*, char const*, int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) (hard_assert_apple.mm:34)
5  KScoreApp                      0x100bfd0e4 firebase::firestore::util::internal::Fail(char const*, char const*, int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*) (hard_assert_apple.mm:54)
6  KScoreApp                      0x100c0a1b0 firebase::firestore::local::LevelDbTransaction::Commit() (leveldb_transaction.cc:211)
7  KScoreApp                      0x100bb568c -[FSTLevelDB commitTransaction] (FSTLevelDB.mm:505)
8  KScoreApp                      0x100bbeffc -[FSTLocalStore collectGarbage:] (FSTPersistence.h:216)
9  KScoreApp                      0x100bac900 std::__1::__function::__func<-[FSTFirestoreClient scheduleLruGarbageCollection]::$_3, std::__1::allocator<-[FSTFirestoreClient scheduleLruGarbageCollection]::$_3>, void ()>::operator()() (FSTFirestoreClient.mm:267)
10 KScoreApp                      0x100b71dbc firebase::firestore::util::AsyncQueue::ExecuteBlocking(std::__1::function<void ()> const&) (atomic:921)
11 KScoreApp                      0x100b84ebc firebase::firestore::util::TimeSlot::InvokedByLibdispatch(void*) (executor_libdispatch.mm:179)
12 libdispatch.dylib              0x1913d17d4 _dispatch_client_callout
13 libdispatch.dylib              0x191376018 _dispatch_continuation_pop$VARIANT$mp
14 libdispatch.dylib              0x191385fa4 _dispatch_source_invoke$VARIANT$mp
15 libdispatch.dylib              0x19137a1ec _dispatch_lane_serial_drain$VARIANT$mp
16 libdispatch.dylib              0x19137ae3c _dispatch_lane_invoke$VARIANT$mp
17 libdispatch.dylib              0x1913834a8 _dispatch_workloop_worker_thread
18 libsystem_pthread.dylib        0x1915b1114 _pthread_wqthread
19 libsystem_pthread.dylib        0x1915b3cd4 start_wqthread

Steps to reproduce:

I would love to have steps to reproduce but I haven't seen this locally, and don't have the logs or discussion of steps from relevant users / devices

Any instinct on how this could happen that could help me pinpoint it would be welcome

Relevant Code:

Unsure as this appears to be an internal routine. The relevant file appears to have last been edited by Grand Moff Harkin long ago and I'm not familiar enough with Firestore internals to make any guess as to possible modes of failure in a GC.

I may have missed them but I was similarly unable to find any open or closed issues in this repo that seemed relevant though I did try.

google-oss-bot commented 5 years ago

I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.

mikehardy commented 5 years ago

Possibly worth noting in case it is relevant -

this is a react-native app, using react-native-firebase as a wrapper to access this functionality
the app itself is mostly write-only by data volume and for all but one document (a user object). The non-user objects are for the most part sent out in a batch but without any specific transaction API calls in the app.
I'm aware there are size limits for documents and I believe I am under them but perhaps I am sending a chunk of data too large or something.

wilhuff commented 5 years ago

Some background: The Firestore client uses LevelDB under the covers to implement local storage. The Firestore client implements a least-recently-used garbage collection scheme for documents in local storage to bound the amount of data kept locally. The default threshold for collection is 100 MB of data on disk.

The stack indicates that Firestore was trying to commit to LevelDB after completing a GC pass. It's very likely that the transaction had no changes though, because one of the first things the collector does is determine whether or not there's enough data to warrant collection. Most apps will never generate 100 MB of data so these commits will never be anything but empty.

My initial instinct was that LevelDB might somehow be rejecting an empty commit, but of course this can't be true or we'd have 100k me-toos on this bug and it would have been filed immediately after we added GC in Firebase 5.14.0 back in December. A quick test verified that empty commits succeed so that's definitely not it.

The error message itself indicates that LevelDB encountered an I/O error attempting to write one of its files. The specific error is Operation not permitted (EPERM) which is pretty surprising given that it usually happens either because you're trying to write to a file for which you have no permission or because you're trying to write to a file outside the app sandbox. This file is definitely inside the app sandbox and Firestore doesn't mess with file permissions so neither of those should be the cause here. If the device were out of space I would expect ENOSPC.

I'm not entirely sure about LevelDB's log numbering scheme, but I believe it's sequential. LevelDB failed to write to 000015.log which suggests that this device has been happily churning along writing for some time.

As far as reproduction goes, this scenario of writing an empty commit should be happening ~all the time already. GC is scheduled initially 1 minute after startup and then every 5 minutes thereafter. You could reduce the intervals by modifying their definition but at this point I'd suspect that the empty commit is more of a bystander than the cause of the issue. We could change the code to avoid actually committing when there are no changes though this would likely just move the issue elsewhere.

Some questions for you:

How prevalent is the issue?
Is there any pattern to the kinds of devices it's happening on (hardware or OS version)?

Unfortunately, I don't have much more to go on. AFAICT this shouldn't be happening :-(.

Regarding the points you raised:

this is a react-native app, using react-native-firebase as a wrapper to access this functionality

This shouldn't matter. GC is definitely unrelated. There aren't any customization mechanisms that could implicate the other library.

the app itself is mostly write-only by data volume and for all but one document (a user object). The non-user objects are for the most part sent out in a batch but without any specific transaction API calls in the app.

That the profile of your interaction is write-mostly is interesting, but doesn't seem like it would matter for the purpose of this issue. Here we failed to run GC when we didn't actually do anything.

I'm aware there are size limits for documents and I believe I am under them but perhaps I am sending a chunk of data too large or something.

Size limits are enforced by the server at the point where we try to commit writes there. GC does not enforce or otherwise examine size limits. If LevelDB failed to accept a write for size reasons the stack would be much different.

mikehardy commented 5 years ago

Well, glad I was able to provide something intellectually challenging. Always the goal ;-)

You asked for issue prevalence and patterns: the quick take is this is very prevalent for us, not some idle corner case I'd like to fix. I'm unable to determine a pattern as we have a little skew on device and iOS version but despite percentage incidence being high the total count is low

the extended take is that we have an MVP that we just launched Saturday. According to analytics we had

15 "first_opens" which engages firestore for a user object
10 users onboard to the point where firestore would be used for user object + write-mostly
5 users crashed with 10 total events on this issue so far.
all iOS 12 (paged through and it's 4 on 12.2 and 1 on 12.1),
30% in background (which is interesting because we attempt no app-level firebase-related activities in the background).
1 iPhone 5s, 4 iPhone 7 devices
all of the devices appeared to have ample space

It is puzzling.

In the analytics settings for the project I have enabled data sharing for support, so I believe you may look for patterns yourself in case it is useful and I further do not believe putting this id here compromises anything, so it is iOS app id 1:629722894643:ios:5fd8ca310ade9035

Last thought - this is the first time I have attempted to use firestore for anything, and the first time users have used it. So this is very fresh possibly naive API usage, and I may simply be doing something so bizarre no other developer would do it this way, as a possible explanation in case it's a fundamental problem that nevertheless didn't get 500k me-toos. If there is no path to move forward I could likely extract my API usage pattern and try to produce a tiny app that does a similar thing, but I would naturally like to avoid the effort if possible.

Thanks for looking in to this!

mikehardy commented 5 years ago

Oh - one more thing I can offer - the MVP launch is with a very controlled, motivated user population, aware they are working with a "pilot" quality app. They would be okay with installing test versions etc. We could add substantially more logging and attempt to attach it in some way to the crashlytics reports, or any other form of instrumentation, if it were useful.

wilhuff commented 5 years ago

I don't actually have permission to view the analytics data directly. If you were to file a support case they might be able to, though I don't see any utility going that path at this point.

However, I think errors in the background is big clue: it appears that when a device is encrypted (the default since iOS 8) and a device is locked, the file system becomes inaccessible and any attempts to write will return ... EPERM. Do you have background app refresh enabled otherwise?

If so, the way to reproduce this will be to start your app and then ~45-50 seconds later lock your screen. For the next 20 seconds or so your app will continue to run in the background and the 1 minute initial GC pass will start and you'll crash. It's likely you'll need to do this on device.

A workaround, if your data is not sensitive would be to mark the Firestore data directory non-protected, with code like this (in Swift):

func unlockFirestoreData() throws {
  let attributes = [FileAttributeKey.protectionKey: FileProtectionType.none]

  let dirs = NSSearchPathForDirectoriesInDomains(.applicationSupportDirectory, .userDomainMask, true)
  let firestoreDir = dirs[0].appending("firestore")

  // If Firestore hasn't run yet on this client, create the directory for Firestore
  // data and mark it unprotected.
  if !FileManager.default.fileExists(atPath: firestoreDir, isDirectory: nil) {
    try FileManager.default.createDirectory(atPath: firestoreDir, withIntermediateDirectories: true, attributes: attributes)
    return
  }

  // If the directory does exist, then Firestore has run on this device previously
  // so fix up any existing filesystem entries.
  let files = FileManager.default.enumerator(atPath: firestoreDir)
  while let file = files?.nextObject() as? String {
    let fullPath = (firestoreDir as NSString).appendingPathComponent(file)
    try FileManager.default.setAttributes(attributes, ofItemAtPath: fullPath)
  }
}

This will fix an existing instance or prepare a new instance.

If your data is sensitive, then the alternative would be to shut Firestore down as you're going into the background, though this currently isn't well supported.

Actually fixing this to tolerate background mode will require that Firestore listens for the UIApplicationProtectedDataWillBecomeUnavailable notification to disable actions that touch the file system and UIApplicationProtectedDataDidBecomeAvailable to re-enable them. This isn't trivial given the way the SDK works.

For completeness, you can enable Firestore logging with one of these methods. A log of activity prior to the crash could be helpful, though it's likely voluminous. We don't integrate with crashlytics out of the box for a variety of uninteresting reasons, but it should be very straightforward to plug us in.

https://docs.fabric.io/apple/crashlytics/enhanced-reports.html has details on what to do from that side. You'd change Firestore here: https://github.com/firebase/firebase-ios-sdk/blob/master/Firestore/core/src/firebase/firestore/util/log_apple.mm#L75, but substituting LogMessageV with a call to CLS_LOG.

mikehardy commented 5 years ago

Interesting - I suppose I would expect more than 30% of the crashes to be in background if that was it, but I like the hunch and can certainly do some testing there.

A quirk worth mentioning now (though maybe just confirmation bias) is that we did not get any actual user reports of this, which does smell like a quiet death in the background.

App "Capabilities" (from XCode) are Location Updates (which we use), Background Fetch, and Remote Notifications, so the app should be alive periodically in the background but we're not using firebase APIs at the time so I hadn't thought background would matter but of course it could if GC just runs on a timer.

Our information is sensitive so turning off encryption is a dead-end.

I will see what I can do about instrumentation/logging as well as specific tests to trigger it so we can have some certain we've identified the thing

Seems like I shouldn't uniquely trigger this if it's a filesystem lockout while in background. No other person handles silent push notifications from Firebase and then performs Firestore work that could trigger this?

Speculatively thinking through available options assuming this is it -

you say shutting firestore down and bringing it up again as foreground state changes is not well supported. How could it fail and how would I mitigate, if this was it and I tried this?
if the assertion not many people generate 100MB of data (requiring GC) is true, and we could successfully assert even fewer generate that much data in the background, perhaps simply turning off GC in the background is a band-aid. Or as you say, bailing on a no-op GC. Not real solutions if this is it, but possibly 95/5 and worth it. Just brainstorming there.

I'll let you know results of the targeted test you propose, and/or any logging as I get it. Might be a day or three delay, I'm in the middle of integrating fastlane to split firebase environments used by build stream so build/packaging is heavily impaired at the moment...

wilhuff commented 5 years ago

Seems like I shouldn't uniquely trigger this if it's a filesystem lockout while in background. No other person handles silent push notifications from Firebase and then performs Firestore work that could trigger this?

This is pretty surprising to me as well. We have had other users try to use Firestore in the background, but iOS kills sockets pretty early, so we advise disabling Firestore's network there. This is the first report of an I/O error due to filesystem protection (if this is the issue).

Speculatively thinking through available options assuming this is it -

you say shutting firestore down and bringing it up again as foreground state changes is not well supported. How could it fail and how would I mitigate, if this was it and I tried this?

The supported way to shutdown Firestore is to delete the Firebase App instance (js, swift).

Firestore responds to the internal event by doing nothing.

We do have an unpublished API to do what's actually required. If you forward declare our shutdown method you'll be able to invoke it.

Note that both are required. So yeah, not well supported.

if the assertion not many people generate 100MB of data (requiring GC) is true, and we could successfully assert even fewer generate that much data in the background, perhaps simply turning off GC in the background is a band-aid. Or as you say, bailing on a no-op GC. Not real solutions if this is it, but possibly 95/5 and worth it. Just brainstorming there.

https://github.com/firebase/firebase-ios-sdk/pull/2862 makes it such that you can opt out of GC and have that actually disable the background process. You'll have unbounded cache growth while we work on making this better, but it will give you an effective remedy in the short term.

mikehardy commented 5 years ago

I'm nearly done implementing Fastlane to support multiple release streams so I don't blow up my production firebase data with testing, which will allow me to work this actively but have no test results as of yet.

I asked my primary iOS tester (we have a crash from him) but he has never noticed, so +1 for the "quiet death in background" confirmation bias count.

Shutdown of the Firebase App instance sounds like a last resort, to me. Unfortunately delete is not supported by the react-native-firebase module. Does it have an equivalent android API? If so perhaps I could give them a PR for that one to correctly cover all wrapped API regardless of this issue.

I reviewed the GC PR #2862 and it seems like I could start using unlimited cache now, then when the next firestore iOS version bump happened, I would get the upgraded behavior? With the caveat that it appears react-native-firebase (a completely unaffiliated project I understand, though friendly with you guys if I understand further) does not allow this setting through? https://rnfirebase.io/docs/v5.x.x/firestore/reference/firestore#settings / https://rnfirebase.io/docs/v5.x.x/firestore/reference/Settings - if that's the right way to set this, I could maybe PR that one to rn-firebase as well. Further, if Firestore is dynamically configurable, perhaps I could hook react-native's foreground transition events to dynamically enable/disable cache. That seems like it would be simple and effective.

I won't contemplate more until I've done real testing to make sure we on the right path. Thanks again for the attention on the issue

mikehardy commented 5 years ago

This is difficult to reproduce and verify.

I believe I reproduced it once with the screen lock / timing trick you suggested and received a firebase crash email (as expected given my user role config) with the expected code file (the assert one) but the crashlytics console did not display the error even though my crash-free percentage dropped, and it shows that there is one crash logged for a UUID that I manually uploaded the dSYM for so it should have been fine 🤷‍♂️

I'm going to count that as a confirm though, seeing the assert pop up with the expected test scenario, when I've never seen that assert otherwise in my app, seems about right. All the rest of this is based on the assumption my reproduction was valid and this is a data protection related issue

My data protection is definitely set to complete now (aggressive perhaps, but I submit should be foreseen by a library that does not have control of library consumer habits).

I can certainly understand why that is problematic for an SDK that is can be used to do synchronization in the background possibly, even if it weren't a nop GC.

Seems like this might be more of a docs issue more than anything, at least as a workaround for now. To document that complete data protection at minimum will lead to background crash, and the other protection levels are as yet untested.

Out of the various levels of data protection available it seems like this might describe them best?

complete - you will crash in background in leveldb gc. But, the library should not do no-op work, and should bubble the error up with a message consumable by devs at least, possibly by checking prior to touching disk rather than subscribing https://developer.apple.com/documentation/uikit/uiapplication/1622925-protecteddataavailable
complete unless open - this seems like it should work fine at the library level? after initialization the library should have handles and work fine? Seems iffy now though as a log number of "15" indicates files could roll rapidly as these were first-use type scenarios
complete until first authentication - this seems like it should work fine as well. I'm going to back down to this level as an attempted workaround and I expect it to get rid of the crash without increasing security problems too much

In all cases leveldb should at minimum check to see if it took an EPERM because the files went away with a UIApplication.protectedDataExists check and produce some messaging that describes the issue.

After thought and this reproduction I simply can't imagine how periodic GC wouldn't always fail on a "protection complete" app but I had to make that choice specifically, perhaps I really am just blazing a trail here and the type of person that toggles encryption just because it's available is rare

I'm going to back down to complete until first authentication and hopefully that does the trick

mikehardy commented 5 years ago

Tiny status report: my builds with provisioning profile entitlement / app config data protection diluted to "complete until first authentication" / "NSFileProtectionCompleteUntilFirstUserAuthentication" are starting to work their way through test on their way to deployment, and our MVP is about to go 4x on userbase so I should have more information soon. This setting is the default, so if it works that could explain rarity of crash vs obviousness of crash. Probably a week or two more

tecbot commented 5 years ago

Hi, we have also released an app (with flutter) and we could see the same crashes for some users. As far as I can tell, it happens only when the user has opened the app and locks the phone. If he unlocks the phone after a few minutes and the app was previously open, it crashes with the exception. But it does not always happen, but when it happens, these were the steps to reproduce.

Fatal Exception: NSInternalInconsistencyException
FIRESTORE INTERNAL ASSERTION FAILED: Failed to commit transaction: <LevelDbTransaction Collect garbage: 0 changes (0 bytes):> Failed: IO error: /var/mobile/Containers/Data/Application/E1CB5F5A-1AF7-435D-B0EE-BE573C4F59F0/Documents/firestore/__FIRAPP_DEFAULT/av-fasting/main/000191.log: Operation not permitted (expected status.ok())

Fatal Exception: NSInternalInconsistencyException
0  CoreFoundation                 0x213b88518 __exceptionPreprocess
1  libobjc.A.dylib                0x212d639f8 objc_exception_throw
2  CoreFoundation                 0x213aa2148 +[_CFXNotificationTokenRegistration keyCallbacks]
3  Foundation                     0x2145b1f5c -[NSAssertionHandler handleFailureInFunction:file:lineNumber:description:]
4  Runner                         0x102df0e88 firebase::firestore::util::internal::Fail(char const*, char const*, int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) (hard_assert_apple.mm:34)
5  Runner                         0x102df0fa0 firebase::firestore::util::internal::Fail(char const*, char const*, int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*) (hard_assert_apple.mm:54)
6  Runner                         0x102dfdd2c firebase::firestore::local::LevelDbTransaction::Commit() (leveldb_transaction.cc:211)
7  Runner                         0x102da4e98 -[FSTLevelDB commitTransaction] (FSTLevelDB.mm:505)
8  Runner                         0x102daec38 -[FSTLocalStore collectGarbage:] (FSTPersistence.h:216)
9  Runner                         0x102d9be48 std::__1::__function::__func<-[FSTFirestoreClient scheduleLruGarbageCollection]::$_3, std::__1::allocator<-[FSTFirestoreClient scheduleLruGarbageCollection]::$_3>, void ()>::operator()() (FSTFirestoreClient.mm:267)
10 Runner                         0x102d65f4c firebase::firestore::util::AsyncQueue::ExecuteBlocking(std::__1::function<void ()> const&) (atomic:921)
11 Runner                         0x102d7226c firebase::firestore::util::TimeSlot::InvokedByLibdispatch(void*) (executor_libdispatch.mm:179)
12 libdispatch.dylib              0x2135c97d4 _dispatch_client_callout
13 libdispatch.dylib              0x2135a0c34 _dispatch_continuation_pop$VARIANT$armv81
14 libdispatch.dylib              0x2135b0800 _dispatch_source_invoke$VARIANT$armv81
15 libdispatch.dylib              0x2135a4cc0 _dispatch_lane_serial_drain$VARIANT$armv81
16 libdispatch.dylib              0x2135a592c _dispatch_lane_invoke$VARIANT$armv81
17 libdispatch.dylib              0x2135ade08 _dispatch_workloop_worker_thread
18 libsystem_pthread.dylib        0x2137a9114 _pthread_wqthread
19 libsystem_pthread.dylib        0x2137abcd4 start_wqthread

- FirebaseFirestore (1.2.1):
    - FirebaseAuthInterop (~> 1.0)
    - FirebaseCore (~> 5.2)
    - FirebaseFirestore/abseil-cpp (= 1.2.1)
    - "gRPC-C++ (= 0.0.6)"
    - leveldb-library (~> 1.20)
    - nanopb (~> 0.3.901)
    - Protobuf (~> 3.1)
  - FirebaseFirestore/abseil-cpp (1.2.1):
    - FirebaseAuthInterop (~> 1.0)
    - FirebaseCore (~> 5.2)
    - "gRPC-C++ (= 0.0.6)"
    - leveldb-library (~> 1.20)
    - nanopb (~> 0.3.901)
    - Protobuf (~> 3.1)

mikehardy commented 5 years ago

@tecbot can you share what your data protection level is set to in the entitlements for your provisioning profile? is it NSFileProtectionComplete? Or one of the other choices? (or none)

tecbot commented 5 years ago

@mikehardy it's NSFileProtectionComplete

wilhuff commented 5 years ago

The assertion is caused by background activity in Firestore. It assumes that when the user is idle it can perform some cleanup, but it's not prepared for filesystem protection to kick in.

There are two possible workarounds:

Disable persistence altogether
Disable garbage collection

The former is easier to do, but reduces functionality. The latter requires an as yet unreleased fix.

To disable persistence, add the following to your code when first configuring Firestore:

let settings = FirestoreSettings()
settings.isPersistenceEnabled = false

let firestore = Firestore.firestore()
firestore.settings = settings

To disable garbage collection, add the following to your Podfile:

pod 'FirebaseCore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'pull/2862/head'
pod 'FirebaseFirestore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'pull/2862/head'

Note that you may have other entries in there for Firebase/Firestore or Firebase/Core. Those should remain unchanged.

Add the following setting to your code when first configuring Firestore:

let settings = FirestoreSettings()
settings.cacheSizeBytes = FirestoreCacheSizeUnlimited

let firestore = Firestore.firestore()
firestore.settings = settings

mikehardy commented 5 years ago

I believe there is a 3rd workaround - to disable Data Protection, and a possible 4th which is to simply loosen Data Protection so file access is available after first unlock. The 4th approach there is what I'm pursuing now but I don't have significant test results yet

Am I wrong in thinking these other 2 paths are viable, if the app's security profile can tolerate them?

wilhuff commented 5 years ago

Sorry, as you mentioned, notching back to NSFileProtectionCompleteUntilFirstUserAuthentication instead of NSFileProtectionComplete should also work.

Other options are possible. For example if other data in your application is sensitive but your Firestore data is not, you could disable protection for just Firestore files and retain NSFileProtectionComplete for your other data.

The two workarounds cited above are for the case where you must retain NSFileProtectionComplete even for your Firestore data.

wilhuff commented 5 years ago

Also, apologies for not responding to your earlier note. Things have been a bit hectic and I didn't have too much to add to your analysis.

LevelDB, unfortunately, is agnostic to this kind of thing. File protection is incompatible with the kinds of guarantees it tries to make: namely that when a commit returns, data has been appending to the log and is available for querying. If file protection is enabled, a commit must fail.

NSFileProtectionCompleteUnlessOpen could possibly help, depending on LevelDB's response to a compaction failure. LevelDB is based on a log structured merge tree, and any time it performs a compaction, it writes to new files. So if GC writes unconditionally append to the existing log and compaction failures don't block additional writes, this could work, but it depends on a behavior path I wouldn't be comfortable relying upon without significant testing.

The right fix for this for the SDK to subscribe to file protection events and either close its leveldb files or inhibit tasks that would access them.

All of our internal processing is funneled through a dispatch queue, so an appealing course would be to insert a task into the queue that blocked until the filesystem was unlocked. This would have the benefit that you wouldn't even have to stop listening or stop writing--everything would queue in RAM. On the flip side I'm not sure this is a benefit.

Anyway, at the very least, it does seem that we need to document that we're currently incompatible with NSFileProtectionComplete.

ferrerod commented 5 years ago

@wilhuff @mikehardy this thread is super relevant to me on two different iOS apps using Firebase.

&tldr: It is now obvious that the crashes I've been facing in two apps is incompatibility of Firebase with Data Protection with NSFileProtectionComplete.

I had checked back in late March(?) just after releasing both apps, enabling Data Protection with NSFileProtectionComplete. That's when the crashing began, all in the GAI* code (see one such stack below). I did not see any posts about problems.

As @mikehardy stated, this is very difficult to reproduce or verify any fixes outside of "production". I believe this is because iOS Data Protection doesn't get enabled in a Dev/Debug environment.

Several weeks back, I tried to turn off the dispatcher with code like this in AppDelegate.swift:

    func applicationWillResignActive(_ application: UIApplication) {
        guard let gai = GAI.sharedInstance() else { return }
        gai.dispatchInterval = 0.0 // disable when going to background
    }

   func applicationWillEnterForeground(_ application: UIApplication) {
        guard let gai = GAI.sharedInstance() else { return }
        gai.dispatchInterval = 120.0 // enable when app is active
    }

This reduced the number of crashes in both apps significantly but still, the GAI* crashes continued.

I was debating trying NSFileProtectionCompleteUnlessOpen as a second fix attempt but I believe the more reliable solution may be to try disable protection for just Firestore files - since our analytics data does not contain sensitive information at this time.

FWIW, there were several variations of the GAIThread crash but here is one:

Crashed: GAIThread 0 libsqlite3.dylib 0x1f9c82250 sqlite3_finalize + 24708 1 libsqlite3.dylib 0x1f9c80b2c sqlite3_finalize + 18784 2 libsqlite3.dylib 0x1f9c803f8 sqlite3_finalize + 16940 3 libsqlite3.dylib 0x1f9c7e8cc sqlite3_finalize + 9984 4 libsqlite3.dylib 0x1f9c7dd0c sqlite3_finalize + 6976 5 libsqlite3.dylib 0x1f9c9d0c8 sqlite3_step + 13664 6 libsqlite3.dylib 0x1f9c99d24 sqlite3_step + 444 7 CoreData 0x1fc4595e4 _execute + 156 8 CoreData 0x1fc2d0f98 -[NSSQLiteConnection beginTransaction] + 216 9 CoreData 0x1fc519b38 _executeObjectIDRequest + 152 10 CoreData 0x1fc2c10bc -[NSSQLObjectIDRequestContext executeRequestCore:] + 20 11 CoreData 0x1fc2b944c -[NSSQLStoreRequestContext executeRequestUsingConnection:] + 252 12 CoreData 0x1fc419074 52-[NSSQLDefaultConnectionManager handleStoreRequest:]_block_invoke + 84 13 libdispatch.dylib 0x1f92a57d4 _dispatch_client_callout + 16 14 libdispatch.dylib 0x1f92865fc _dispatch_lane_barrier_sync_invoke_and_complete + 56 15 CoreData 0x1fc2b92b4 -[NSSQLDefaultConnectionManager handleStoreRequest:] + 264 16 CoreData 0x1fc2b9168 -[NSSQLCoreDispatchManager routeStoreRequest:] + 264 17 CoreData 0x1fc2b8f24 -[NSSQLCore dispatchRequest:withRetries:] + 260 18 CoreData 0x1fc2c0a3c -[NSSQLCore obtainPermanentIDsForObjects:error:] + 1744 19 CoreData 0x1fc426190 87-[NSPersistentStoreCoordinator(_NSInternalMethods) obtainPermanentIDsForObjects:error:]_block_invoke + 764 20 CoreData 0x1fc2b8d14 -[NSPersistentStoreCoordinator _routeHeavyweightBlock:] + 240 21 CoreData 0x1fc2c5a2c -[NSPersistentStoreCoordinator(_NSInternalMethods) obtainPermanentIDsForObjects:error:] + 652 22 CoreData 0x1fc2c5d24 -[NSManagedObjectContext obtainPermanentIDsForObjects:error:] + 540 23 XXXXXXXXX 0x10225cfd4 __23-[GAIDataStore addHit:]_block_invoke (GAIDataStore.m:406) 24 XXXXXXXXX 0x10225b7dc -[GAIDataStore performBlockAndWait:withError:] (GAIDataStore.m:198) 25 XXXXXXXXX 0x10225ce50 -[GAIDataStore addHit:] (GAIDataStore.m:397) 26 XXXXXXXXX 0x10226aae4 -[GAIBatchingDispatcher persist:] (GAIBatchingDispatcher.m:515) 27 XXXXXXXXX 0x10226c0c4 -[GAIBatchingDispatcher queueDispatch:] (GAIBatchingDispatcher.m:707) 28 XXXXXXXXX 0x10226bdf4 -[GAIBatchingDispatcher queueModel:] (GAIBatchingDispatcher.m:671)

wilhuff commented 5 years ago

@ferrerod The crashes you're seeing have nothing to do with FIrestore--GAI is Analytics. Changing protections on Firestore files aren't going to have any affect on this. You may be able to apply a similar technique to Analytics' files though I don't know what those are.

There is commonality in that file protection is involved, but the details are different and aside from the discussion of different file protection levels none of the workarounds discussed here apply.

I suggest opening a separate issue for the GAI-related issues you're seeing.

ferrerod commented 5 years ago

@wilhuff do you know where I might open an issue? There doesn't seem to be the same github.com for GoogleAnalytics

wilhuff commented 5 years ago

The best resource I could find about this is their Getting Help page. I'm not aware of any other path.

mikehardy commented 5 years ago

I now have about 40 users on my new release and 40 that were active on the older one over the last 7 days. Crash count is 12 crashes on the old release with this stack trace, 0 crashes on the new one. That was changing file protection to NSFileProtectionCompleteUntilFirstUserAuthentication - I did not try the other intermediate file protection steps as I couldn't reason through how they would avoid the problem, so I can't say if the will work or not. But this appears to work in my small but decisive sample. I know you guys are aware of this, and the issue is essentially resolved for me at this point, so I am fine with however you want to manage the github issue state.

Cheers!

wilhuff commented 5 years ago

That's great news.

In that case, I'm going to resolve this issue by noting that Firestore is not currently compatible with NSFileProtectionComplete.

I've opened #2980 to track the feature request of making it so we don't crash by default in this mode, though there's still not much utility to Firestore in this mode, since we can't persist any writes submitted while file protection is active.

wilhuff commented 5 years ago

I've just read that changing the com.apple.developer.default-data-protection entitlement may not be sufficient to fix existing users that are broken.

The issue is that the entitlement sets the default file protection when your application container is created on the user's device. Changing it subsequently won't change the file protection type of any existing files. That means that Firestore files created with the version of your app that was using NSFileProtectionComplete may still be marked that way and will still cause crashes in the background.

If you're still seeing crashes, those users either need to delete and reinstall your app or you may need to apply a fix like the one I described above to force existing files to NSFileProtectionCompleteUntilFirstUserAuthentication.

mikehardy commented 5 years ago

Very interesting...given the crash mode is a silent background one and no users reported it me personally, it was all crashlytics reports, and also given I personally (may not apply to other affected parties) was in the equivalent of an alpha release when we saw, and fixed prior to the equivalent of alpha2, I'm personally okay. But that's a good heads-up + pointer for any others, and knowledge I will file away also.

spaquet commented 5 years ago

do we have updates about this?

mikehardy commented 5 years ago

I have no further crashes since my previous comment, when using NSFileProtectionCompleteUntilFirstUserAuthentication

mechmatt01 commented 4 years ago

@mikehardy I have run into this issue as well but upon trying to change the NSProtection level I get an error from Xcode claiming that it can not update the Entitlements file. So I change it myself to the new protection level and then it comes back with another error saying that it is incorrectly set. Have you or anyone else experienced this? If so is there a way around this? I have tried it multiple times with no luck. I am using Xcode 10.3.

mikehardy commented 4 years ago

entitlements are (from what I know) intertwined closely with the apps metadata stored in your developer account

I believe you need to change them in the apple developer connection, then download the new entitlements file and use it. Manually changing that file is not something I think is possible

mechmatt01 commented 4 years ago

@mikehardy correct but when doing that Xcode does not seem to update the entitlements from the changes made in portal.

mikehardy commented 4 years ago

Hmm - I'm clearly out of my depth here - on re-read of the docs it seems like you need to change them in your developer connection account and in Xcode (separately?) but it's been so long since I did this I don't remember completely. Sorry. My related commit message when I conformed to the background encryption setting needs here was even more ominous, as it mentions you need to update your provisioning profile as well? For me that spiders out into fastlane proviosioning profile twiddling which is quite the yak shave. All vague, but maybe it gets you pointed in the right direction.

commit 2738cc6cf6876694e7cec08fdf0344f53607e795
Author: Mike Hardy <>
Date:   Mon Apr 22 11:19:56 2019 -0500

    attempt to fix iOS Firestore gc crash by loosening encryption

    Note this blows up iOS deployment until provisioning profile is altered

pkrmf commented 4 years ago

I am not sure if Firestore generates any files, but if it does, you could always change the attributes of those files to no protection and that should fix the crash too.

I ran into a very similar issue, I opened a radar on Apple today since its clearly an iOS 13 bug in my case: From my radar/Feedback:

An application with data protection entitlement enabled on iOS13 will crash when a user with passcode enabled locks the device while the app is running. The application must be signed, otherwise the crash is unreproducible.
If the user disables passcode and locks the device, no issues occur.
If the application isn't signed, no issues will occur.

We reproduced this crash with our enterprise application and also with a demo app with no code in it.

As par of our troubleshooting, we realized that dyld3 is storing .closure files in the /tmp folder of the application. If we change the attributes of those files to FileAttributeKey.protectionKey: FileProtectionType.none or we just remove them after the application is launched, no crash will occur. The default attribute for FileAttributeKey.protectionKey will be FileProtectionType.complete due to the data protection entitlement.

We believe there is some sort of issue with dyld3 trying to access those files in the /tmp directory when the device is locked, and it causes the crash.

Just to be clear, to change the attributes of those files, you most likely will also need to change the posix permissions to 0o777 since most likely they are set to immutable and changing protection permissions won't have any effect.

wilhuff commented 4 years ago

@pkrmf This is essentially the advice I gave in https://github.com/firebase/firebase-ios-sdk/issues/2846#issuecomment-484291699. Unfortunately, as a library we can't really make this choice for developers using Firestore. If the data really is sensitive, Firestore auto-unlocking its database files will work counter to developer intent.

Since this issue occurred we have done a few things to make it easier to use Firestore in an app that must use NSFileProtectionComplete. In particular, it's now possible to cleanly shut Firestore down as your application is going into the background. Do this by stopping your use of Firestore, calling waitForPendingWrites and then in the callback, call terminate. Just calling terminate is also safe--any writes that were in progress will be resumed on the next restart.

firebase / firebase-ios-sdk