firebase / firebase-js-sdk

Firebase Javascript SDK
https://firebase.google.com/docs/web/setup
Other

FR: Offline first support for PWAs (RTDB) #17

Open Hollerweger opened 7 years ago

Hollerweger commented 7 years ago

While the Firebase JS SDK supports offline scenarios in which the web app goes from online to offline, it lacks offline-first support. Offline first is a crucial part of PWAs and should be supported by the Firebase JS SDK directly.

mikelehen commented 7 years ago

@jsayol Yep! I think that kind of size approximator would be perfect... and using internal APIs where available is also a great idea. :-)

mikelehen commented 7 years ago

@jsayol As for the storage format, it'd be interesting to do some performance profiling and see how things look. I believe Chrome uses LevelDB for its IndexedDB storage under the covers which may make it similar to LevelDB on iOS, depending on how much extra overhead there is. Other browsers (and ReactNative AsyncStorage) may have a much harder time with lots of small rows, but I don't know!

One other big question. What are you doing (or planning to do) about multi-tab access? That was always one of the big concerns with tackling web offline... If you have multiple tabs modifying the offline cached data, you can easily end up with data corruption / inconsistencies. iOS and Android don't have this problem since apps are single-instanced on mobile.

The easy / safe option is to detect and prevent multi-tab access so that it's not a concern, but obviously that makes it harder to build an ideal UX for web apps. It would probably be 100% fine for React Native, PWAs, etc. though.

jsayol commented 7 years ago

One other big question. What are you doing (or planning to do) about multi-tab access?

@mikelehen Ah, good point. Hadn't thought about that. I think for now the best option is to prevent multi-tab access and revisit that decision in the future.

Reading data from multiple tabs is not an issue, so we would only need to lock writes to a single tab (whichever requests access first; I'll look into how that can be implemented). With that approach, the worst-case scenario is that some data won't be persisted in some situations, so basically no worse than it is now, but most of the time it would still be persisted. I'm OK with that.

kr31n3r commented 7 years ago

Regarding initial RNAsyncStorageAdapter testing:

first of all thanks again for that stunning fast response and providing this wrapper around React Native's AsyncStorage. Despite some initial quirks my first tests look promising and right now my current project works with persistence enabled.

now about these quirks:

i'll keep you updated about further testing next week.

kr31n3r commented 7 years ago

More about those quirks: after this small patch everything seems to work as expected :blush:!

in src/database/persistence/ServerCacheStore.ts line 111:

const subpath = item.key.substring(baseKey.length)
                .split('/').filter(part => part.length > 0);

to:

const subpath = item.key.substring(item.key.indexOf(baseKey) + baseKey.length)
                .split('/').filter(part => part.length > 0);

kr31n3r commented 7 years ago

Here's another small one in src/database/persistence/query/TrackedQueryManager.ts hasActiveDefault() line 165:

(map: TrackedQueryMap) => map[Query.DefaultIdentifier].active);

should check for undefined as well

(map: TrackedQueryMap) => {
    const trackedQuery = map[Query.DefaultIdentifier];
    return trackedQuery && trackedQuery.active;
})

This eliminates the RN red screen of death: undefined is not an object (evaluating 'map[_Query.Query.DefaultIdentifier].active')

jsayol commented 7 years ago

Thanks @kr31n3r! Hopefully I will soon have some time to start writing tests and catch all these little things.

sebasgarcep commented 7 years ago

Hey @jsayol, how is .enablePersistence() going? Right now we are using redux-offline for persistence but your solution might be more useful for us. If you could provide somewhat of a roadmap or a list of things that need to be done before integrating your fork into the main repo we might be able to help and make this a reality!

alexanderwhatley commented 7 years ago

+1

PierBover commented 7 years ago

It appears @jsayol went on a bike trip from Poland to Barcelona for 3 months, so this might take a while...

😄

https://twitter.com/jsayol/status/905564254379630594

jsayol commented 7 years ago

I see news travel fast around here (faster than me on the bike anyway).

Yeah, I was really busy with work recently, which prevented me from spending any time on this, and now I'm taking some time off to travel. I will eventually get back to it, I promise. Sorry if anyone was counting on this being ready sooner.

In the meantime, anyone else could contribute to move this forward faster. I think everything is explained in previous comments but off the top of my head, this is what's pending:

If you do any work towards this then feel free to open a PR against my fork (also linked elsewhere here).

I promise I'll finish this as soon as I get my hands on a laptop but, like I said, it might be a while.

Cheers,

Josep

afeltham commented 7 years ago

Hello. Thanks very much for all this great work. I've tried to pull the branch mentioned above and build it but am having issues. To shortcut my frustration does anyone have any of the built 'dist' folder that they could share with me? Am really interested in trying this. Perhaps @kr31n3r ?

cfilipov commented 7 years ago

Just saw this today.

Enables offline data access via a powerful, on-device database. This local database means your app will function smoothly, even when your users lose connectivity. This offline mode is available on Web, iOS and Android.

It seems Firebase announced another db option separate from the realtime db. I haven't looked into the details yet but it claims to support offline capabilities similar to this issue. Just thought I would drop this here since I've been watching this issue closely and I'm sure others like me might find this option suitable.

jthegedus commented 7 years ago

Firestore Announcement blog post
Firestore Docs
Firestore and Web introduction video - discusses nearer the end how syncing works in practice (not implementation)

I can confirm that Firestore does indeed support offline mode in Web in addition to the mobile platforms.

cfilipov commented 7 years ago

This is great! Firestore is much more aligned to my needs than a realtime db. I don't mean to get this issue off topic but it's somewhat relevant: how does sync work? Is it similar to the rtdb implementation mentioned earlier in this thread (compound hashes)? OT-based? How are conflicts handled?

Edit: Looks like it's last-write-wins like the realtime db.

jshcrowthe commented 7 years ago

Hey everyone! I see you got to this before I did but just wanted to give another plug for Cloud Firestore.

As has been mentioned, Firestore supports web offline, and it may be a great fit for your application. So please take a look!

That said, I've changed the title of this issue to point at the RTDB as this is still a valid feature request.

jasonrobinson commented 7 years ago

Firestore looks great, but glad you're keeping the feature request active.

update: Fully migrated to Firestore. I started with some analytical collections, not intending to move further. But, like RTDB, the API is clean and clear, so why not... Kudos to the team for a great product.

rockwotj commented 7 years ago

@cfilipov the Firestore web SDK is also open source here: https://github.com/firebase/firebase-js-sdk/tree/master/src/firestore

Most of the offline code is in the local folder. Firestore uses a versioning system instead of hashing to do conflict resolution, which is better in some ways and worse in others than the hashing method.

Most of the offline caching of queries is here: https://github.com/firebase/firebase-js-sdk/blob/master/src/firestore/local/indexeddb_query_cache.ts

We do cache all the documents that are returned here as well: https://github.com/firebase/firebase-js-sdk/blob/master/src/firestore/local/indexeddb_remote_document_cache.ts

We also keep a lot of other indexes for performance reasons, and we persist any writes you do.

jacwright commented 7 years ago

For the record, while I am excited about Firestore for some use-cases, the realtime database is still the best fit for my application (many small updates). I'm still anxiously waiting for this feature to be implemented and wishing I had the time to help move it forward. Sending y'all who work on it lots of love and good wishes! 😄 Thanks for your hard work.

jacwright commented 6 years ago

I'm using the storage APIs and storage event in https://github.com/dabblewriter/browserdb to help with cross-window (browser tab) updates whether online or off. PouchDB does the same. I think this would be the best option for that.

jacwright commented 6 years ago

Actually I realized that multiple firebase connections writing to one database for my app isn't ideal, so for my app I will be implementing leader-election from the Raft consensus algorithm and only allowing 1 tab to be connected to Firebase at a time.

mesqueeb commented 6 years ago

Is offline first supported for Cloud Firestore? I got lots of errors with my existing setup after adding enableOffline.

rockwotj commented 6 years ago

Yes it is.

enableOffline? I believe the call is enablePersistence, see the docs here

What does the error say? I would suggest opening another issue for this (since this is tracking persistence for the Realtime Database).

paulpv commented 6 years ago

Did @jsayol ever make it to Barcelona?

jsayol commented 6 years ago

Did @jsayol ever make it to Barcelona?

I did.

I've been looking into this again. The internal structure of the SDK has changed quite a bit since last time so I'm slowly solving all the merge conflicts on my local copy to get it working again. Once that's done I'll keep working on it :)

paulpv commented 6 years ago

FWIW, I've moved to Firestore for my PWA, which seems to support offline OK (via firebase.firestore().enablePersistence()), but it throws a lot of console error messages when offline and definitely isn't a good offline first experience.

callagga commented 6 years ago

@paulpv - with firestore do you get the two events back from an onSnapshot query when you set it to includeQueryMetadataChanges? (background: https://groups.google.com/forum/#!topic/google-cloud-firestore-discuss/cLqy_zH2no4)

paulpv commented 6 years ago

@callagga Yes, I do get two onSnapshot when using .onSnapshot({ includeQueryMetadataChanges: true }, (querySnapshot) => { ... }), and one when not using { includeQueryMetadataChanges: true }

PierBover commented 6 years ago

I'm looking into solving offline again for a crossplatform web app (mobile, desktop, Chrome OS).

So, persistence hasn't been solved yet for the RTDB, right?

I also took a look at Firestore but I saw this in the docs:

For the web, offline persistence is an experimental feature that is supported only by the Chrome, Safari, and Firefox web browsers.

Which doesn't inspire much confidence to be honest... and Edge not being supported is a deal breaker for us, since it represents about 25% of our users.

Is there work being done on offline persistence for Firestore for the web or is this feature going to remain "experimental"?

mikelehen commented 6 years ago

@PierBover Firestore web persistence is definitely under active development. Right now we're focused on implementing multi-tab support. The Edge issue is unfortunate. If Microsoft implements https://developer.microsoft.com/en-us/microsoft-edge/platform/status/indexeddbarraysandmultientrysupport/ then we should be able to support Edge easily (feel free to add some upvotes to their roadmap :-)). Barring that, we're going to have to rework how we store / index our persisted data in order to work around the limitation.

PierBover commented 6 years ago

Thanks for your fast answer @mikelehen !

For my use case multi-tab support is not a priority, but Edge support is.

And what about offline persistence for the RTDB @jsayol ?

jacwright commented 6 years ago

I am also hoping for RTDB persistence. I have a custom solution using IndexedDB and will be using https://github.com/dabblewriter/tab-election for multi-tab support so that only one tab commits updates to the database. You could use that (or something like it) and have a single tab be the one with all the watches. Sorry I can't help contribute more to this!

PierBover commented 6 years ago

Thanks @jacwright my app will not run in multiple tabs. We are working on our own native wrappers using the web engine of each OS (Android, iOS, Mac, Windows).

jacwright commented 6 years ago

I am currently caching data myself and I wonder if we could expose the hashing in Firebase to at least allow it to be taken advantage of outside the persistence feature. I would really like to take advantage of the bandwidth savings and am happy to control the caching myself.

awardrop commented 6 years ago

Folks, I have been looking into this for some time now and the answer is clear but not what we want to hear.

Browsers "parse" JS and thus we rely on browser technology for local storage as we do not store data on disc.

Android / OS compile onto the disc if necessary so the access to disc storage and light databases like sqlite is possible.

Browser local cache or localdb or indexeddb only allow 4mb of data per app before recycling old data... So ... Offline first would mean user cannot create more than 4mb of data. This is not realistic in the long term.

You could choose to only make certain tables offline first... But even then it's unclear if they might one day lose old data, and sync will ruin your cloud store as well.

The only option is to make your site into an online portal with more functionality and allow offline for your android app... Both feed the same node API.

I am waiting for a miracle in browsers but don't lose any sleep over it. Hope this helps.

jacwright commented 6 years ago

Don't lose heart @awardrop!

The browser limit is 50% of remaining disk space (https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API/Browser_storage_limits_and_eviction_criteria) which is much more than 4mb in most cases. And in the cases it is not, you can let the user know to clean up disk space for offline functionality when getting the QuotaExceededError.
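A minimal sketch of the "let the user know" idea above, assuming the failed write surfaces an error whose `name` is `QuotaExceededError`. The `cacheOrWarn` wrapper and its two callbacks are hypothetical names used only for illustration; they are not part of any Firebase API.

```typescript
// Returns true when an error looks like the browser's storage-quota error.
function isQuotaExceeded(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { name?: unknown }).name === 'QuotaExceededError'
  );
}

// Hypothetical wrapper: `write` performs the actual local write (for example
// an IndexedDB put), `onQuotaExceeded` would show a "free up disk space" hint.
async function cacheOrWarn(
  write: () => Promise<void>,
  onQuotaExceeded: () => void
): Promise<boolean> {
  try {
    await write();
    return true;
  } catch (err) {
    if (isQuotaExceeded(err)) {
      onQuotaExceeded();
      return false; // the data still lives on the server, so this is not fatal
    }
    throw err; // anything else is a real failure
  }
}
```

Because the cache only mirrors server data, a failed local write degrades offline behaviour but does not lose anything.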

In addition, if firebase is ejected because it has been unused for a while this is certainly ok. The main reasons to cache the data locally are:

  1. increase the speed of startup and data-load
  2. allow for offline usage
  3. decrease bandwidth usage and costs

If you have to fetch data from the server because it has been ejected since last fetch, that is certainly an acceptable situation. Offline storage enhances an app, but the data still exists elsewhere so you're not in trouble if it times out. You're only in trouble if you only store it locally and not anywhere else.

I'm storing all my data locally in indexeddb and syncing it to firebase. I would love to take advantage of the bandwidth savings persistence mode provides, but I realized yesterday that if I timestamp all the saves I can use a query to only get updates newer than last fetch (and I can store the last timestamp in local storage or indexeddb).

I think most people on this thread have come up with workarounds or other solutions for this problem and are past the point where they need persistence. But newcomers will certainly benefit from the feature (and we all will on our next app) so I still feel there is a lot of value to it. I just wish I could commit time to solving it rather than only contributing to the discussion.

PierBover commented 6 years ago

What we ended up doing in previous projects is to abstract the persistent storage medium depending on the platform for Cordova, Electron, or UWP web apps. Basically we ended up saving .json files to disk (which are truly persistent) with the complete Vuex state of our Vue app.

It's rudimentary... but it works. We have a couple thousand users with very bad connectivity who remain offline like 80% of the time.

awardrop commented 6 years ago

@jacwright very true... I didn't mean to sound pessimistic :D.

The only thing I can't wrap my head around is the security layer when working with on-disk files @PierBover... Was wondering if you could share some pearls of wisdom with us mere mortals about how you managed to make the local database on the client side secure (given it's potentially fully under their control)... It's the last piece of my puzzle.

PierBover commented 6 years ago

Was wondering if you could share some pearls of wisdom down on us mere mortals about how you managed to make the local database on client side secure

I think you may be overestimating what we did... I'd like to clarify that we do not have the entire DB there, only the state of our app from Vuex (like Redux for Vue).

On iOS, Android, and Chrome OS the user has no access to the JSON files we are saving.

On Mac and Windows (via Electron or UWP) we simply encrypt the files before writing to disk and decrypt them in memory.

motin commented 6 years ago

Is this thread about implementing offline persistence or true offline first capabilities? Ie, is the target to be able to add and query items in a locally persisted Firestore-compatible database without requiring any network activity or even a cloud Firestore database setup? With PouchDB, this is completely possible since remote syncing is optional and can be added at a later stage.

mikelehen commented 6 years ago

I believe this was talking about adding offline support to the Realtime Database SDK (not the Cloud Firestore SDK, which already supports web offline).

FWIW- In theory you could use the Cloud Firestore SDK 100% offline by calling firebase.firestore().disableNetwork() but it's not really optimized for this use case right now, so performance may degrade over time.

Ross-Rawlins commented 6 years ago

Is there a solid answer for how to use offline persistence with the Realtime Database yet? I had to upgrade my Angular app and the old angularfire2-offline has been deprecated. And I can't move over from RTDB to Firestore just yet.

jacwright commented 6 years ago

I have come up with a system to do it myself adding a modified field on every record. Then using indexeddb and the time offset I keep the local records and the remote records in sync whenever online. This reduces the traffic by only grabbing everything that has been modified since I last asked (I store the last modified timestamp received in an indexeddb store too). It's been working well, but having built-in support like Firestore does would be much nicer. It took me awhile to put this together and puts certain requirements on my data structure.
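The approach described above can be sketched roughly as follows. This is an illustration only, not jacwright's actual code: the `Rec` shape, the `modified` field name and `applyRemoteChanges` are assumptions. In a real app the `remote` array would presumably come from a Realtime Database query along the lines of `ref.orderByChild('modified').startAt(lastSync + 1)`, and `local` would live in IndexedDB rather than in a plain object.

```typescript
// Hypothetical record shape: every record carries a `modified` timestamp.
interface Rec {
  id: string;
  modified: number;
  data: string;
}

// Merge records fetched since `lastSync` into the local copy and return the
// new high-water mark to store for the next fetch.
function applyRemoteChanges(
  local: Record<string, Rec>,
  lastSync: number,
  remote: Rec[]
): { local: Record<string, Rec>; lastSync: number } {
  const next: Record<string, Rec> = {};
  for (const id of Object.keys(local)) {
    next[id] = local[id];
  }
  let newest = lastSync;
  for (const r of remote) {
    if (r.modified <= lastSync) continue; // already seen in an earlier sync
    const existing = next[r.id];
    if (!existing || existing.modified < r.modified) {
      next[r.id] = r; // remote copy is newer, take it
    }
    if (r.modified > newest) newest = r.modified;
  }
  return { local: next, lastSync: newest };
}
```

The trade-off mentioned above is visible here: it only works if every record carries a reliable `modified` field, which constrains the data structure.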

jsayol commented 6 years ago

Hi everyone. I started looking into this again to see if we can move it forward. I’m gonna put some thoughts into writing here, along with some of the decisions I’ve made so far. Please feel free to comment about any of it and ask any questions you have.

(cc @mikelehen @schmidt-sebastian)

Data storage

There was some concern about how data is stored internally. My current implementation follows the same approach as in the iOS sdk, where an entry is created for each value with its full path as the key. As mentioned, this results in many entries with long keys and usually small values. That's fine in the iOS sdk because it uses LevelDB, which uses prefix compression on the keys.

I've looked into it and both Blink (Chrome, Opera) and WebKit (Safari, Chrome iOS) use LevelDB as the underlying engine for IndexedDB so we're probably good there, although we should still do some profiling to be sure the performance is acceptable. I couldn't find any information about Gecko (Firefox) nor EdgeHTML (Edge).

As for other platforms, like React Native, I think we should offload the decision on how to store it to their specific StorageAdapter implementation. The "core" could still keep using these deep keys while, depending on what's most efficient in that platform, the specific storage implementation might decide to group several entries that share a common prefix into a single value. When retrieving data from storage, the persistence manager already makes a single request to the storage adapter to get all the entries whose keys begin with a certain prefix (the path we want) so it should be fairly trivial to use an alternative approach in each adapter.
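To make the "deep keys" idea concrete, here is a small sketch under stated assumptions: `flatten` and `readPrefix` are hypothetical helpers, and a plain object stands in for the storage adapter. It is not the code from the fork; it only illustrates storing one entry per leaf value keyed by full path, and rebuilding a subtree from all entries sharing a prefix.

```typescript
type Json = null | boolean | number | string | { [key: string]: Json };

// Store one entry per leaf value, keyed by its full path.
function flatten(
  path: string,
  value: Json,
  out: Record<string, Json> = {}
): Record<string, Json> {
  if (value !== null && typeof value === 'object') {
    for (const key of Object.keys(value)) {
      flatten(path + '/' + key, value[key], out);
    }
  } else {
    out[path] = value;
  }
  return out;
}

// Rebuild the subtree at `prefix` from every entry whose key begins with it.
function readPrefix(entries: Record<string, Json>, prefix: string): Json {
  if (prefix in entries) return entries[prefix]; // the path itself is a leaf
  const root: { [key: string]: Json } = {};
  let found = false;
  for (const key of Object.keys(entries)) {
    if (key.indexOf(prefix + '/') !== 0) continue;
    found = true;
    const parts = key
      .substring(prefix.length)
      .split('/')
      .filter(part => part.length > 0);
    let node = root;
    for (let i = 0; i < parts.length - 1; i++) {
      const next = node[parts[i]];
      if (next === null || next === undefined || typeof next !== 'object') {
        const created: { [key: string]: Json } = {};
        node[parts[i]] = created;
        node = created;
      } else {
        node = next;
      }
    }
    node[parts[parts.length - 1]] = entries[key];
  }
  return found ? root : null;
}
```

An adapter that prefers fewer, larger rows could group entries behind the same two functions, which is the flexibility described above.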

That aside, I don't see any easy alternative to the current implementation. Even though IndexedDB is our best option as a default, it's really not ideal to store arbitrary JSON-like data and that imposes certain limitations. We can't just dump the whole thing into a single entry since that would be very inefficient, so data needs to be split somehow. But we don't really know how to split that data other than by its deepest key. In the long run we could implement some heuristic to determine where to split/group (for example, by determining which paths are commonly written to or read from, among other things) but in the meantime our best approach is to just keep using deep keys. I’m open to any suggestions here, of course.

Multi-tab access

This was another concern back when we were discussing this last year. Since it has already been solved in Firestore, I think the best approach here will be to mirror their solution. To keep things simple, though, I will probably begin by not allowing persistence to be enabled in more than one tab. This can be achieved by using a minimal implementation of what's currently being used in Firestore, using LocalStorage to coordinate between tabs. Once an initial version is stable and working we can look into adding proper multi-tab support later on.

Cache policy

To prevent the cache from growing too big (and thus increasing the chances of the browser nuking all persisted data) I implemented a “Least Recently Used” cache policy, with pruning triggered when the persisted storage reaches a certain size.

With IndexedDB, though, there’s no direct way to obtain the size of the database so I ended up following this approach as an approximation:

I’ll need to do some testing to see if it’s a good approximation or whether it needs some tweaking. Any suggestions as to how to make it more accurate are definitely welcome.
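As a rough illustration of size-triggered "Least Recently Used" pruning, consider the sketch below. The size estimate used here (key length plus value length) is purely an assumption for the example and is not the approximation referred to above; `CacheEntry`, `estimateSize` and `prune` are hypothetical names.

```typescript
interface CacheEntry {
  key: string;
  value: string;
  lastUsed: number; // timestamp of the most recent read or write
}

// Assumed estimate for illustration: characters in key plus characters in value.
function estimateSize(entry: CacheEntry): number {
  return entry.key.length + entry.value.length;
}

// Keep the most recently used entries that fit in `maxSize`; evict the rest.
function prune(entries: CacheEntry[], maxSize: number): CacheEntry[] {
  const newestFirst = entries.slice().sort((a, b) => b.lastUsed - a.lastUsed);
  const kept: CacheEntry[] = [];
  let total = 0;
  for (const entry of newestFirst) {
    const size = estimateSize(entry);
    if (total + size > maxSize) break; // everything older is evicted too
    kept.push(entry);
    total += size;
  }
  return kept;
}
```

Stopping at the first entry that does not fit keeps the policy strictly LRU: nothing older than an evicted entry survives.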

Packaging

When I implemented most of this last year I put persistence into its own separate module, in order to limit the impact this change would have on the size of the database bundle. Since the internal structure of the SDK has changed quite a bit these last few months, when I adapted my changes into it I opted to just put it into the database package for now. That also seems to be the approach followed by Firestore, but if you think it would still be a good idea to have it separate (something like @firebase/database-persist) let me know. This can be decided further down the road, though. There’s plenty of work to do before this is of any concern.

mikelehen commented 6 years ago

Hey @jsayol,

Good luck with this!

Data Storage

If you wanted to pursue an approach to avoid many entries with small values, you could take a look at what we do on Android. Each row contains a tree node (possibly containing an entire subtree). If the node would be too large, then we split it into multiple rows. This works pretty well, though breaks down if you have large lists of very small nodes. The list will be too big to fit in a single node, but the individual nodes are very small, leading to poor storage efficiency. Anyway, for more details see https://github.com/firebase/firebase-android-sdk/blob/c9f213cac589595580a04a89784a44e0ff19c39b/firebase-database/src/main/java/com/google/firebase/database/android/SqlPersistenceStorageEngine.java#L70 and https://github.com/firebase/firebase-android-sdk/blob/master/firebase-database/src/main/java/com/google/firebase/database/android/SqlPersistenceStorageEngine.java#L842. But you may as well stick with what you're doing initially and see if it's a problem.
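A simplified sketch of the row-per-subtree idea, written in TypeScript rather than the Android SDK's Java. It is not a port of `SqlPersistenceStorageEngine`: `toRows` is a hypothetical helper, and measuring size as the length of the serialized JSON is an assumption made for the example.

```typescript
type Json = null | boolean | number | string | { [key: string]: Json };

// Store a whole subtree in one row when its serialized form fits within
// `maxChars`; otherwise give each child its own row (recursively).
function toRows(
  path: string,
  node: Json,
  maxChars: number,
  rows: Record<string, string> = {}
): Record<string, string> {
  const serialized = JSON.stringify(node);
  if (node === null || typeof node !== 'object' || serialized.length <= maxChars) {
    rows[path] = serialized; // leaves are never split further
    return rows;
  }
  for (const key of Object.keys(node)) {
    toRows(path + '/' + key, node[key], maxChars, rows);
  }
  return rows;
}
```

The weakness described above shows up directly: a long list of tiny children exceeds the limit as a whole, so it is split into many rows that each hold very little.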

Multi-tab access

The key to the Firestore approach is a "lock" in IndexedDb. Every tab assigns itself an ID and one tab writes its ID to the "lock" object store in IndexedDb. Then in every subsequent transaction it verifies that it still holds the lock before doing any writes. To guard against crashed tabs holding the lock there's also a timestamp in the lock and other tabs can take over the lock if it's too stale. Even with multi-tab we use a similar approach to nominate one tab as the "primary."
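The decision logic of such a lock can be sketched as a pure function. This is not Firestore's implementation: `TabLock`, `tryAcquire` and the staleness threshold are hypothetical, and in practice the read of the current lock and the write of the new one would have to happen inside the same IndexedDB transaction for the check to be safe.

```typescript
interface TabLock {
  ownerId: string;   // ID the owning tab assigned to itself
  updatedAt: number; // last time the owner refreshed the lock (ms)
}

// Returns the lock record to write if `tabId` may hold the lock,
// or null if another live tab currently owns it.
function tryAcquire(
  current: TabLock | null,
  tabId: string,
  now: number,
  staleAfterMs: number
): TabLock | null {
  const free = current === null;
  const mine = current !== null && current.ownerId === tabId;
  const stale = current !== null && now - current.updatedAt > staleAfterMs;
  if (free || mine || stale) {
    return { ownerId: tabId, updatedAt: now };
  }
  return null;
}
```

A tab would call this before each write and skip persisting when it returns null; refreshing `updatedAt` on every successful call is what lets other tabs detect a crashed owner.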

Cache policy

That sounds very reasonable. We do something similar in Android: https://github.com/firebase/firebase-android-sdk/blob/c9f213cac589595580a04a89784a44e0ff19c39b/firebase-database/src/main/java/com/google/firebase/database/core/utilities/NodeSizeEstimator.java#L34

jsayol commented 5 years ago

(Update at the bottom)


Hi there.

I just remembered that back when I was implementing this last year I ran into an issue that could become a potential breaking change.

Some background: when a new listener is attached with the current implementation, either via .on() or .once(), then SyncTree.addEventRegistration() returns a list of events to be raised immediately (synchronously) based on the data that can already be found on the in-memory cache.

By adding persistence into the mix, now addEventRegistration() needs to account for the possibility that it might need to access the disk cache, which is asynchronous. This means that now it needs to return a Promise that resolves to that list of events instead.

With that in mind, take the following code from one of the current Transaction tests:

it('New value is immediately visible.', function() {
  const node = getRandomNode() as Reference;
  node.child('foo').transaction(function() {
    return 42;
  });

  let val = null;
  node.child('foo').on('value', function(snap) {
    val = snap.val();
  });
  expect(val).to.equal(42);
});

I've already modified several other tests that were directly inspecting the list of raised internal events synchronously, since that is not a real use case. But this example is different since people's code might be currently relying on this behavior. It's uncommon, but I'd say it's perfectly valid to do that if you know the value you're looking for has already been cached.

If you don't want to introduce a breaking change here, which is perfectly understandable, the only alternative I see would be to revert addEventRegistration() back to synchronously returning the list of events to be raised based only on the in-memory cache, and then asynchronously raise any events based on the disk cache. From a practical point of view, it would behave just as if new data had come from the server after attaching the listener.
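The shape of that alternative can be illustrated with a toy model. `MiniSyncTree` below is a hypothetical stand-in, not the SDK's `SyncTree`: it returns events synchronously from an in-memory cache only, and raises anything found in the (asynchronous) disk cache later through the listener, as if it had arrived from the server.

```typescript
type Listener = (value: unknown) => void;

interface DiskCache {
  get(path: string): Promise<unknown>; // resolves to undefined when not cached
}

class MiniSyncTree {
  private memory: Record<string, unknown>;
  private disk: DiskCache;

  constructor(memory: Record<string, unknown>, disk: DiskCache) {
    this.memory = memory;
    this.disk = disk;
  }

  // Synchronously returns events that can be raised from memory alone.
  addEventRegistration(path: string, listener: Listener): unknown[] {
    if (path in this.memory) {
      return [this.memory[path]];
    }
    // Not in memory: consult the disk cache and raise the event later.
    this.disk.get(path).then(value => {
      if (value !== undefined) {
        this.memory[path] = value;
        listener(value);
      }
    });
    return [];
  }
}
```

With this shape, code like the transaction test above keeps working, because a value already held in memory is still delivered synchronously.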

Would that be an acceptable solution?


Update: I went ahead and made these changes, since it actually seemed like the only option really. I guess I needed to put it in writing to realize ¯\_(ツ)_/¯

mikelehen commented 5 years ago

Yep! Agreed that's the only option. 👍

alexnu commented 5 years ago

Just wanted to say I'm still hoping for this feature. I'm quite happy with RTDB and don't see any other reason to migrate to Firestore. My only alternative is to implement my own caching system which won't be as good as official support.

@jsayol I'm cheering for you! 👏

alexanderwhatley commented 5 years ago

Any progress on this feature? Would be really useful for my website.

tomlarkworthy commented 2 years ago

@jsayol do you have a copy of what you have so far? I am interested in seeing how much work it is. I would personally skip trying to store the data efficiently in IndexedDB; for my personal needs very little data beyond the user record needs to be stored, but startup time is paramount. A storage-inefficient offline-first implementation would be better than nothing!