FR: Offline first support for PWAs (RTDB)

Hollerweger commented 7 years ago

While the Firebase JS SDK has support for offline scenarios when the web app goes from online to offline it lacks offline first support. Offline first is a crucial part of PWAs and should be supported by the Firebase JS SDK directly.

google-oss-bot commented 7 years ago

Hey there! I couldn't figure out what this issue is about, so I've labeled it for a human to triage. Hang tight.

jshcrowthe commented 7 years ago

I have some ideas of what I think you mean by "offline first support," but the term itself is rather vague especially when viewed in the context of the entire JS SDK as each part of firebase (auth, database, storage, messaging) would have different approaches to "offline first support."

That said, I like what this issue calls attention to, and it is something that I'd love to pursue. Can you help me understand what specific things were difficult for you in doing offline first development?

Hollerweger commented 7 years ago

I was thinking about a database persistence similar to what is supported with the Firebase iOS and Androd SDKs available even when the web app is reopened in a new browser tab offline. Right now i need to implement my own offline persistence layer on top of Firebase to support offline first scenarios. There was even a Firebase I/O session last year regarding PWAs and offline first. For this demo Polymer with an index db mirror was used on top because the functionality was not provided by the Firebase JS SDK itself. https://www.youtube.com/watch?v=SobXoh4rb58 With this approach I'm limited on the offline functionality of Polymer without the ability to directly query the Firebase database. Would be great if such an index db mirror could be part of Firebase JS SDK itself.

knpwrs commented 7 years ago

This is something I have talked to Firebase support about in the past and I was actually just about to open my own issue until I saw this one. When I think about Firebase (at least, my usage of Firebase) and offline functionality I think of storage.

I think what would make the most sense would be to refactor the current implementation to support storage adapters. The current implementation could become the default, "in-memory" adapter. Other community-developed or officially supported adapters could be published as well. IndexedDB is an obvious choice, it's what PouchDB uses by default. A less obvious adapter I would like to implement for use in Electron would be a sqlite adapter.

Just spitballing here, but there could also be a proxy adapter to use two adapters together. For instance, I could use the in-memory adapter along with my sqlite adapter for performance purposes.

The Firebase SDKs were only just recently open sourced. Would these types of features be welcome for pull requests?

jthegedus commented 7 years ago

I was looking at achieving this type of functionality with Firebase and Redux-Offline. If the Firebase JS SDK was to be made modular with defaults it seems that it should do so in other layers of the SDK than just storage to achieve all the Offline-first criteria as specified in Redux-Offline EG: like exposing functions for implementing custom reconciliation of optimistic update failures/rollbacks.

TarikHuber commented 7 years ago

Making the data Offline-first available using react-redux and firebase is no big deal. Here is a working example: https://github.com/TarikHuber/react-most-wanted But: the data is offline available only for the client. The firebase database listeners don't know that you already have most of the data in your local storage. It would be great if firebase itself would manage Offline-first. Maybe they could figure out how to then just load the data that is missing in the local storage and not all of it like it's done with a running app that has connection. For example: you loaded 10 tasks in your application and go offline or close the application. After you reconnect firebase uses hes own cache not only to give you the already loaded 10 tasks but to also just load 2 tasks that where added afterwards and edits to the existing 10.

jthegedus commented 7 years ago

It's no big deal except you have to manage Redux in addition to Firebase. You have no control over when Firebase syncs to the server, you're restricted by Firebases local cache limits and persisting the cache isn't trivial. And Redux certainly overlaps with what the Firebase SDK does for offline. All could be mitigated should Firebase support a few modules/adapters for Offline-first in a similar method to how Redux-Offline defines. I'm just suggesting that we use Redux-Offline as a guide for what parts could be made modular.

jshcrowthe commented 7 years ago

@knpwrs I think this is something that we could totally accept as a PR! Love to have your contributions. The notion of different storage adapters is also an interesting idea that I'd love to see more details on.

In addition, I'd encourage everyone, for all Feature Requests, to make sure you are signed up for the Firebase Alpha Program where you can keep up on all the upcoming features and products.

FluorescentHallucinogen commented 7 years ago

@jshcrowthe What exactly features and products Firebase Alpha Program offers at this time? I've completed and submitted the form 3 days ago. I wonder how long to wait the admission into the Alpha Program?

mbleigh commented 7 years ago

We don't disclose what's in the alpha program (that's kinda the point) and I'm not sure how fast processing applications is. Either way remember that the alpha program is for alpha software which will not be recommended to ship in production.

On Sun, May 28, 2017, 11:32 PM Alexey Rodionov notifications@github.com wrote:

@jshcrowthe https://github.com/jshcrowthe Please answer.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/firebase/firebase-js-sdk/issues/17#issuecomment-304583039, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAD_sP1YyEDIRZcTafaPGX7LgTih0p5ks5r-mZkgaJpZM4NftA8 .

jshcrowthe commented 7 years ago

+1 to what @mbleigh said.

Lets try and keep this thread on topic though 😄 . Further questions on the Alpha program would be better directed to our support or discussion channels (link here: https://firebase.google.com/support/)

jsayol commented 7 years ago

Adding some form of local storage in our apps to cache the data retrieved from the Firebase database (like most of us are doing, I guess) works quite well to enable offline use even when cold-starting an app, but the main problem I see with not having persistence built into the SDK like it is in the Android or iOS ones is this: when an app starts, the SDK has no idea what data is stored locally so, when it attaches listeners, the hash field is empty and the server responds with all the data. Every single time. That means that data usage with the JS SDK is significantly higher than with the native ones.

I understand building a reliable and truly cross-browser local storage solution into the Firebase SDK is no easy feat, maybe even impossible, but it doesn't need to be perfect. It just needs to be better than not having it, even if only in some situations. It could be implemented gradually, first for whatever browsers have the best IndexedDB or SQLite support and then slowly with others, if possible.

Another possible solution, albeit a radically different approach to what is being currently used in the other platforms, would be for the SDK user to pass whatever data it has for a certain database location before attaching a listener.

This might be better explained with an example: let's say we already know what the data at /messages contains, because we had it locally stored somehow:

let data = {
  "-Kgx5lyGUg7w9nnNAKss": {
    "from": "Bob",
    "text": "Hey there"
  },
  "-Kgx9en1kPcRyy1uk7j7": {
    "from": "Alice",
    "text": "Sup?"
  }
};

So there would be a way to "bootstrap" the data at that location before attaching a listener, letting the SDK know what we know:

firebase.database().ref('messages').bootstrap(data).on('child_added', snap => { /* */ });

This leaves some open questions, though: should the SDK always accept that data, or should it ignore it when it is positive the data it has is fresh? (maybe because there's already an active listener on that path).

This would only be a temporary solution anyway, since it puts most of the burden on the developer using the library (figuring out how to store the data locally, passing it to the SDK, etc.) Not my favorite approach but it would certainly be a step up.

jsayol commented 7 years ago

Some more thoughts: a possible solution to add local storage support would be to use localForage, maybe wrapping it like ionic-storage is doing. This would allow to use whatever the best solution is in every scenario/browser.

@knpwrs's idea of storage adapters also seems quite interesting. Skimming through the code, it seems like there's already support for in-memory, LocalStorage, and SessionStorage. So it seems it would be a matter of implementing a way to allow the user to provide their own adapter with a compatible API. I would also consider this a temporary solution though, like the bootstrapping I mentioned in my previous comment. [EDIT] I was looking in the wrong place, nevermind what I said here[/EDIT] The ultimate goal should be for the SDK to handle all this for the user, the same way it's done in the native SDKs.

jthegedus commented 7 years ago

Having a storage mechanism like redux-persist would allow for complete browser/native coverage. Then the user would only have to specify an environment flag for the correct storage adapter to be used.

knpwrs commented 7 years ago

@jsayol I've been looking at how PouchDB stores data offline. They've taken the approach of basing their storage around LevelUP. From that point you can plug in different backends such as MemDOWN (in-memory), level.js (IndexedDB), or even something like SQLdown (sqlite3, PostgreSQL, and MySQL). There's even an Abstract LevelDOWN project which can be used to implement compatible backends. Basing storage around LevelUP could be potentially interesting because then we inherit a large offering of various storage backends. By default we could use MemDOWN and offer the ability to use different backends such as level.js.

FluorescentHallucinogen commented 7 years ago

I believe Firabase developers already have a solution. That's why @jshcrowthe suggests to sign up Firebase Alpha Program.

@jshcrowthe, @mbleigh, isn't it?

jshcrowthe commented 7 years ago

@FluorescentHallucinogen, the Alpha Program is a great way to work with developers who are willing to donate their time to help make Firebase an awesome platform. There is really good discussion going on, so the invitation is to make sure we get to work with all of you in that space as well!

jshcrowthe commented 7 years ago

AFAICT this discussion primarily emphasizes two things

Providing offline access to data
Leveraging offline data on boot to prevent unneeded network traffic

The notion of pluggable storage adapters is something that I think is a cool idea and I'd love to see a demo implementation of this in context of the SDK. This could be separate from minimizing the network traffic as the amount of network traffic would be no different than what it is today. Once we had an agreed upon implementation of persistence, reducing network overhead is just the next logical step.

In the iOS SDK (Github Repo: https://github.com/firebase/firebase-ios-sdk) we are synchronizing only the delta between the local device and server state. In principle we could port that same functionality over to web, and then integrate it with the persistence layer discussed above.

@jsayol / @knpwrs I'd love to see a sample implementation of the storage adapters concept, sounds like a solid strategy to allow for flexible browser/environment requirements.

jsayol commented 7 years ago

In the iOS SDK (Github Repo: https://github.com/firebase/firebase-ios-sdk) we are synchronizing only the delta between the local device and server state.

That's very interesting. You mean that if the hashes don't match when attaching a listener, only the difference is synchronized? If so that's pretty cool, and quite different from the web SDK where the whole thing is resent in that situation.

How is it implemented? Do you traverse the tree checking the hashes at every node to figure out what's up to date and what isn't? I'm trying to locate the relevant code in the iOS repo but I can't seem to find it (and not being familiar with ObjC doesn't help either :smile:).

knpwrs commented 7 years ago

@jshcrowthe Time permitting I may be able to get something done. What do you think about utilizing LevelUP as suggested in my previous comment? Obviously assumes a compatible data model. If it's not compatible then we'd need to design our own adapters.

jshcrowthe commented 7 years ago

@knpwrs I looked at LevelUP and it seems like a really solid library, however I don't know that we need all that it provides. With the database already being quite large, adding another large persistence library is probably a hard sell (I just ran LevelUP through a quick webpack build, 103kb min).

Same story goes for something like LocalForage (although this one is admittedly lighter coming in at around ~25kb).

IMO I'd start w/ just the raw primitives until we need the abstraction (we are going to have to build our own abstraction layer already to allow it to be pluggable).

jshcrowthe commented 7 years ago

@jsayol so we currently are using a hash function that can be found here:

https://github.com/firebase/firebase-js-sdk/blob/master/src/database/js-client/core/Repo.js#L204-L236

This hash is a "simple" hash of the data in the node. We then send that hash to the server when we call listen in the PersistentConnection (see https://github.com/firebase/firebase-js-sdk/blob/master/src/database/js-client/core/PersistentConnection.js#L183)

By leveraging "compound" hashing (which is a hash of key ranges instead of the entire node, iOS implementation found here: https://github.com/firebase/firebase-ios-sdk/blob/master/Firebase/Database/Core/FCompoundHash.m) we could minimize traffic over the wire. We would just need to implement the ability to merge the range updates that we receive with what we already have in memory. (see https://github.com/firebase/firebase-ios-sdk/blob/master/Firebase/Database/Core/FRangeMerge.m)

All that said, I think the right first step is to allow for persistent offline through IndexedDB (or an adapter structure), and then work towards this.

jsayol commented 7 years ago

Thanks for the links @jshcrowthe!

I knew about the hash function (a few months ago it took me a while of digging through minified code to figure that one out :P) but I had no idea about the whole compound hashing implementation. I'll definitely look into it!

I agree with you though, none of it will be very useful without persistence so let's focus on that first. I think a solid first approach would be to simply use IndexedDB, since that would cover most use-cases. (Safari's implementation of IndexedDB is known to have issues though, so it might be worth looking into WebSQL too. Maybe. I don't know.) If we also want to support Node.js then we'd have to look into other options too, but having direct access to the file system opens a whole lot of other possibilities there.

You raised a valid point in a previous comment about bundle size. Ideally we'd keep this change as small as possible but if it ends up getting too large for comfort it could just be implemented into its own sub-module, as an optional feature to be added by the user if they want to use persistence. Something like this:

const firebase = require('firebase/app');
require('firebase/database');
require('firebase/db-persistence');

I'll start looking into how IndexedDB could fit in into the current implementation. Off the top of my head, we'd have to build a system to consistently synchronize the contents of the MemoryStorage with what's being persisted, and probably ensure we're not hitting persistence too often during read or write bursts to avoid performance issues. This synchronization could happen after a certain time of inactivity on the database, like for example 10 seconds, with a maximum interval of time between operations to minimize the risk of ending up with stale data in the event that the app would crash or suddenly be shut down somehow.

Thoughts?

P.S.: I still think the storage adapter idea is an interesting one that can be added later, but basic browser persistence should be provided by the SDK out of the box anyway.

PierBover commented 7 years ago

To me the best case scenario for true offline support would be if it was completely transparent. Downloaded data would be available from persistent storage, and new data would be written to persistent storage and synched automatically once the device is online again.

I'd like to add (since I haven't seen this mentioned) that using localStorage on mobile is not very persistent since it can be deleted at any time by the OS. This would be much more inconvenient on Cordova, React Native, NativeScript, apps. Users expect more persistence from an app than a website.

I agree that right approach would be having an API and write adapters on separate modules (official or third party) to reduce bloating on the main SDK.

knpwrs commented 7 years ago

Side thought: how do the iOS and Android SDKs (and this SDK, for that matter) handle transactions when offline?

jshcrowthe commented 7 years ago

@knpwrs That is a great question! Paging @schmidt-sebastian since I don't know off the top of my head.

jsayol commented 7 years ago

As far as I know, the iOS and Android SDKs will keep track and try to complete the transaction even across app restarts. The JavaScript SDK will only keep the transaction alive during the same session, since it doesn't persist the transaction state. This is definitely something that can be improved with this whole persistence "overhaul".

Other than that, they work the same way: when offline, the transaction callback will receive either the latest known value or null if it isn't known, will trigger optimistic updates unless instructed not to do so, and won't trigger the completion callback until the transaction is actually committed by going back online.

Edit: Oops, turns out I was wrong about transactions. See Frank's comment below for an explanation.

puf commented 7 years ago

Transactions are explicitly not persisted to disk. They do not survive app restarts.

This was an explicit decision by the team at the time. It might be good to revisit the discussion at some point. But for a first iteration, I'd recommend aiming for feature parity with iOS and Android and not persisting transactions.

jsayol commented 7 years ago

Ok, I've started looking into this. Some general considerations:

Persistence should be off by default and the user can enable it at will. Same behavior as in iOS & Android, AFAIK.
Both server and user-initiated writes should be persisted.
For user writes:
- All set() and update() operations get persisted.
- transaction() operations are only persisted once acknowledged by the server (committed). [@puf, when you said that transactions are not persisted in iOS/Android I assume you meant the intermediate states, right?]
- Keep track of pending writes in case they fail, so that we can roll back persisted data.

Thoughts so far?

I'll start working on a simple proof of concept implementation to get the ball rolling. If I'm not mistaken the most obvious first "point of attack" seems to be the SyncTree so I'll focus on that. Feel free to offer any comments and suggestions, though :)

P.S.: I'll try to be as independent as possible while working on this to avoid bothering you all too much, but I might ask for some guidance from time to time. Hope that's ok!

puf commented 7 years ago

As far as I know the iOS and Android clients keep two types of data in their disk cache:

Data that was recently listened to.
A queue of all pending write operations, excluding transactions.

Note that (again: as far as I know) the pending writes are not aggregated into the data cache. That only happens when a listener updates it.

But I'd love @schmidt-sebastian or @mikelehen to give their take on this.

jsayol commented 7 years ago

Note that (again: as far as I know) the pending writes are not aggregated into the data cache. That only happens when a listener updates it.

That actually makes a lot of sense, there's no point in persisting data that might never be listened to.

I presume the disk cache for pending writes is to be able to retry them after an app restart, right? That might be an interesting feature to add too, maybe after basic persistence is working.

puf commented 7 years ago

Upon start of the app, the queue of pending writes is read from disk and just becomes the head of the write queue that always exists.

jsayol commented 7 years ago

Alright, I have an initial implementation ready. I will start working on writing tests but I would really appreciate some early feedback :)

I've worked on the assumption that PRs #72 and #66 will be merged, so my code builds on top of those changes. Given this, what would be the best way to create a PR? If I create it against any of the existing branches it will be a nightmare to review only the changes related to persistence.

Let me know how to proceed 🙂

Cheers.

tomByrer commented 7 years ago

Persistence should be off by default

Then persistence should be a separate library; don't want to ship dead code to the client.

jsayol commented 7 years ago

Yeah, I've been giving some thought to that. Once everything is working there should be a discussion about it, but now might be a bit premature. Good point though.

tomByrer commented 7 years ago

@jsayol How is your solution better than using a caching lib like LaddaJS? I was about to use Ladda in just a few minutes until I saw this thread.

Edit: noteworthy difference may be if there is a change notification.

BTW, I requested caching years ago before FireBase got bought by Google; glad you're working on it!

jsayol commented 7 years ago

It's integrated into the SDK. That allows the SDK to know what data is cached, which prevents the server from sending data that's already known. This significantly reduces bandwidth usage in most cases. When using a caching solution external to the SDK, like most of us are doing now, the server will still send all the data back to the client when attaching a listener.

Besides, we shouldn't be putting the burden of implementing their own persistence on the library's users when using the Web SDK, considering the iOS and Android ones already support this feature. The way I've implemented it, the user just needs to run firebase.database().enablePersistence() and forget about it.

Internally it uses IndexedDB by default if available but for more complex use cases, like a Cordova o React Native app, the enablePersistence() method also accepts a storage adapter to use as long as it complies with a specific but simple API. It should be trivial to wrap whatever storage engine you want to use, as long it follows a key-value model.

tomByrer commented 7 years ago

'enablePersistence()' method also accepts a storage adapter

That's a smart idea! Great if you need IE8-9 compatibility, & for build/server-rendering. I've been looking at some newer faster Key-Value engines.

PierBover commented 7 years ago

How is your solution better than using a caching lib like LaddaJS?

Ladda is built around the idea of CRUD, and not really subscriptions and un-subscriptions to realtime data.

Another factor is that promises are one time only. You need callbacks or observers a la RX to get subscriptions working. For example with on(...).

While it's possible to use REST with Firebase, losing realtime would defeat its main purpose, no?

tomByrer commented 7 years ago

@jsayol I tried to gulp build your fork but get a java/closure error so I can't 'compile'. Might have to do something about me on Win10, but I'm from an innocent time back in the 90s when Java was supposed to make code platform-independent. ;)

While it's possible to use REST with Firebase, losing realtime would defeat its main purpose, no?

Well, my immediate purpose is to hack source code from here: https://hnpwa.com I'm just checking to see if FireBase added caching yet; I was expecting it since Google owned it for a while now...

But thinks for pointing the differences out @PierBover!

jsayol commented 7 years ago

A quick overview of what's implemented so far regarding persistence:

Pending user writes: they are persisted until acknowledged/refused by the server. Only set() and update() operations but not transactions(). During initialization, if any persisted pending user writes are found they are retried.
Server cache: any data coming from the server because of a user query is persisted. This cached data, if any is found, is used to immediately notify the user when querying data and also to calculate the appropriate hashes when attaching a listener on the server (just like with the current in-memory cache).
A storage adapter model allowing the user to supply any storage engine to be used for persistence, as long as it complies with a defined API.
An IndexedDB-based storage adapter implementation to be used as the default if none is provided, which will probably be the case most of the time.

Next steps:

Testing (1): first of all I want to get the current tests to pass again. From a quick look at the results I don't think I broke any functionality, but quite a few of the tests are failing due to several changes I had to make to the current code. @jshcrowthe I will probably have to coordinate this with you.
Testing (2): then I'll start writing the test suite for the new features and any other necessary tests.
Investigate implementing compound hashes and range merging in order to avoid receiving all the data for a query when attaching a listener if only parts of that data have changed compared to what we have persisted/in memory.
keepSynced()... maybe.

About that last point: in the JavaScript SDK we can't really implement the keepSynced() functionality, at least not in the same way it works in the iOS and Android ones. In those platforms the SDK has complete control over when and what data to prune from the persisted cache, so it can choose to never prune paths that have been marked to keep (barring the OS randomly deciding to wipe the app's data). That's not the case here since with IndexedDB it's 100% up to the browser to decide when to evict data from the cache.

If we decide to implement this we could force accessing the cached data for the keepSynced paths every so often even if we don't need it. Most IndexedDB implementations follow a LRU cache policy so that would keep that data "fresh" in the eyes of the browser. This would still be a best-effort approach though, since we couldn't guarantee that the data doesn't get evicted if the cache fills up and other parts have been used more recently.

~~In short: I would forget about keepSynced(), at least for now. Let's focus on everything else.~~

Edit: see comments below about possible keepSync support.

kr31n3r commented 7 years ago

thanks @jsayol for the initial heavy lifting on this issue!

i have been waiting for a proper implementation for quite some time and had to built my own workarounds to accomplish at least some kind of offline usage.

after successfully building your fork i quickly realized that "IndexedDB" is not an option for react-native: FIREBASE WARNING: Failed to initialize database persistence. It will be disabled.

it would be great if you could provide an additional (simple?) "AsyncStorage" storage adapter for testing on react-native.

cheers

p.s. i actually just signed in to github to leave this hopefully motivating comment :wink:..

jsayol commented 7 years ago

Hey @kr31n3r, thanks for trying this out. I just wrote a very crude wrapper around React Native's AsyncStorage. I'm planning on publishing a proper implementation once everything's settled, for RN and other platforms too, but this should get you up and running for now.

Just copy this file into your project, import the RNAsyncStorageAdapter object from it, and then enable persistence like this:

firebase.database().enablePersistence(RNAsyncStorageAdapter);

Let me know how it goes for you.

jsayol commented 7 years ago

I'm experimenting with extracting persistence into its own module and I'm seeing pretty good results, with only a small increase in size in the firebase-database.js module.

Here's a handy comparison table with the differences, all the other files remain the same:

| File | No persistence | Own module | Built-in | |----------| :---: | :---: | :---: | | firebase.js | 495.86 KB | **528.03 KB
_+32.17 KB
+6.49%_** | 527.75 KB
_+31.89 KB
+6.43%_ | | firebase-database.js | 264.08 KB | **269.67 KB
_+5.59 KB
+2.12%_** | 295.97 KB
_+31.89 KB
+12.08%_ | | firebase-database-persistence.js | - | **27.79 KB** | - |

It would be included like any other of the firebase services, either by:

import firebase from 'firebase/app';
import 'firebase/database';
import 'firebase/database-persistence';

or by:

<script src="firebase-app.js"></script>
<script src="firebase-database.js"></script>
<script src="firebase-database-persistence.js"></script>

I think this is the way forward, since it introduces very little penalty to users who don't want to use persistence in their apps.

mikelehen commented 7 years ago

@jsayol This is sounding pretty awesome. Quick question about keepSynced() and eviction... I was under the impression that the browser doesn't do any "eviction" more finer-grained than at the database-level (so it would never purge /part/ of the server cache, for instance). Unless you're using multiple databases? So I would expect that the web client could (and ideally should) implement the same garbage collection and keepSynced() code that iOS has. Though this isn't necessary for an initial "proof-of-concept" implementation. But not implementing it probably actually makes it more likely that the browser would purge the entire offline cache, because it could get too big.

Also, FWIW- keepSynced() is actually just a trivial wrapper to keep an active listener attached to the data. It doesn't actually interact with persistence at a low-level, so it'd be trivial to implement on web, even without any of your persistence work. :-)

jsayol commented 7 years ago

I was under the impression that the browser doesn't do any "eviction" more finer-grained than at the database-level (so it would never purge /part/ of the server cache, for instance)

@mikelehen I was just doing some reading about that and it looks like you're absolutely right. When space is running low, the browser will evict all the data for the least recently used origin or API, depending on the browser. But yeah, the whole database is cleared in any case, not just parts of it. Thanks for pointing that out!

I'll start looking into implementing an equivalent to the PruneForest in the iOS SDK.

Determining the size of the database to decide when to prune is going to be an issue though, since there's currently no standard way to do that with IndexedDB. A very naive approach to estimating the size could be to load the whole server cache, stringify it, and check how long it is. I don't like how inefficient that would be though, so I'm certainly open to suggestions there :)

mikelehen commented 7 years ago

Yeah, for what it's worth, an approximation that's off by a factor of 2 or whatever would probably be good enough. It'll still put a bound on the size of the cache, which is the important thing. I'd try to avoid stringifying the /entire/ server cache, but we probably can't get away from walking the entire cache in some form. But we could probably keep a running approximate size without having to materialize a giant string (either stringify and .length each entry as we walk the cache or apply a less-precise heuristic to estimate sizes).

On that note, how exactly are you persisting data in the server cache? (Sorry, I haven't looked at your code; feel free to link me to the relevant bit if you prefer). The iOS client stores a separate LevelDB record for every leaf node in the cache. This results in long keys (like "/chats/xyz12345/messages/abc67890/sender" with small values, but LevelDB does prefix compression that makes that very efficient (common key prefixes are shared from one row to the next so the storage isn't duplicated). I'm not sure if that approach would be as efficient with IndexedDB, so I'm curious what you have chosen. :-)

jshcrowthe commented 7 years ago

@jsayol This is way exciting! I love the idea of extracting this into its own module as that will minimize the impact that this change will have on other users. When you are ready to coordinate on the testing, get a PR open and we can get @mikelehen and @jdimond to help with the validation.

As for storage adapters, we probably want to formalize that interface into a spec that others can use to write storage adapters for the platforms that aren't built in (ReactNative, Cordova, etc, etc). When you are happy with that interface (i.e. https://github.com/jsayol/firebase-js-sdk/blob/db-persistence/src/database/persistence/storage/StorageAdapter.ts#L11-L130), let's iterate on a spec doc that we can publish alongside this.

jsayol commented 7 years ago

The iOS client stores a separate LevelDB record for every leaf node in the cache. This results in long keys (like "/chats/xyz12345/messages/abc67890/sender" with small values

@mikelehen I followed the same approach (the iOS SDK code was a great source of information while I was implementing this so it shouldn't come as a surprise if my solution is similar overall ;)) Now that you bring it up I might look into more performant alternatives, though it needs to account for the fact that IndexedDB won't be the only storage engine used. In certain scenarios the user might choose to use a different one (like we've seen just a couple comments above with React Native's AsyncStorage) so we shouldn't make any assumptions about the underlying engine. Unless we want to prioritize better IndexedDB performance at the expense of any other alternatives, which I guess one could make a case for since it will be the default.

As for approximating the size of the IndexedDB database, we could loop through all the stored values and accumulate a total based on the type of the value. For example:

If it's a number, count a fixed weight for it. Like 4 bytes, for example.
If it's a string, count its length in bytes (taking encodings like UTF-8 or 16 into account)
If it's a boolean, count only a small weight. Maybe half a byte (1/8th seems too optimistic, in general)
If it's an array, loop its elements and apply the same algorithm recursively.
(Also, for each entry we'd add the length of the key to the count)

It's important to note that this check wouldn't be done very often, only after a certain number of data updates from the server. The iOS SDK does it every 1000 server updates or so which is a good starting point, and could be tweaked after some experimenting.

It could also be worth trying if any browser-specific APIs are available to get the size of the database. For example, Chrome has navigator.webkitTemporaryStorage.queryUsageAndQuota(). If that or any other API are available in the platform and work then we can use that number instead, otherwise we keep going with the manual check.

jsayol commented 7 years ago

@jshcrowthe Thanks! :D I'll have to put all this on hold for a few days as I'll be busy with other stuff but I'll start working on the tests next week, or maybe a bit later if I'm still implementing pruning and all that.

As for the StorageAdapter and StorageAdapterWriteBatch interfaces there might still be some changes there so that's very much a work in progress. I've made an effort to document them thoroughly though so it shouldn't take much work to write that spec towards the end.

It might be a good idea to have some sort of brainstorming session somehow to figure out if any other methods might be necessary, even if we don't use them right now. Once all this goes live I'd like to keep changes to those interfaces to a minimum, or ideally none at all.

firebase / firebase-js-sdk

FR: Offline first support for PWAs (RTDB) #17