FR: limit cost for reading unchanged data

Pitouli commented 4 years ago

Hello,

Scenarios where data does not change much are common, and Firebase is very expensive in these situations. I understand that an "out of the box" solution is very difficult to implement (from your answer here: https://github.com/firebase/firebase-js-sdk/issues/3422)

In May, I requested the addition of a "source" parameter for onSnapshot that would give devs the opportunity to optimize the requests themselves: https://github.com/firebase/firebase-js-sdk/issues/3040

Do you have more insights on this?

Thanks

schmidt-sebastian commented 4 years ago

@Pitouli Thanks for chiming in.

I will bring #3040 back up with the team again, but it would still help us to know if there is more user demand. Are you aware of other developers that are running into these billing limitations? We certainly don't need an online petition, but it would be good to have some more feedback before we spend a couple of weeks of engineering resources on implementing this feature across all of our platforms.

You are likely aware that you can achieve some of the cost savings you envision by querying documents that have changed since a timestamp, which would ensure that documents that haven't changed are excluded from transfer even if the query hasn't been listened to for more than 30 minutes. Since this doesn't handle all use cases (e.g. deletes or collections with a high write rate), this is not something we can do on behalf of all of our users, but it is something that may work on a case-by-case basis.

Pitouli commented 4 years ago

Hello @schmidt-sebastian, thanks for your answer :)

> Are you aware of other developers that are running into these billing limitations?

Concerning the user demand for methods to reduce the read cost of cached data, it's hard for me to say since no one is asking me, but I can find some examples on the internet:

https://medium.com/firebase-tips-tricks/how-to-drastically-reduce-the-number-of-reads-when-no-documents-are-changed-in-firestore-8760e2f25e9e -> 462 claps + 222 claps on a repost (not mine, of course)

Other tickets: https://github.com/firebase/firebase-js-sdk/issues/471 https://github.com/firebase/firebase-js-sdk/issues/3422

Questions about how to reuse the cache to limit cost: https://stackoverflow.com/questions/52700648/how-to-avoid-unnecessary-firestore-reads-with-cache And a few others here: https://stackoverflow.com/search?page=1&tab=Relevance&q=firestore%20read%20cache

Here are people asking for the same thing as me (onSnapshot from cache): https://stackoverflow.com/questions/63712571/force-stream-based-on-firestore-snapshots-to-read-cache https://stackoverflow.com/questions/60374413/reduce-firebase-cloud-firestore-reads-by-using-cache

> it would be good to have some more feedback before we spend a couple weeks of engineering resources on implementing this feature across all of our platforms

I can completely understand this, especially if it involves a lot of rework.

From my perspective -- which obviously is very "naive" -- my request is "only" to make available something that I believe you use internally (for offline case) and therefore harmonize get() and onSnapshot().

By contrast, I understand that a perfect out-of-the-box "onSnapshot" that intelligently and automatically merges cache and server data without re-reading data already there is highly complicated (if not impossible) to make work in all cases, for all the reasons you shared (it would require adding metadata, handling deletions, etc.)

Of course, I could be underestimating the difficulty of my "simple" request, and perhaps it also raises complicated questions I have not anticipated.

> You are likely aware that you can achieve some of the cost savings you envision by querying documents that have changed since a timestamp

In my case, where each user has their own "private" list of items (between 10 and 1000 items approximately) which can only be modified by that user, I see 3 solutions to avoid re-reading cached data, given that I have a timestamp which tells when the last modification occurred:

1. I get from cache if my local "last modification timestamp" is the same as the server one. Otherwise, I resync everything from the server.
   - Pros: the initial load is easy to do
   - Cons: when the user uses the app and adds, updates or removes items, the list is no longer reactive since the "get" is not a "listener". So I have to re-get from cache at every change. Performance-wise, I think it can be quite costly, and it means a big rework to detect the changes and redo the get.
2. I get all my items from cache and find the most recent "update_timestamp" to determine when my last sync was. Then I do an "onSnapshot" on all the items modified since.
   - Pros: the most cost-effective solution (I could run almost forever without resyncing)
   - Cons: I have to take care of intelligently merging the two lists by replacing the correct item when I receive an update (and I cannot count on the "index" metadata). I cannot delete items; I have to update them with a "deleted" flag instead so they trigger the listener and I can remove them from the merged list. So occasionally I must run a batch to remove all items flagged "deleted", and re-sync the whole cache with the server (for example by saving the date of the last "batch deletion").
3. I call an onSnapshot sourced from the cache if my local "last modification timestamp" is the same as the server one. Otherwise I resync everything from the server.
   - Pros: easy to implement, it works exactly like offline (contrary to solution 1, updates are automatically taken care of by the no-latency feature)
   - Cons: like solution 1, if I detect that an update has been made from another device, I have to re-sync everything.
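
The merge step that solution 2 depends on can be sketched as below. This is only an illustration in plain TypeScript, not an SDK feature: the `Item` shape, the numeric `updatedAt` field, the `deleted` soft-delete flag and the `mergeDelta` name are all assumptions made for the example. It takes the list loaded from cache plus the changes delivered by a listener on "modified since last sync", keeps the newest version of each item by id, and drops items flagged as deleted.

```typescript
// Hypothetical item shape: `updatedAt` is milliseconds since epoch and
// `deleted` is an application-level soft-delete flag (not a Firestore feature).
interface Item {
  id: string;
  updatedAt: number;
  name: string;
  deleted?: boolean;
}

// Merge the cached list with a batch of changed items.
// For each id the version with the newest `updatedAt` wins; items whose
// winning version is flagged `deleted` are removed from the result.
function mergeDelta(cached: Item[], delta: Item[]): Item[] {
  const byId = new Map<string, Item>();
  for (const item of cached.concat(delta)) {
    const current = byId.get(item.id);
    // Skip versions older than the one we already hold.
    if (current !== undefined && current.updatedAt > item.updatedAt) {
      continue;
    }
    byId.set(item.id, item);
  }
  // Drop soft-deleted items, then sort newest first.
  return Array.from(byId.values())
    .filter((item) => !item.deleted)
    .sort((a, b) => b.updatedAt - a.updatedAt);
}
```

Soft-deleted items stay in the map until the final filter, so an older, non-deleted version of the same item arriving later in the batch cannot bring it back.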

The 3rd solution is in my opinion a good balance between "cost efficiency" and "ease of implementation". But it requires the "listen from the cache" functionality.

schmidt-sebastian commented 4 years ago

@Pitouli Thank you for this very thorough and reasonable response. I will talk to my team about this and get back to you either this or next week.

schmidt-sebastian commented 4 years ago

I talked to my team about this today. While we can't make a firm commitment, if we do have some spare cycles this is something we would like to work on during the remainder of the year. This is hard for me to translate into firm release timelines, but we will post updates here if we can narrow down when this might be released.

Benny739 commented 4 years ago

Just found this thread; this would also be very valuable for us. We're using a lot of workarounds to prevent Firestore costs from exploding. If we could just sync changed data and pay only for reads of the changed data, Firestore would become way more flexible and easy to use.

wilhuff commented 4 years ago

@Benny739 note that this is a discussion of how to create snapshot listeners that only read from cached data, avoiding reads from the server altogether. Perfect incremental sync is not possible today and this proposed API change doesn't change that.

Sesa1988 commented 3 years ago

What if I first call a method that only gets the documents from Firestore whose timestamp is greater than the latest timestamp among my cached items?

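Future _updateCache(AppUser user) async {
  // Read what is already cached and find the newest `updatedAt` in it.
  var cachedSnapshot = await _getCachedPortfolios(user);
  var lastUpdatedCached = _getLastUpdatedCached(cachedSnapshot);

  // Fetch only the documents modified since then. The result itself is not
  // used here; the call is made so the changed documents land in the cache.
  await _firestore
      .collection('portfolios')
      .where('uid', isEqualTo: user.id)
      .where('updatedAt', isGreaterThan: lastUpdatedCached)
      .get();

  return lastUpdatedCached;
}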

After that I could just get all items from cache:

// Read the full, ordered list from the local cache only.
var cachedSnapshot = await _firestore
    .collection('portfolios')
    .where('uid', isEqualTo: user.id)
    .orderBy('updatedAt', descending: true)
    .get(GetOptions(source: Source.cache));
return cachedSnapshot;

I'm not sure whether this would break at some point if the cache gets too big or can't be cleared because everything "is in use", whatever that means for Firebase.
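
The `_getLastUpdatedCached` helper is not shown above. As a rough sketch of what such a helper might compute, here is a plain TypeScript version (TypeScript because this issue lives in the JS SDK repository). The `CachedDoc` shape, the numeric `updatedAt` field and the `lastUpdatedCached` name are assumptions for illustration only. The point worth noting is the empty-cache case: returning 0 makes the follow-up "updatedAt greater than cutoff" query fetch everything on first run.

```typescript
// Hypothetical shape of a cached document; `updatedAt` is assumed to be
// milliseconds since epoch.
interface CachedDoc {
  id: string;
  updatedAt: number;
}

// Returns the newest `updatedAt` found among the cached documents,
// or 0 when the cache is empty so that a first sync fetches everything.
function lastUpdatedCached(docs: CachedDoc[]): number {
  return docs.reduce((latest, doc) => Math.max(latest, doc.updatedAt), 0);
}
```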

dconeybe commented 1 year ago

For the record, here is another customer request for this feature (i.e. Source.CACHE for onSnapshot): https://github.com/firebase/firebase-js-sdk/issues/6880

michalkubizna commented 1 year ago

Hi, is this feature planned?

cedvdb commented 1 year ago

@Pitouli You wrote that the problem with querying only for recently updated documents is that you won't receive deleted documents.

Assuming there is some metadata about a document on the Firestore side, this raises the question: why not keep this metadata alive after deletion, with the added properties deleted and deletedAt, since storing additional data is cheaper than reading it? This could even be opt-in.

Without knowing the internals, I find it questionable that the deletion problem makes this feature request a dead end.

Reminder to use the vote button, as the upvote count is still quite low, unfortunately.

firebase / firebase-js-sdk

FR: limit cost for reading unchanged data #3789