firebase / FirebaseUI-Flutter


fix(ui_firestore): fixed pagination issue #117

Open a-s-k-u opened 9 months ago

a-s-k-u commented 9 months ago

Description

Current Behavior: Excessive document querying on page change. Upon each page transition, the package queries not only the documents on the current page but also all preceding documents. This leads to an unnecessary increase in total read operations from Firebase.

The Solution: This pull request uses Firebase Firestore query cursors to retrieve only the data belonging to the current page, optimizing data retrieval. When the page size changes, the package no longer re-fetches from the first page.
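A minimal sketch of the query-cursor approach (illustrative only, not the PR's actual code; the function and parameter names are made up):

```dart
import 'package:cloud_firestore/cloud_firestore.dart';

// Fetch a single page: the cursor skips everything up to and including the
// last document of the previous page, so only `pageSize` documents are read.
Future<QuerySnapshot<Map<String, dynamic>>> fetchPage(
  Query<Map<String, dynamic>> baseQuery,
  int pageSize, {
  DocumentSnapshot<Map<String, dynamic>>? lastDocOfPreviousPage,
}) {
  var query = baseQuery.limit(pageSize);
  if (lastDocOfPreviousPage != null) {
    query = query.startAfterDocument(lastDocOfPreviousPage);
  }
  return query.get();
}
```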

Related Issues

This PR will resolve the Pagination Issue.

Checklist

Before you create this PR confirm that it meets all requirements listed below by checking the relevant checkboxes ([x]). This will ensure a smooth and quick review process. Updating the pubspec.yaml and changelogs is not required.

Breaking Change

Does your PR require plugin users to manually update their apps to accommodate your change?

rrousselGit commented 9 months ago

The proposal sounds good, but this needs testing :pray:

rorystephenson commented 9 months ago

If I understand correctly, this would be a major breaking change, as the list would no longer update for changes in previously loaded docs?

According to this StackOverflow response, if "you have local caching enabled and/or attach the new snapshot listener before removing the old one" the previously fetched documents would be pulled from the cache rather than being fetched from Firestore again. So if we are careful about when we attach the new listener, this package can keep listening to previously fetched documents and avoid extra reads when fetching new ones.
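A rough sketch of that idea (assumed behaviour, untested; names are illustrative): subscribe to the same query with a larger limit before cancelling the previous subscription, so already-fetched documents can, in theory, be served from the local cache.

```dart
import 'dart:async';

import 'package:cloud_firestore/cloud_firestore.dart';

// Re-subscribe with a bigger limit and only then cancel the old listener.
StreamSubscription<QuerySnapshot<Map<String, dynamic>>> growLimit(
  Query<Map<String, dynamic>> baseQuery,
  int newLimit,
  StreamSubscription<QuerySnapshot<Map<String, dynamic>>>? previous,
  void Function(QuerySnapshot<Map<String, dynamic>>) onData,
) {
  final next = baseQuery.limit(newLimit).snapshots().listen(onData);
  // The old subscription is cancelled only after the new one is attached.
  previous?.cancel();
  return next;
}
```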

lesnitsky commented 9 months ago

@rrousselGit is any action required on this PR from @a-s-k-u?

rorystephenson commented 9 months ago

I want to reiterate that this would be, if I have understood the PR correctly, a major breaking change which is not necessary (previously fetched documents would no longer stay updated).

rrousselGit commented 8 months ago

@rrousselGit is any action required on this PR from @a-s-k-u?

@lesnitsky Well tests are failing, so those need to be fixed.

a-s-k-u commented 8 months ago

@rrousselGit, @lesnitsky - I'll fix that. @rorystephenson - by local caching, do you mean offline capability through the PersistenceEnabled property? https://firebase.google.com/docs/firestore/manage-data/enable-offline. My understanding is that even with offline capability enabled, a new query would still try to read from the server rather than the cache unless the app is offline. I'll try to confirm that by enabling PersistenceEnabled for Web and inspecting the network tab in Chrome. Please let me know if there is any other way to confirm.
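For reference, this is roughly what enabling that cache looks like with the standard cloud_firestore settings API (a sketch; persistence is already on by default on Android/iOS, while web requires an explicit call):

```dart
import 'package:cloud_firestore/cloud_firestore.dart';

void enableLocalCache() {
  // Android/iOS: persistence is on by default, but can be set explicitly.
  FirebaseFirestore.instance.settings =
      const Settings(persistenceEnabled: true);
  // Web-only equivalent (must be awaited before any other Firestore call):
  // await FirebaseFirestore.instance.enablePersistence();
}
```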

rrousselGit commented 8 months ago

Another thing to be careful of: with this PR, previously fetched items will no longer be available.

Meaning a ListView cannot use QueryBuilder as-is: to show the latest page, it needs all items up to the current one.
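For context, this is roughly how a list consumes the builder today, based on the FirestoreQueryBuilder usage shown in the package docs (the 'posts' collection and 'title' field are illustrative); the ListView indexes straight into snapshot.docs, so the snapshot has to contain every item up to the current page:

```dart
import 'package:cloud_firestore/cloud_firestore.dart';
import 'package:firebase_ui_firestore/firebase_ui_firestore.dart';
import 'package:flutter/material.dart';

class PostsList extends StatelessWidget {
  const PostsList({super.key});

  @override
  Widget build(BuildContext context) {
    return FirestoreQueryBuilder<Map<String, dynamic>>(
      query: FirebaseFirestore.instance.collection('posts').orderBy('title'),
      builder: (context, snapshot, _) {
        return ListView.builder(
          itemCount: snapshot.docs.length,
          itemBuilder: (context, index) {
            // Request the next page when the user reaches the end of the list.
            if (snapshot.hasMore && index + 1 == snapshot.docs.length) {
              snapshot.fetchMore();
            }
            return ListTile(
              title: Text(snapshot.docs[index].data().toString()),
            );
          },
        );
      },
    );
  }
}
```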

a-s-k-u commented 8 months ago

So, with the current implementation, I enabled the offline cache (through this commit), and below is what the pagination looks like.

[screenshot: per-page server reads with the current implementation]

It's still making cumulative reads on the server, even with the local cache enabled. Now, coming back to this PR (tested with this commit), the pagination looks like this:

[screenshot: per-page server reads with this PR]

It's performing incremental reads from the server as expected. From the above tests, I'm inclined to believe that just enabling the local cache will not reduce server reads, which underlines the need to employ query cursors to achieve effective pagination.

rorystephenson commented 8 months ago

It would be great if the Firebase team could weigh in on this. The aforementioned StackOverflow answer from a Google Cloud employee indicates that it is possible to subscribe to the same query with a larger limit and receive cached data for the already-fetched items. This would be the optimal solution.

rrousselGit commented 8 months ago

Correct. That is the original reasoning for this approach of increasing the limit instead of using startAt.
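In simplified form, that original approach looks something like this (a sketch, not the package's exact code): the same query is re-subscribed with a growing limit, so the snapshot always contains every page loaded so far.

```dart
import 'dart:async';

import 'package:cloud_firestore/cloud_firestore.dart';

StreamSubscription<QuerySnapshot<Map<String, dynamic>>> listenToPages(
  Query<Map<String, dynamic>> baseQuery, {
  required int pageSize,
  required int pageCount,
  required void Function(QuerySnapshot<Map<String, dynamic>>) onData,
}) {
  // One extra document is requested so the caller can tell whether another
  // page exists after the last one shown.
  return baseQuery.limit(pageSize * pageCount + 1).snapshots().listen(onData);
}
```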

a-s-k-u commented 8 months ago

I've created a DartPad so that it's easy to test out the above-mentioned claim: https://dartpad.dev/?id=ec5d9b700a73f9b67f14073e7f6057bd However, I'm unable to find any real benefit from keeping the connection open and subscribing to another one with a larger, overlapping set. Observing the results in the Chrome network tab, I can see that the first page is getting queried multiple times even though it's part of the overlapping data. Maybe there is a better way to quantify server reads? Please feel free to edit or suggest modifications to the above dumbed-down version of FirebaseListView.

Just jotting down a few additional references I could find on StackOverflow regarding the above claim: https://stackoverflow.com/questions/73637044/does-firestore-have-an-in-memory-cache-for-optimistic-updates-separate-from-inde https://stackoverflow.com/questions/38423277/does-firebase-cache-the-data/38423694#38423694 However, I'm afraid these pertain to the reuse of the same query, not cumulative queries.

The official documentation does recommend using query cursors or page tokens for large result sets.

[screenshot: Firestore documentation excerpt recommending query cursors for large result sets]

Maybe it was a thing with the Realtime Database and no longer applies to Cloud Firestore?

rrousselGit commented 8 months ago

If multiple reads are performed, we should change the logic, yes.

But no matter what we do, the CI needs to be fixed :)

lesnitsky commented 8 months ago

As long as tests are green locally, I'm good. CI setup is currently very fragile, I'm doing some work to make it more stable.

a-s-k-u commented 8 months ago

@rrousselGit, @lesnitsky - I'll fix those unit tests. Additionally, I've discovered a simpler and more reliable method for measuring server reads using the Firestore metrics exposed by the Google Cloud API. To assess server reads, I conducted the following tests with a Firestore collection containing just 100 documents, with offline caching enabled for all three scenarios. I left the page size at 10 and scrolled down to the bottom to ensure all documents were read:

a.) Current Implementation - I measured 559 reads: 10 + 20 + 30 + 40 + 50 + 60 + 70 + 80 + 90 + 100, plus 9 additional reads (due to our n+1 document query approach).

b.) Current Implementation with querySubscription.cancel() commented out, to keep listeners open - I observed the same 559 server reads.

c.) This PR - precisely 100 reads.
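As a sanity check on the 559 figure (a small sketch, assuming each fetch reads pageSize * page + 1 documents, capped at the 100 documents that exist):

```dart
import 'dart:math' as math;

void main() {
  const pageSize = 10, totalDocs = 100;
  var reads = 0;
  for (var page = 1; page * pageSize <= totalDocs; page++) {
    // Each fetch re-reads every document up to the current page, plus one
    // extra document to detect whether more pages exist.
    reads += math.min(page * pageSize + 1, totalDocs);
  }
  print(reads); // 11 + 21 + ... + 91 + 100 = 559
}
```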

Below is the image from the Google Cloud API: [screenshot: Firestore document-read metrics]. So, it seems that going with the third approach, using query cursors, is the way to go.

rrousselGit commented 8 months ago

We would need new tests for this.

In particular, we need an e2e test to verify that adding/removing items in previous "pages" correctly updates the current page.

a-s-k-u commented 8 months ago

Well, that would be the tricky part, as explained here. Given that it'll listen only to the "current" page, any updates on the "current" page can be handled smoothly. However, it won't detect updates on previous pages, and having a separate listener for each page sounds cumbersome.

Alternatively, we could let the end user decide how they want to handle this situation. For instance, consider a project where posts are sorted by 'lastModifiedOn'. If any updates are made to the collection, the end user can effortlessly capture them by listening to the most recent document. In the event that such updates occur, they can trigger a refresh on FirebaseListView.
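A hypothetical sketch of that suggestion ('posts' and 'lastModifiedOn' are example names): listen only to the single most recently modified document and let the app decide when to refresh the paginated list.

```dart
import 'dart:async';

import 'package:cloud_firestore/cloud_firestore.dart';

// Fires whenever the newest document in the collection changes, without
// keeping listeners open on every previously fetched page.
StreamSubscription<QuerySnapshot<Map<String, dynamic>>> watchForChanges(
  void Function() onCollectionChanged,
) {
  return FirebaseFirestore.instance
      .collection('posts')
      .orderBy('lastModifiedOn', descending: true)
      .limit(1)
      .snapshots()
      .listen((_) => onCollectionChanged());
}
```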

rrousselGit commented 8 months ago

It is something supported today though.

rorystephenson commented 8 months ago

@rrousselGit if I'm understanding right, you've confirmed that it should be possible to avoid repeat reads when increasing the limit on the query; however, the examples that @a-s-k-u has provided seem to show that it is not working. In order to avoid a breaking change we should keep listening to all documents, but it sounds like some input is needed from the Firestore team on how exactly (ideally with a demonstration) we can increase the limit and avoid re-fetching all previously fetched documents.

rrousselGit commented 8 months ago

Rather than increasing the limit, we could simply have a List of queries. One per "page".

And then find a way to combine them all into a single big "snapshot".
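A rough sketch of that idea (not an actual implementation; class and method names are made up): keep one listener per page and rebuild a single combined document list whenever any page emits.

```dart
import 'dart:async';

import 'package:cloud_firestore/cloud_firestore.dart';

class PagedSnapshotCombiner {
  PagedSnapshotCombiner(this.onCombined);

  final void Function(List<QueryDocumentSnapshot<Map<String, dynamic>>>)
      onCombined;
  final _pageDocs =
      <int, List<QueryDocumentSnapshot<Map<String, dynamic>>>>{};
  final _subscriptions =
      <StreamSubscription<QuerySnapshot<Map<String, dynamic>>>>[];

  void listenToPage(int pageIndex, Query<Map<String, dynamic>> pageQuery) {
    _subscriptions.add(pageQuery.snapshots().listen((snapshot) {
      _pageDocs[pageIndex] = snapshot.docs;
      // Re-emit the big combined "snapshot" whenever any page changes.
      final ordered = _pageDocs.keys.toList()..sort();
      onCombined([for (final i in ordered) ..._pageDocs[i]!]);
    }));
  }

  Future<void> dispose() async {
    for (final sub in _subscriptions) {
      await sub.cancel();
    }
  }
}
```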

rorystephenson commented 8 months ago

Rather than increasing the limit, we could simply have a List of queries. One per "page".

And then find a way to combine them all into a single big "snapshot".

@rrousselGit Is this approach robust in the case of an insertion at the page boundary? If there is a page size of 3 and we load the initial page:

A, B, C

Then another page is loaded:

D, E, F

So now we have two listeners, one for the first page (currently returning A B C) and one for the second page (D E F). Finally a new document, C', is inserted with an order value of 3. If the second page snapshot uses startAfter with the last item in the first page then we could potentially miss C':

First page, start at beginning, limit 3:

A, B, C

Second page, start after order 3, limit 3:

D, E, F

C' (order 3) never appears in either snapshot.

lesnitsky commented 7 months ago

@rrousselGit any feedback on this?

rrousselGit commented 7 months ago

Is this approach robust in the case of an insertion at the page boundary?

That's indeed a possible concern, although I'm not sure we can find a solution that satisfies all problems at once here. It'd be great if Firestore offered a "startAtIndex" instead of starting at values/documents.

a-s-k-u commented 7 months ago

If the page boundary is the concern, maybe we can introduce a single-document overlap to mitigate that. Say, in the above example, for a page size of 3, each listener would listen to 4 docs with a one-doc overlap, and track the docs at both the left and right edges of each page.

Say:

1st listener listens for 4 docs from A -> A, B, C, D
2nd listener listens for 4 docs from D -> D, E, F, G
3rd listener listens for 4 docs from G -> G, H, I, J

So, just by cross-checking the left-edge and right-edge items of the page every time a page event happens, it'd be possible to tell if the page has expanded or compressed. For example, if C' gets added at position 4, the right-edge doc on page 01 is no longer D, so the page size can be adjusted/incremented by 1 until D is reached (rough sketch below).
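A hypothetical sketch of that overlap (function names and the boundary check are made up for illustration): each page listens to pageSize + 1 documents starting at its left-edge document, so the last document of page i should be the first document of page i + 1.

```dart
import 'package:cloud_firestore/cloud_firestore.dart';

// Build a page query that overlaps the next page by one document.
Query<Map<String, dynamic>> overlappingPageQuery(
  Query<Map<String, dynamic>> baseQuery,
  DocumentSnapshot<Map<String, dynamic>>? leftEdge,
  int pageSize,
) {
  final query =
      leftEdge == null ? baseQuery : baseQuery.startAtDocument(leftEdge);
  return query.limit(pageSize + 1);
}

// If the last doc of a page no longer matches the expected right edge, a
// document was inserted or removed at the boundary and the page has to be
// grown or realigned.
bool boundaryIntact(
  QuerySnapshot<Map<String, dynamic>> page,
  DocumentSnapshot<Map<String, dynamic>>? expectedRightEdge,
) {
  if (expectedRightEdge == null) return true;
  return page.docs.isNotEmpty && page.docs.last.id == expectedRightEdge.id;
}
```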

Having said that, this will introduce additional layers of complexity if we are to cover all possible scenarios - say, if D gets deleted as well. Then the edges need to be verified, and possibly realigned, on every page change.

Maybe a simpler approach would be to just track whether the page has changed and expose a callback for users to handle that change. This callback could trigger actions like scrolling to the top and refreshing the content as needed.

rrousselGit commented 7 months ago

If page boundary is the concern, maybe we can introduce a single document overlap to mitigate that

I think that overlap kind of already exists when using startAfterDocument, where the document is the end of the previous page. I'm not sure a content overlap is necessary.

Maybe, a simpler approach could be just tracking whether the page has changed and exposing a callback for users to handle that change. This callback could trigger actions like scrolling to the top and refreshing the content as needed.

I'm not sure. That would certainly degrade the developer experience and the default behaviour. I think it'd be valuable to have the most optimal experience out of the box.