firebase / firebase-js-sdk

Firebase Javascript SDK
https://firebase.google.com/docs/web/setup
Other
4.82k stars 885 forks source link

collection().onSnapshot sometimes has wrong "fromCache" boolean #3053

Closed mesqueeb closed 4 years ago

mesqueeb commented 4 years ago

[REQUIRED] Describe your environment

Steps to reproduce:

Relevant Code:

Calling collection.onSnapshot after another channel is already open on a single document in that collection, will sometimes cause the collection().onSnapshot to be executed twice (as expected). However, it's expected that always the first time it will have metadata.fromCache: true and the second time false. In my case, at random it gives me true two times in a row!

I have a code snippet executed on browser load after the automatic logic with Firebase Auth. That code snippet is inconsistently giving me sometimes once metadata.fromCache: true and then false and sometimes true, true.

Prerequisite: initialise without enablePersistence

firebase.initializeApp(FIREBASE_CONFIG)

Code snippet:

const companyPath = `companies/someId`
console.log(`open onSnapshot → `, companyPath)
this.$firestore
  .doc(companyPath)
  .onSnapshot({ includeMetadataChanges: false }, async querySnapshot => {
    console.log(
      `doc(${companyPath}).onSnapshot
querySnapshot.metadata.fromCache → `, querySnapshot.metadata.fromCache
    )
    console.log(`querySnapshot.data() → `, querySnapshot.data())
  })

// wait 5 seconds to make sure it's after the first doc has loaded.
// the bug occurs no matter how many seconds is waited
setTimeout(() => {
  const companiesPath = `companies`
  console.log(`open onSnapshot → `, companiesPath)
  this.$firestore
    .collection(companiesPath)
    .onSnapshot({ includeMetadataChanges: false }, async querySnapshot => {
      // this always gets executed twice.
      // expected: once from cache, once from the server
      console.log(
        `collection(${companiesPath}).onSnapshot
querySnapshot.metadata.fromCache → `, querySnapshot.metadata.fromCache
// this flag is often TRUE two times in a row! While I would always expect TRUE once, FALSE the second time.
      )
      console.log(
        `querySnapshot.metadata.hasPendingWrites → `,
        querySnapshot.metadata.hasPendingWrites
      )
      console.log(`querySnapshot.docChanges() → `, querySnapshot.docChanges())
    })
}, 5000)

Here are the screenshots to prove it. It's about 50-50, each time I refresh it's one of either two cases below:

Screenshot 2020-05-12 at 20 58 57 Screenshot 2020-05-12 at 21 01 48
google-oss-bot commented 4 years ago

I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.

mesqueeb commented 4 years ago

In case you wonder what happens if I set includeMetadataChanges: true, still 50-50, not much difference:

case 1

Screenshot 2020-05-12 at 21 25 30

case 2

Screenshot 2020-05-12 at 21 25 55

it's either one of the two above, every time I refresh

wu-hui commented 4 years ago

Hi @mesqueeb

Thank you for reporting the issue! The indeed does not seem right. The only time when you receive two snapshots both with fromCache: true is when there is also a transformation happening.

One thing I'd like to know more though, you mentioned Auth in your description: "I have a code snippet executed on browser load after the automatic logic with Firebase Auth".

I am wondering exactly how this is done. In particular, is it possible the Auth sign in happened after the snapshot listen, firestore would restart the query and re-issue updates. Can you confirm that the snippet is run after you are successfully logged in?

Thanks

mesqueeb commented 4 years ago

@wu-hui Thank you very much to look into this. This problem is very important for me, because my application watches for the first time it gives me fromCache: false, which is sometimes never in case of this bug...

I looked into what you asked but, Firebase Auth was completely unrelated.

I went ahead and prepared a test for you, which you can run in node and reproduces the problem:

https://github.com/vue-sync/firestore/blob/master/test/firebug.ts (only 39 lines!)

You can execute this test by doing:

git clone https://github.com/vue-sync/firestore/
npm i
npm run test test/firebug.ts

Repeat the test command about 10 times, and you will see that about 50% of the time you will get case 1, and 50% of the time you will get case 2:

case 1: (expected)

Screenshot 2020-05-13 at 6 46 22

case 2: (incorrect)

Screenshot 2020-05-13 at 6 46 54

PS: I have no idea why document keys change order all the time, this is something I don't rely on, but I had considered taking into account at some point. Not sure if it's related to this problem. Even if unrelated, would it be possible for you to also elaborate on why the object keys change order, or is this better asked on S.O.?

wu-hui commented 4 years ago

Hi @mesqueeb

Thank you for the reproduction!

A short answer I have, is if you really want to get a fromCache=false update, you should always have includeMetadata set to true.

The reason is, fromCache is partly controlled by backend as well, and it could tell the SDK before or after it raises snapshots. If includeMetadata is set to false, and the SDK already raised a snapshot with all document changes, a change of fromCache from false to true will not raise a snapshot, because it is metadata.

Even though the SDK is in a good state, users of our SDK cannot observe it without includeMetadata=true.

Regarding you key orders, I think you generally cannot rely on results from Object.keys because essentially every JS object is a map. But yeah, you would probably get better explained answer from SO.

Thanks.

mesqueeb commented 4 years ago

@wu-hui thank you for your answer. I don't understand how I can know which docs come from the server, even if I set includeMetadata: true, because as I reported before, there are cases that it will give me an empty array, and give the actual data it received from the server as fromCache: false for some reason:

81691149-52e60400-9497-11ea-9fa4-c201220ab0cf

Can you give me any advice on how to debug this?

I guess my question is, how will I ever know if data is coming from the server when the SDK is telling me it's fromCache: true on those first 30 docs in that pictures. I feel this is a bug....

wu-hui commented 4 years ago

May I ask this: why is it important to know if a snapshot is directly from server?

Isn't the important thing to know is that that all docs in the snapshot are in-sync from the server? When you see fromCache: false, it is basically telling you the last snapshots you seen are in sync with the server. The fromCache update might itself be empty/metadata-only, but the latest snapshots should represent whatever the server has.

mesqueeb commented 4 years ago

Hi @wu-hui Thanks for asking your follow up question!

The reason I want to know if a snapshot is from the server or not, is because I manage a local "state" that is populated by both firestore (from the server data) and the user, when making changes.

When the user makes a change to the local state, I want to ignore if the firestore onSnapshot listener triggers. So I only listen to the fromCache: false calls. Basically: "only update my local state if something changed on the server."

But if I do so, I will sometimes have this situation explained above, where I do get the initial documents from the server, but it was telling me they came from the cache! So isn't this a bug?

How can I detect what trigger was the "initial documents trigger" if I never know if it will be fromCache: true or fromCache: false ?

wu-hui commented 4 years ago

If I understand correctly, you want to know when you have a snapshot that is a result of server committed changes, without any results from local writes not accepted by server yet. You only want to react to changes accepted by server.

Unfortunately, we often cannot provide that in one single snapshot. What you can do, is accumulate the changes from a snapshot listener, and react when you know the snapshots are in an acceptable state (whatever that means for you).

So maybe you could do this: listen with includeMetadata: true, accumulate snapshots into a document array, and when you see snapshots with fromCache: false and hasPendingWrites: false(not necessarily from one snapshot), you know the accumulated document array has only accepted changes, and the result is complete. Then you can react to the document array you built.

An example of accumulation: https://github.com/firebase/firebase-js-sdk/blob/master/packages/firestore/test/unit/api/document_change.test.ts#L58

I know this is a little confusing, because fromCache might not be the best name to use here. But because our snapshots are incremental, and if you really care about having a consistent and complete view of your result, you have to accumulate the snapshots yourself.

I hope I did a better job explaining this time, but let me know if you still find it confusing.

mesqueeb commented 4 years ago

@wu-hui I understand, and thank you for this information. I have one followup question though.

Basically to retrieve all docs initially from the server, I add includeMetadata: true and then check like so:

const gotAllServerData = fromCache == false && hasPendingWrites == false

then after I got all server data, I can only listen to the fromCache == false calls and ignore any triggers with fromCache == true || hasPendingWrites == true.

This seems like a good strategy, however, I wonder what will happen when the onSnapshot listener is opened in eg. a Capacitor/Electron environment and the client is actually offline.

In this case the the strategy above will always keep listening for local writes until the device comes online again, and I'm back to my duplicate update problem.

My questions are:

  1. is it safe to say I could opt to not care about fromCache at all, and only differentiate between hasPendingWrites true or false, to know if an update was made locally or not?
  2. is there a way I can replicate what will happen when onSnapshot is called in an offline environment? I could not find much on utils for testing firestore js sdk, and as you've seen in the repo I shared, I just directly have my node tests connect to firestore with the js sdk and run tests like that now. But I don't know how I could make it so I can test for what would happen when the client is offline (besides, manually switching off my own internet...)
wu-hui commented 4 years ago
  1. Yes, that should work.

  2. The closest thing is disableNetwork: https://firebase.google.com/docs/reference/js/firebase.firestore.Firestore#disablenetwork. I think this should suffice your need for automatic testing.

louisameline commented 4 years ago

Hi @wu-hui, I'm working with @mesqueeb on the Vuex-easy-firestore library and I have a complementary question about this.

Does hasPendingWrites === true on a snapshot always mean that its included changes come only from local?

For example, let's imagine that I make a write request to a document, and that it takes 5 seconds to perform. If I get remote changes to that document from another client during these 5 seconds: will the incoming snapshot be tagged with hasPendingWrites === false?

My understanding from the doc is that in that case, the snapshot document will contain our not-yet-remotely-saved modifications, and so we should have hasPendingWrites === true

Right now, Mesqueeb is asking why we'd use fromCache at all if we can just write

// assuming `includeMetadataChanges` is `true`
if (docSnapshot.metadata.hasPendingWrites) {
  // this snapshot is only about changes that have been made locally,
  // we can safely ignore it completely
}

But I don't think we can?

wu-hui commented 4 years ago
if (docSnapshot.metadata.hasPendingWrites) {
  // Skip
}

skips any snapshots received before your local change is committed. Skipping might be a bad thing to do though, because you need accumulate to get the full result. You can skip reacting to that snapshot though.

If all you care about is not reacting to snapshots before your local change is committed, then hasPendingWrites should be good. Remember, you still need to accumulate snapshots.

louisameline commented 4 years ago

Thank you for the answer, it's clear.

  1. We definitely don't want to skip snapshots as we don't want to miss changes triggered by remote clients
  2. I don't think we need to accumulate, as our goal is not to know when something has finished, but rather to make sure we save every remote modifications into the local store (without triggering handlers unnecessarily about local changes)

I'm not sure why anyone would try to accumulate on their own though, isn't better to rely on the promise returned by Firestore? When it takes several snapshots to complete a request, I guess the SDK does this work of accumulation in order to resolve the promise at the right time? That was our assumption so far.

Sorry for all the questions, but we'll be able to build a robust connector library if we get things right ;)

wu-hui commented 4 years ago

I think you are right, you don't necessarily need to accumulate yourself if you don't care about how the snapshot changed from last snapshot.

If I understand correctly, what you want is a way to tell: every local write that might affect the query result has been accepted by the backend. If that is right, then using hasPendingWrites should be enough.

Note that fromCache might still be true in this case, which means backend knows there are more data it needs to send to the client, in other words, client data might be stale (missing data from other remote clients that server knows about, for example), but all local writes have been committed at least.

louisameline commented 4 years ago

what you want is a way to tell: every local write that might affect the query result has been accepted by the backend

No actually it's not really about our writes, it's more about getting the remote updates.

It will be simpler if I say it like this: when the user calls our "changeData" method, we update the local store first (Vuex), THEN we send the write request to Firestore. So in fact, we're actually trying to ignore the local notifications, to act only on the notifications of remote changes.

When you say

Note that fromCache might still be true in this case

Interesting, so fromCache === true is also a way for Firestore to say "I'm not done synchronizing everything with you"... This metadata makes sense on a querySnapshot, but could a documentSnapshot also contain remote changes AND have fromCache === true? I guess not, right?

If yes, then I fear that neither fromCache or hasPendingWrites will be able to help us. We'll have to stop updating Vuex before the calls to Firestore, and update the local store only from the snapshot handler. This design has its own disadvantages so we avoided it so far, but if there is no other way...

louisameline commented 4 years ago

If you could please answer my last question, as it's decisive on our architecture. Thank you very much