firebase / firebase-js-sdk

Firebase Javascript SDK
https://firebase.google.com/docs/web/setup
Other
4.85k stars 893 forks source link

Database data event fired before receiving data from server #7381

Open pkaminski opened 1 year ago

pkaminski commented 1 year ago

Operating System

Node 18 on Debian 12

Browser Version

N/A

Firebase SDK Version

firebase-admin@11.9.0

Firebase SDK Product:

Database

Describe your project's tooling

TypeScript into Docker running on GAE Flex

Describe the problem

In our production app we recently started seeing unexpected null values in response to on('value') listeners that should've been guaranteed to return non-null results. After instrumenting our app and turning on database logging we see clear evidence that the SDK is firing the first data event before data has been returned from the server. Consider the following lines extracted from the log file:

[2023-06-12T20:11:39.650Z] @firebase/database: p:0: Listen called for /reviews/-NWyLXZ4WfcAPgYhfmmv/revisions/r2/commitSha default
[2023-06-12T20:11:39.650Z] @firebase/database: p:0: Listen on /reviews/-NWyLXZ4WfcAPgYhfmmv/revisions/r2/commitSha for default
[2023-06-12T20:11:39.650Z] @firebase/database: p:0: {"r":307,"a":"q","b":{"p":"/reviews/-NWyLXZ4WfcAPgYhfmmv/revisions/r2/commitSha","h":""}}
[2023-06-12T20:11:39.650Z] @firebase/database: event: /reviews/-NWyLXZ4WfcAPgYhfmmv/revisions/r2/commitSha:value:null
[2023-06-12T20:11:39.650Z] @firebase/database: event: /reviews/-NWyLXZ4WfcAPgYhfmmv/revisions/r2/commitSha:value:null
[2023-06-12T20:11:39.782Z] @firebase/database: p:0: handleServerMessage d {"p":"reviews/-NWyLXZ4WfcAPgYhfmmv/revisions/r2/commitSha","d":"c37e2a61762e5ce43fe13d181dd7c53d9fd6508a"}
[2023-06-12T20:11:39.783Z] @firebase/database: event: /reviews/-NWyLXZ4WfcAPgYhfmmv/revisions/r2/commitSha:value:"c37e2a61762e5ce43fe13d181dd7c53d9fd6508a"
[2023-06-12T20:11:39.783Z] @firebase/database: p:0: from server: {"r":307,"b":{"s":"ok","d":{}}}

You can observe the client emitting request 307 for /reviews/-NWyLXZ4WfcAPgYhfmmv/revisions/r2/commitSha at 39.650Z, then immediately firing an event with a null value for that path. (This was the first listener on this path or any ancestor for this run, so the SDK couldn't possibly know the value yet.) At 39.782Z the server responds with the actual value and the client fires another event. This is not the case for all listeners: most of them correctly wait for the server to return the data first.

We have not been able to reproduce this problem outside of production but it happens regularly there. The pattern is odd, though: the issue appears to only occur at server startup, with a ~60% chance that the server will be emitting bad events for some listeners -- the other 40% of the time there are no anomalies. For server runs that do emit bad data the issue appears to fix itself after anywhere from 4 to 10 minutes and stays fixed for as long as the server is running. We've been unable to characterize the difference in conditions or timing between "good" and "bad" runs thus far.

We're not completely certain when this issue started happening as the errors were often masked in our app due to some overly tolerant legacy error-handling semantics. It's possible, though, that it began when we added some orderByChild queries on /reviews recently, such as this one from the log:

[2023-06-12T20:11:39.507Z] @firebase/database: p:0: Listen called for /reviews {"ein":true,"ep":"3f64a743aeb4980d7b0c9a47b95c0b24cd8e7fe7","i":"mergeQueueEntry/commitSha","sin":true,"sp":"3f64a743aeb4980d7b0c9a47b95c0b24cd8e7fe7"}
[2023-06-12T20:11:39.507Z] @firebase/database: p:0: Listen on /reviews for {"ein":true,"ep":"3f64a743aeb4980d7b0c9a47b95c0b24cd8e7fe7","i":"mergeQueueEntry/commitSha","sin":true,"sp":"3f64a743aeb4980d7b0c9a47b95c0b24cd8e7fe7"}
[2023-06-12T20:11:39.507Z] @firebase/database: p:0: {"r":275,"a":"q","b":{"p":"/reviews","q":{"sp":"3f64a743aeb4980d7b0c9a47b95c0b24cd8e7fe7","sin":true,"ep":"3f64a743aeb4980d7b0c9a47b95c0b24cd8e7fe7","ein":true,"i":"mergeQueueEntry/commitSha"},"t":17,"h":""}}

Unfortunately, we can't back out the new /reviews query to see if it's responsible for the problem as customers started depending on it before we realized that it may be related to this issue.

Steps and code to reproduce issue

Since we don't have a clean repro I'd be happy to work on further instrumenting our prod environment or running experiments therein to gather more data, but I could use some hints on what to look for!

hsubox76 commented 1 year ago

The user is using firebase-admin and not the JS SDK, but firebase-admin does wrap the JS SDK's database-compat package (which wraps database), so it seems likely the problem is in the JS SDK database code somewhere. If it turns out after investigation to be something in firebase-admin, we can move it to that repo.