pkaminski opened this issue 1 year ago
The user is using `firebase-admin` rather than the JS SDK, but `firebase-admin` wraps the JS SDK's `database-compat` package (which in turn wraps `database`), so the problem most likely lies somewhere in the JS SDK database code. If investigation shows it is in `firebase-admin` instead, we can move the issue to that repo.
Operating System
Node 18 on Debian 12
Browser Version
N/A
Firebase SDK Version
firebase-admin@11.9.0
Firebase SDK Product:
Database
Describe your project's tooling
TypeScript compiled into a Docker image running on Google App Engine (GAE) Flexible Environment
Describe the problem
In our production app we recently started seeing unexpected `null` values in response to `on('value')` listeners that should have been guaranteed to return non-null results. After instrumenting our app and turning on database logging, we see clear evidence that the SDK fires the first data event before data has been returned from the server. Consider the following lines extracted from the log file:

You can observe the client emitting request 307 for `/reviews/-NWyLXZ4WfcAPgYhfmmv/revisions/r2/commitSha` at `39.650Z`, then immediately firing an event with a `null` value for that path. (This was the first listener on this path or any ancestor for this run, so the SDK couldn't possibly know the value yet.) At `39.782Z` the server responds with the actual value and the client fires another event. Not all listeners behave this way: most of them correctly wait for the server to return the data first.

We have not been able to reproduce this problem outside of production, but it happens regularly there. The pattern is odd, though: the issue appears to occur only at server startup, with a ~60% chance that a given server run will emit bad events for some listeners; the other 40% of the time there are no anomalies. For server runs that do emit bad data, the issue appears to fix itself after anywhere from 4 to 10 minutes and stays fixed for as long as the server is running. So far we have been unable to characterize the difference in conditions or timing between "good" and "bad" runs.
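The ordering argument above can be stated as a mechanical check over the debug log. Flag any `'value'` event that fires for a path before the server has responded for that path. Below is a minimal TypeScript sketch of that check. The `LogEntry` shape and the helper names are hypothetical, not the SDK's real log format, so actual log lines would first have to be parsed into this form. The check also assumes, as in the run described above, that no listener existed on the path or any ancestor, so no cached value could explain an early event.

```typescript
// Hypothetical, simplified view of what the debug log shows for a path.
// These field names are assumptions for illustration, not the SDK's format.
type LogEntry =
  | { kind: "response"; path: string; at: number } // server returned data for `path`
  | { kind: "event"; path: string; at: number; isNull: boolean }; // SDK fired a 'value' event

interface PrematureEvent {
  path: string;
  firedAt: number;
  isNull: boolean;
  serverRespondedAt: number | null; // null if the log never shows a response
}

// Sort by time; on equal timestamps, handle responses before events so an
// event fired in the same millisecond as its response is not flagged.
function byTime(a: LogEntry, b: LogEntry): number {
  const rank = (e: LogEntry) => (e.kind === "response" ? 0 : 1);
  return a.at - b.at || rank(a) - rank(b);
}

// Flags 'value' events that fired for a path before any server response for
// that path. Assumes no listener existed on the path or an ancestor, so the
// client had no cached data it could legitimately have used.
function findPrematureEvents(entries: LogEntry[]): PrematureEvent[] {
  const sorted = [...entries].sort(byTime);
  const answered = new Set<string>();
  const premature: PrematureEvent[] = [];
  for (const e of sorted) {
    if (e.kind === "response") {
      answered.add(e.path);
    } else if (!answered.has(e.path)) {
      // Find when (if ever) the server actually answered for this path.
      const reply = sorted.find(
        (r) => r.kind === "response" && r.path === e.path && r.at >= e.at,
      );
      premature.push({
        path: e.path,
        firedAt: e.at,
        isNull: e.isNull,
        serverRespondedAt: reply ? reply.at : null,
      });
    }
  }
  return premature;
}

// Timestamps (milliseconds within the minute) from the sequence described above.
const path = "/reviews/-NWyLXZ4WfcAPgYhfmmv/revisions/r2/commitSha";
const sample: LogEntry[] = [
  { kind: "event", path, at: 39650, isNull: true },
  { kind: "response", path, at: 39782 },
  { kind: "event", path, at: 39782, isNull: false },
];
const result = findPrematureEvents(sample);
console.log(result);
```

With the two timestamps from the log, the sketch reports a single premature `null` event that fired 132 ms before the server's response. The second event, which fires when the response arrives, is not flagged.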
We're not completely certain when this issue started happening, as the errors were often masked in our app by some overly tolerant legacy error-handling semantics. It's possible, though, that it began when we recently added some `orderByChild` queries on `/reviews`, such as this one from the log:

Unfortunately, we can't back out the new `/reviews` query to see whether it's responsible for the problem, because customers started depending on it before we realized it might be related to this issue.

Steps and code to reproduce issue
Since we don't have a clean repro, I'd be happy to work on further instrumenting our prod environment or running experiments there to gather more data, but I could use some hints on what to look for!
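Without a clean repro, one low-cost experiment would record two facts for each listener: whether its first `'value'` event was empty, and how long after startup it arrived. That would show whether anomalies cluster in the 4 to 10 minute window described above. The sketch below is only one possible approach. `SnapshotLike` is declared locally and exposes just `exists()`, which the real `DataSnapshot` also provides. `recordFirstEvent` and `firstEvents` are hypothetical names, not part of any Firebase API. The sketch also assumes the module is loaded at server startup. A null first event is only suspicious for paths the app expects to already exist.

```typescript
// Minimal stand-in for the snapshot passed to a 'value' callback. The real
// DataSnapshot also has exists(); declaring only what this sketch uses keeps
// it runnable without firebase-admin installed.
interface SnapshotLike {
  exists(): boolean;
}

interface FirstEventRecord {
  path: string;
  wasNull: boolean; // true if the first event carried no data
  msSinceStart: number; // elapsed time since this module was loaded
}

// Assumption: this module is loaded at server startup, so elapsed time here
// approximates "time since startup" for correlating with the bad window.
const startedAt = Date.now();
const firstEvents: FirstEventRecord[] = [];

// Hypothetical helper: wraps a 'value' callback so the first event each
// listener receives is recorded. Every event, including the first, is still
// passed through to the original callback unchanged.
function recordFirstEvent<S extends SnapshotLike>(
  path: string,
  callback: (snap: S) => void,
): (snap: S) => void {
  let seenFirst = false;
  return (snap: S) => {
    if (!seenFirst) {
      seenFirst = true;
      firstEvents.push({
        path,
        wasNull: !snap.exists(),
        msSinceStart: Date.now() - startedAt,
      });
    }
    callback(snap);
  };
}
```

In the app, the wrapper would go around an existing handler, for example `ref.on('value', recordFirstEvent(ref.toString(), handler))`, and the entries in `firstEvents` would be logged periodically. Recording only the first event per listener keeps the overhead small. Passing every event through unchanged means the instrumentation does not alter what the app sees.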