firebase / firebase-js-sdk

Firebase Javascript SDK
https://firebase.google.com/docs/web/setup

Hanging query for Firestore #7860

Open thomasdao opened 7 months ago

thomasdao commented 7 months ago

Operating System

Both Mac and Windows

Browser Version

Chrome, Electron Browser window

Firebase SDK Version

10.7.0, 10.7.1

Firebase SDK Product:

Firestore

Describe your project's tooling

Plain Electron app

Describe the problem

This is a new ticket for the hanging query issue, following up on https://github.com/firebase/firebase-js-sdk/pull/7771 and https://github.com/firebase/firebase-js-sdk/issues/7652

After updating Firebase to 10.7.0 or 10.7.1, the query becomes much slower and frequently gets stuck with the error below:

@firebase/firestore: Firestore (10.7.0): WebChannelConnection RPC 'Listen' stream 0x5b9a037f transport errored: Wn {type: 'c', target: Hn, g: Hn, defaultPrevented: false, status: 1}

Switch back to 10.6.0 and the query completes quickly.

Steps and code to reproduce issue

I've created a minimal sample that reproduces this issue and have shared it with @MarkDuckworth. If you need access to the private repo, please let me know. Thank you!

ehsannas commented 7 months ago

Thanks for reporting, @thomasdao. I'll try to reproduce it.

thomasdao commented 7 months ago

@ehsannas thanks, I've invited you to the sample project :)

ehsannas commented 7 months ago

Thanks @thomasdao. I am able to see the error in the logs from your repo. I do, however, see that each such log message is followed by an UNAVAILABLE code from the backend, which means it's a legitimate error returned from the backend to the SDK. It's plausible that the newer WebChannel version has become much more efficient at sending parallel requests to the backend, such that you're hitting a request-rate limit for a single client. This error code is retryable with a backoff, which means the SDK will recover and rerun the query after some delay.

Please take a look at:
https://firebase.google.com/docs/firestore/real-time_queries_at_scale#understand_high_write_traffic_in_the_system
https://firebase.google.com/docs/firestore/best-practices#ramping_up_traffic
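The "retryable with a backoff" behavior mentioned above can be pictured with a minimal sketch. This is not the SDK's implementation: the names `BackoffOptions`, `backoffDelayMs`, `exampleOptions` and `exampleSchedule`, and the constants (1 s initial delay, doubling, 60 s cap), are assumptions chosen purely for illustration.

```typescript
// Illustrative only: these names and constants are assumptions,
// not values taken from the Firebase SDK.
interface BackoffOptions {
  initialDelayMs: number; // delay before the first retry
  factor: number;         // multiplier applied after each failed attempt
  maxDelayMs: number;     // upper bound on any single delay
}

// Returns the delay to wait before retry number `attempt` (0-based).
function backoffDelayMs(attempt: number, options: BackoffOptions): number {
  const uncapped = options.initialDelayMs * Math.pow(options.factor, attempt);
  return Math.min(uncapped, options.maxDelayMs);
}

const exampleOptions: BackoffOptions = {
  initialDelayMs: 1000,
  factor: 2,
  maxDelayMs: 60000
};

// Delays for the first eight retries of a retryable error such as UNAVAILABLE.
const exampleSchedule: number[] = [];
for (let attempt = 0; attempt < 8; attempt++) {
  exampleSchedule.push(backoffDelayMs(attempt, exampleOptions));
}

// Prints: 1000, 2000, 4000, 8000, 16000, 32000, 60000, 60000
console.log(exampleSchedule.join(", "));
```

With a schedule like this, a client that keeps receiving a retryable error waits longer between attempts, so a query can look stalled for a while before it is rerun.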

thomasdao commented 7 months ago

@ehsannas I've never seen the UNAVAILABLE code, even if I wait for more than 10 minutes.

I don't find the explanation that the "newer WebChannel version has become much more efficient at sending parallel requests" very logical: the same type of query works with version 10.6.0, which indicates that the server is able to handle that query and that the problem is likely with the newer version of the client.

I've tested adding a delay of 1 second between each paginated query to reduce server load, and I still see the same error: @firebase/firestore: Firestore (10.7.0): WebChannelConnection RPC 'Listen' stream 0x269fb953 transport errored: Wn {type: 'c', target: Hn, g: Hn, defaultPrevented: false, status: 1}.

phileasthefogg commented 5 months ago

I'm also running into this error. Subscription seems to work fine for a while and then gets dropped with the same RPC 'Listen' stream transport error. Any ideas on what this might be or where to catch the error?

IvanKYW commented 5 months ago

Same issue after upgrading AngularFire to 17.0.1, which depends on firebase ^10.7.0.

In one of our projects, queries became slower and occasionally run into the @firebase/firestore: Firestore (10.7.2): WebChannelConnection RPC 'Listen' stream error. Our other, smaller project works fine.

Tried experimentalForceLongPolling, as mentioned in #7968, but no luck. Downgrading to 10.6.0 seems to resolve the issue.

ghinda commented 5 months ago

I've also been seeing the same issue with hanging snapshot queries for a while, with the same type of WebChannelConnection RPC 'Listen' stream ... transport error.

Sometimes, after failing with the error, the snapshot query retries and returns correct data after a couple of minutes, but most times it just hangs indefinitely. In our case, it only happens with queries that would return a large amount of data (hundreds of docs containing fairly large strings).

The issues started with versions 10.4.x. They were then fixed in versions 10.6.x, but are now back again with 10.7.x. I've also tested the latest 10.8.0, and the issue is still there. As a summary:

Using experimentalForceLongPolling does not seem to make a difference.

I wasn't able to reproduce it in a local or staging environment, as it only seems to show up in our production environment, where we have around 40K snapshot listeners / 10K active connections, as reported in the Firebase console.

MrDavidRios commented 5 months ago

I'm also running into this error since upgrading to v10.7.0, and much like @phileasthefogg, getting the same RPC 'listen' stream transport error. This is a small project (< 10 active connections at a time), and I'm able to reproduce it in both local and production environments.

thomasdao commented 4 months ago

Hi @ehsannas, not sure if you have been able to work on this issue? Maybe @MarkDuckworth can take a look. This issue has prevented us from updating to the latest version. Thank you!

alex-dokienko commented 4 months ago

The same issue happens in my project (using Flutter). In the beginning everything was fine (I've been using Firestore for about 6 months), but now I'm suddenly getting it all the time. Maybe the data sets have grown, and I didn't experience it before because the database was smaller.

ehsannas commented 4 months ago

@MrDavidRios Would you be able to share your project in which you're able to consistently reproduce this issue? (feel free to point me to a github repo). Thanks!

hiroro-work commented 4 months ago

This phenomenon seems to be more likely to occur in a slow network environment. By setting "Fast 3G" or "Slow 3G" in Network of DevTools, we were able to reproduce the phenomenon even in an environment where it does not usually occur.

dconeybe commented 4 months ago

(note to googlers: this may be related to support case b/325591749, which reports similar webchannel issues when the network is throttled)

jorgsiegel commented 3 months ago

Same thing happens in our project. Unfortunately I can't downgrade to firebase 10.6.0 (without much effort) because of AngularFire and Angular dependencies. It still happens on firebase 10.9.0 ...

thomasdao commented 3 months ago

This issue has been happening since December last year and affects multiple projects, but it has not received any update. I'm on the Blaze plan but cannot update the library to the latest version, and it's really frustrating. Could you please share whether any of you are investigating this issue? Thank you! @MarkDuckworth @dconeybe @ehsannas

MarkDuckworth commented 3 months ago

@thomasdao, I'll touch base with the team and see if I can move this forward.

jorgsiegel commented 3 months ago

This problem affects users in our production apps. We are also in the middle of developing a new app and can consistently reproduce the error. It seems to be connected to the size of Firestore documents. Our documents are max. 300,000 bytes, which is far below the limit specified on the official Firestore documentation page (1 MiB / 1,048,576 bytes) and we are fetching max. 40 documents in a single query.

We would highly appreciate it if the Firebase team could check what changed in recent versions and fix it soon.

Valansch commented 3 months ago

Thank you @MarkDuckworth.

Just to second @thomasdao & @jorgsiegel, this has long been part of the stable releases and affects our users. For various reasons we are unable to downgrade. We have a long-standing GCP ticket open regarding this. I have a feeling this happens more often the bigger the result set is. We run an SPA where we stream about 5000 documents, all well in the region of 1KB. When the queries fail, they restart over and over, resulting in the client downloading 100MB for what should be 5MB. We have no workaround for this.

Would really appreciate seeing some progress here.
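The figures reported above (about 5000 documents of roughly 1KB each, and 100MB downloaded instead of 5MB) suggest how many times the result set was transferred. The sketch below is a back-of-the-envelope estimate only. It assumes that 1KB = 1000 bytes and that every restart re-downloads the complete result set, neither of which is confirmed in this thread. The function name `fullDownloads` is hypothetical.

```typescript
// Rough estimate only. Assumes 1 KB = 1000 bytes and that every restart
// re-downloads the complete result set; neither is confirmed in this thread.
function fullDownloads(
  observedBytes: number, // total bytes the client actually downloaded
  docCount: number,      // number of documents in the result set
  avgDocBytes: number    // approximate size of one document
): number {
  const resultSetBytes = docCount * avgDocBytes; // one complete download
  return observedBytes / resultSetBytes;
}

// ~100 MB observed for ~5000 documents of ~1 KB each.
const observedTransferBytes = 100 * 1000 * 1000;

// Prints: 20
console.log(fullDownloads(observedTransferBytes, 5000, 1000));
```

Under these assumptions, the reported traffic corresponds to roughly 20 complete downloads of the same result set.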

valeriangalliat commented 3 months ago

We're also encountering this issue (running 10.8).

Tried 10.11 and it's still happening, but, as suggested above, downgrading to 10.6 fixed it.

dconeybe commented 3 months ago

I have a potential fix for this issue. Would anyone be willing/able to test it out? The fix is in https://github.com/firebase/firebase-js-sdk/pull/8145 (NOTE: it is still a work-in-progress). Please comment on the PR with the outcome of your experiment (rather than commenting here on the issue).

You will need to build the Firestore SDK yourself, but, thankfully, it's relatively straightforward.

  1. npm install -g yarn
  2. git clone --depth 100 https://github.com/firebase/firebase-js-sdk.git (if using an existing clone of this repo, make sure you're at a commit that includes #8145)
  3. cd firebase-js-sdk
  4. yarn
  5. yarn build
  6. cd packages/firestore
  7. yarn build:debug
  8. cp -r dist ~/YOUR_PROJECT/node_modules/@firebase/firestore
  9. rebuild your project and test it out

Note that the --depth 100 argument to git is just an optimization to pull about 8MB instead of 30MB. Feel free to omit that argument.

Note that the extra yarn build:debug command is optional, and produces Firestore's index.esm2017.js with all of the code mangling, code stripping, and optimizations disabled. This will produce more readable compiled code and stack traces without mangled names that are much easier to make sense of.

The "cp" command will copy the compiled Firestore JavaScript bundles into your own project's node_modules directory, clobbering the ones that npm downloaded. Make sure to restore the production version (e.g. by deleting the node_modules directory and re-running npm install) when done testing out this fix.

MarkDuckworth commented 2 months ago

@thomasdao, I have a branch (markduckworth/debug-webchannel-stat-events) that will log additional events from WebChannel. This logging is showing some useful additional info before a WebChannelConnection transport error on my device.

Can you test with this branch on your local reproduction and provide me with any log statements for "STAT_EVENT"? If these events appear before the WebChannelConnection RPC 'Listen' stream 0x269fb953 transport errored event, please include those log lines too.

Your help is greatly appreciated.

thomasdao commented 2 months ago

@MarkDuckworth I checked out your branch and followed the instructions from https://github.com/firebase/firebase-js-sdk/issues/7860#issuecomment-2052471034. Please see the attached log, thanks!

firebase_log.txt

MarkDuckworth commented 2 months ago

Thanks @thomasdao.

In my local tests, when I see WebChannelConnection RPC 'Listen' stream X transport errored: ..., the STAT_EVENT logging shows that the root cause was expected/normal. Furthermore I saw the SDK recover gracefully.

In your logs, the STAT_EVENTs leading up to the WebChannelConnection error are different. I'm trying to understand why. The repro that you previously shared with me is not currently reproducing this error. Does that shared repo still reproduce the issue for you?

MarkDuckworth commented 2 months ago

Also @thomasdao, can you provide the Firebase project ID you used when creating firebase_log.txt? Is it the same project ID from your shared repro? We want to review server logs.

thomasdao commented 2 months ago

@MarkDuckworth

"The repro that you previously shared with me is not currently reproducing this error. Does that shared repo still reproduce the issue for you?"

Yes, I can still reproduce this issue. Sometimes the query can complete, but the next time I run it again, the query would hang.

"Is it the same project ID from your shared repro?"

Yes it's the same project ID.

MarkDuckworth commented 2 months ago

Version 10.11.1 was released today and rolls back the WebChannel config to be equivalent to the 10.6 (and 10.5.2) releases. I have tested with @thomasdao's reproduction and I'm seeing the queries complete consistently and quickly. Errors WebChannelConnection RPC 'Listen' stream 0x269fb953 transport errored: Wn {type: 'c', target: Hn, g: Hn, defaultPrevented: false, status: 1} were not observed.

thomasdao commented 2 months ago

@MarkDuckworth thank you, I tried 10.11.1 and found the query can complete quickly.

Just curious, is WebChannel really superior to the FetchXmlHttpFactory? What's the problem with FetchXmlHttpFactory?

IslamElKassas commented 2 months ago

Friends, this has already been fixed by the Firebase team in the newest version, 10.11.1 (April 25, 2024). From the release notes:

Cloud Firestore: Prevent spurious "Backend didn't respond within 10 seconds" errors when network is in fact responding, but slowly. See GitHub PR #8145.
https://firebase.google.com/support/release-notes/js