RFC: Usage of experimentalForceLongPolling option.

mikelehen commented 5 years ago

Are you experiencing the error Could not reach Cloud Firestore backend. Backend didn't respond within 10 seconds despite your network seeming healthy? Please visit https://debug-my.firebaseapp.com/ and check the results at the end. If the tests "with default options" fail (or are very slow) but the "with forceLongPolling" ones succeed, then this indicates your traffic is likely going through a proxy that is buffering responses in a way that is not compatible with Firestore.

As a workaround, you can force long-polling as follows:

firebase.firestore().settings({ experimentalForceLongPolling: true });

Have you enabled `experimentalForceLongPolling` and `experimentalAutoDetectLongPolling` to solve a reproducible connection issue related to a specific environment?

We would like to know:

What environment causes the problem (app platform, antivirus software, network proxy, network conditions, etc.)?
What is the behavior without experimentalForceLongPolling or experimentalAutoDetectLongPolling?
Does experimentalAutoDetectLongPolling completely resolve the issue?
Does experimentalForceLongPolling completely resolve the issue?
Please visit https://debug-my.firebaseapp.com/, wait for "All tests done" (about 60 seconds), paste the results into a gist, and paste the link in your comment.

If there is an existing comment concerning the environment you are using, feel free to just add a 👍 to it.

tgangso commented 5 years ago

Hi,

I was going to enable this feature because it describes an issue that sounds similar to what many of my users are describing when using my app on corporate networks.

But from my testing this results in an absurd number of network requests when querying large (5-10000+ documents) collections (using "@angular/fire": "5.1.2" + "firebase": "5.10.0", with enablePersistence). It looks like it makes one channel request for every document in the collection, causing very long loading times while waiting for the result. I found it unusable in its current state and have deactivated it again. After deactivating I am back to a normal number of network requests (1 for the collection).

I hope it can be improved, as many users report issues that sound like something this feature would fix.

mikelehen commented 5 years ago

@tgangso Thanks for the feedback! (and sorry it didn't work out for you). We'll discuss to see if we can do anything about the performance. In the meantime, were you able to determine if this solved any issues for your users? And if so, can you comment more on the corporate network environment that was causing issues? Thanks.

tgangso commented 5 years ago

@mikelehen I have not been able to reproduce the issue myself, but generally users on the Equinor/Statoil corporate network experience the issue. Some users with an adblocker (VPN-based, I think) have also reported this.

I never deployed the setting to production because of the performance issues, so I am unable to confirm if it helps or not, but I have some users that can help testing when/if performance is improved.

tgangso commented 5 years ago

I did end up adding an option for the user to enable this manually, and have asked a few users to try it out. It did indeed help with the issue, and they were able to use the application.

So I hope this option is not removed, as it will help even if performance is lower. Not all my users have huge amount of documents.

One user had issues on his Mac, same network as his phone (where it works). gist.

One user is on a "high security" corporate network, gist.

Behaviour without experimentalForceLongPolling: no data is retrieved from or sent to Firestore, but it works when enabled.

Hope it helps.

Maybe it could be possible for firebase-js to detect when this is needed as a fallback?
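A user-controlled opt-in like the one described in this comment could be sketched as follows. This is only an illustration under assumptions: the storage key `useLongPolling` and the helper name `settingsFromPreference` are hypothetical, and `storage` is any object with a Web Storage style `getItem` method (for example `window.localStorage`).

```javascript
// Hypothetical storage key; pick whatever your app's settings screen writes.
const PREF_KEY = 'useLongPolling';

// Builds the Firestore settings object from a stored user preference.
function settingsFromPreference(storage) {
  // Web Storage returns strings (or null when unset), so compare against 'true'.
  const enabled = storage.getItem(PREF_KEY) === 'true';
  return enabled ? { experimentalForceLongPolling: true } : {};
}

// Usage at startup, before any other Firestore call (namespaced SDK assumed):
// firebase.firestore().settings(settingsFromPreference(window.localStorage));
```

Because the preference is read once at startup, a user who flips the switch would need to reload the page for it to take effect.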

tgangso commented 5 years ago

Added a gist for the mac scenario also.

Let me know if you want me to test anything else.

mikelehen commented 5 years ago

@tgangso Thanks very much for the data. We'll plan to keep the option around unless the problem stops happening or we devise a better solution (like automatic fallback).

tgangso commented 5 years ago

@mikelehen great, I can confirm a few others have tried the setting with success. Automatic fallback would be great if possible.

mikelehen commented 5 years ago

@tgangso FYI- There was a backend change that should cut down on the number of network requests when using this option. Would you be interested in trying it again and reporting on whether it behaves better now?

tgangso commented 5 years ago

@mikelehen I did some quick testing now, and it definitely feels better. It looks like there are more documents per request now. I will keep the setting on for a bit and test.

rodrigoreis22 commented 5 years ago

We're looking into adding this as a fallback for our web application. Some small percentage of users get the message Could not reach firestore backend and we're using the latest sdk version.

Is it possible to enable this setting after the app is already initialized, only in case of a failure reaching the Firestore backend? Ideally I want to enable this only in a catch flow.

mikelehen commented 5 years ago

@rodrigoreis22 Not super easily. But you can call firebase.app().delete() and then call firebase.initializeApp(...) again, and then firebase.firestore() will return a brand new instance which you can call .settings({experimentalForceLongPolling: true}) on... but note that any existing writes / listeners / etc. will need to be reestablished.
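The delete-and-reinitialize sequence described above could be wrapped roughly like this. It is a sketch, not an official recipe: the function name `restartWithLongPolling` is made up, `firebase` is assumed to be the namespaced SDK object used elsewhere in this thread, and `config` is whatever you normally pass to `initializeApp`. Deciding when to call it (the "catch flow") is left to the caller.

```javascript
// Sketch only: `firebase` and `config` are supplied by the caller.
async function restartWithLongPolling(firebase, config) {
  // Tear down the current default app, including its Firestore instance.
  await firebase.app().delete();

  // Re-create the app; firebase.firestore() now returns a brand new instance.
  firebase.initializeApp(config);
  const db = firebase.firestore();

  // Apply the setting before using the new instance for anything else.
  db.settings({ experimentalForceLongPolling: true });

  // Listeners and pending writes from the old instance are gone; the caller
  // has to re-establish them against `db`.
  return db;
}
```

Returning the new instance makes it explicit that every existing reference to the old Firestore object has to be replaced.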

dubvi5 commented 5 years ago

Hi. I cannot connect to firestore without experimentalForceLongPolling.

Windows, on a corporate network with McAfee endpoint security.
On first load I get "Could not reach Cloud Firestore backend. Backend didn't respond within 10 seconds.", but it works if I reload the page.
experimentalForceLongPolling seems to completely resolve the issue.
https://gist.github.com/dubvi5/1731f17940531d94451aeded761d08ad

Thanks.

dcc82 commented 5 years ago

Hello, we have a Firestore connection issue with one of our computers at work. The error message says "Firestore (6.3.0): Could not reach Cloud Firestore backend. Backend didn't respond within 10 seconds..."

This issue doesn't occur on the other computers within the same network. Enabling ExperimentalForceLongPolling seems to resolve the issue.

The problematic one is a Windows PC with "Symantec Endpoint Protection" installed. The others are all Mac computers without antivirus.

Here's the gist: https://gist.github.com/dcc82/270cc8eac2f0fe312aa4ac8befad7925

Thanks.

bhar4t commented 5 years ago

Hi

I enabled the feature and tested the project with the online, mid-range mobile and low-end mobile network options of the Chrome debugger.

I still get the error message "Firestore (6.3.0): Could not reach Cloud Firestore backend. Backend didn't respond within 10 seconds..." but only on low-end mobile. The other two options work fine whether the feature is enabled or not.

mikelehen commented 5 years ago

@bhar4t Thanks for the report. experimentalForceLongPolling is meant to work around a specific "broken" network configuration that's unrelated to your connection speed. So I wouldn't expect it to have an effect when testing in Chrome Debugger regardless of your emulated connection speed (low-end mobile, etc.).

Low-end mobile is likely hitting that log message just because it's taking longer than 10 seconds, but you should still get results if you wait long enough... it'll just be slower due to the low connection speed. experimentalForceLongPolling won't help.

abrice commented 5 years ago

Hi,

Firebase Auth works in all environments (with and without long polling).
Firebase Firestore works on localhost but fails completely in a corporate environment with Win10/WinDefender/Bluecoat layers. The corporate environment is very standard.

With experimentalForceLongPolling set to true, the corporate environment works beautifully with excellent performance and responsiveness. The gist is here: https://gist.github.com/abrice/de6984ffb7361cf4c071d574e49eb552.

Note, we only use the web (no mobile).

So, the issue then becomes, is 'experimental' going to become 'production'? It seems very risky to proceed with a solution that may or may not be withdrawn at any time.

The Firebase Auth/Firestore solution is truly impressive and ForceLongPolling works! Please move it to production status.

mikelehen commented 5 years ago

@abrice Thanks for the feedback!

The high-level answer is:

We aren't currently planning to make experimentalForceLongPolling the default behavior because it has some performance drawbacks and it should not be necessary for the vast majority of users (but we're using this github issue to collect / monitor feedback in that regard).
That said, we have no intention to remove experimentalForceLongPolling until / unless we have a better option for any users that are encountering the connection issues it currently solves.

For your specific case, can you elaborate on "Bluecoat layers"? I guess Blue Coat Systems was acquired by Symantec and perhaps this is the current version of their proxy software: https://www.symantec.com/products/proxy-sg-and-advanced-secure-gateway ?

It looks like it is known to cause problems due to SSL interception / proxying, e.g. with Windows Update and Office 365. So I'm not at all surprised it causes issues for Firestore as well, although interestingly this is the first report I've seen.

I don't suppose you or your IT department would be interested in engaging with Symantec on this issue and see if it can be resolved? I'd be happy to participate in the discussion and provide technical details, etc. If you're interested, feel free to reach out / CC me at michael@firebase.com. Thanks!

tgangso commented 5 years ago

Any update on if it is possible to make it fall back automatically to forceLongPolling?

I agree there are performance drawbacks, and should not be the default option.

I get a few questions every week from users experiencing problems (mostly on corporate networks), and enabling this in the settings of my app always resolves it.

abrice commented 5 years ago

Hi @mikelehen ,

The corporate runs O365 and has dealt with the challenges of Windows Updates. The network side is complex as there are a few other parties involved that provision various levels of proxy servers, anti-virus, load balancing, etc as-a-service. There's quite a lot of complex traffic moving about so, unfortunately, I can't help any further on pursuing a more in-depth technical examination.

But it's excellent news about your approach to experimentalForceLongPolling. Thank you!

mikelehen commented 5 years ago

@tgangso Adding a fallback mechanism is still on our radar but not something we're actively pursuing. Unfortunately it would be pretty complicated to add since it's hard to differentiate this case from being offline or even just on a slow network, so it's not obvious when to employ the fallback, etc. Right now we're still gathering feedback on the scope of the problem. Thanks!

DJ-Shady02 commented 5 years ago

Hi @mikelehen

I have been seeking help from Firebase Support, and after one and a half weeks we came up with changing the Firestore settings to experimentalForceLongPolling:true. With this setting, I was able to reduce loading times (when reading through .get()) from 50-60 seconds to 137 milliseconds. Before enabling forceLongPolling I would get an error when trying to add anything due to a timeout.

When using any website requesting data from Cloud Firestore, I am unable to use the site. On my own, however, I enable the setting simply to be able to actually use it. I have been working on this from two different computers, and tried on two different networks as well. Furthermore, I have tried disabling everything from proxies to firewalls, even Windows Defender Firewall. Nothing helped except enabling forceLongPolling.

mikelehen commented 5 years ago

@DJ-Shady02 Thank you for the information! Usually this issue only manifests in the face of specific corporate proxy or antivirus software, etc., so the fact that you're seeing it on two different computers and two different networks, even with disabling proxies / firewalls is unusual. I would still guess that there's some common factor that's causing it to happen, but it's hard to know what it might be. If by chance you're able to collect any more information that helps isolate the root cause (by trying from yet more devices including phones or tablets) or more networks (coffee shop, etc.), let us know. Thanks!

remyayad commented 4 years ago

Hello Michael, I am facing an issue with my angular app not loading data from Firestore in a timely manner. The results to your test are displayed in the following gist : https://gist.github.com/remyayad/db36b4434d0f8628f65d236276e9c754

This happens only on the network of the company I work for. Unfortunately the entire IT infrastructure is managed by a third-party company, therefore I cannot determine what ports are being blocked, if proxies are used, firewall security policies, etc. The anti-virus software is Symantec Endpoint Protection and it seems as though there is a web filter, probably implemented via a proxy, to filter the web pages we visit. It's called zScaler. The Angular app displays just fine, but the content that is stored in Firestore doesn't appear. If I leave the computer there for an hour or two, I can see that the page I was on has the data from Firestore loaded (God knows how long it took for the data to appear). I enabled the experimentalForceLongPolling option for Firebase, but seeing the results of the debug utility, I certainly must be doing something wrong...

mikelehen commented 4 years ago

@remyayad Hrm. Your results aren't what I would expect to see with the "normal" proxy issue. It looks like the test page worked for you both with and without forceLongPolling. But it sounds like in your Angular app, Firestore isn't working (well) regardless of whether you enable experimentalForceLongPolling.

Can you open a separate github issue so we can investigate? It would be helpful if you could reproduce the issue in a fairly minimal app (remove Angular, etc. if possible) and then capture a Firestore log (add firebase.firestore.setLogLevel('debug') to your app) and a HAR file (load the page with chrome dev tools Network tab open, wait ~90 seconds, then right click and choose "Save all as HAR with content").

remyayad commented 4 years ago

@mikelehen Thanks for the quick reply! Will do and I'll keep you posted.

DJ-Shady02 commented 4 years ago

@mikelehen Finally, I have an update!

I have bought myself a new computer. Used Firestore for a month without issues. I install Bullguard, bam! Same issue as before. Being assured the issue was created by Bullguard, I contacted their support. It appears their "Safe Surfing" (The function name is translated from my native language, so it may vary), is causing the issue. Disabling it will trade a warning notification on sketchy websites with fully functional firestore without using forceLongPolling - worth it!

While this is not directly helpful to you on the topic of the experimentalForceLongPolling option, it can help others avoid it.

wilhuff commented 4 years ago

@DJ-Shady02 Thanks for the report!

We're fairly certain that this problem isn't going away and are working on building logic into the SDK that will be able to automatically fall back on long polling.

tgangso commented 4 years ago

"working on building logic into the SDK that will be able to automatically fall back on long polling."

Great news, thank you.

prescottprue commented 4 years ago

Thanks a bunch for making this setting - it seems to solve issues with Firestore loading in Cypress and has been proposed as a possible solution to a note about bad performance of Firestore in Cypress.

wilhuff commented 4 years ago

@Kretin1 This issue is not a general discussion forum--we're collecting feedback on specific circumstances in which experimentalForceLongPolling has helped or hurt.

It sounds like you may have some unique concerns so please file an issue describing exactly which components you're using and what you're trying to do.

wilhuff commented 4 years ago

@prescottprue Interesting! I'm glad this helps, though as I mentioned in https://github.com/firebase/firebase-js-sdk/issues/1674#issuecomment-582489380, I'm hopeful that we'll be able to auto-detect this condition without much of a latency penalty. Once we can prove that it works in general, we'll remove this setting.

mhop1 commented 4 years ago

Had this problem with new PC: Windows, BullGuard, regular ISP.

BEFORE experimentalForceLongPolling, App takes 1-2 minutes to respond. AFTER implemented it returns immediately. So solves for web app.

Still have problems with the Firebase console as that is via browser - so (presumably) is still being affected by BullGuard?

Gist: https://gist.github.com/mhop1/e4f8ddb60c3631052a13bc905deedd4f

njvb commented 4 years ago

My issue occurs when building an iOS app with Cordova. All queries work fine in a browser, but the build requires the long-polling solution to function.

khmy2010 commented 4 years ago

How to use this with AngularFire?

tuanngominh commented 4 years ago

@khmy2010 check this https://github.com/firebase/firebase-js-sdk/issues/2526#issuecomment-573820058

matej-svejda commented 4 years ago

I had one of our corporate clients who wasn't able to connect without this option, but it works great with long polling. They say they have "firewalls and proxies" set up, but I don't have any more details.

Now I'm thinking about enabling this by default, since it's hard to know beforehand if proxies or firewalls are present.

Is there an ETA for when this auto-fallback option will be enabled? And could you give some more insight into the performance drawbacks of using this option? I saw in another thread that it adds 0.5 RTT. Does this mean that incoming snapshots and confirmations of writes are delayed by 0.5 RTT and all else is the same (for example, the time it takes for updates to be registered locally, or the load time per document, etc.)?

TooManyJohns commented 4 years ago

So this was super bizarre for me. I'm new to using Google Cloud Firestore, and the app was working fine on iOS on my laptop. When I switched to Android it didn't work, and I encountered this identical issue on both my laptop and my PC. Enabling this workaround worked, so thank you. Is this caused by firewalls/proxies 100% of the time? I am using a mesh network connection, but otherwise I don't believe there is anything special. Are there any additional cases this occurs in? Let me know if more logs/info are needed! Happy to provide context.

"firebase": "^7.14.3",
"react": "16.11.0",
"react-native": "0.62.2",

Regards,

Gist: https://gist.github.com/TooManyJohns/6a1d5d9a758907d23ec92dd673a70fad

hutman47 commented 4 years ago

How can I set this option on the Android platform (experimentalForceLongPolling=true)? I have the same issue on Android, but no way to fix it.

rafikhan commented 4 years ago

@hutman47 - This option does not exist on Android because the nature of the network transport is very different. The JavaScript SDK is constrained by what's compatible with browsers and HTTP/1.1. All other SDKs make use of HTTP/2 and have no long-polling fallback.

Please open an issue in the android repository with specific details of what you're seeing so we can help you.

catchshyam commented 4 years ago

Hello, we have been testing the application on different machines and Firebase would always throw an unreachable backend error on one machine only (it started throwing the error only in the last couple of days). So finally I ended up enabling the long polling flag and that resolved the issue. Here is the gist for debug from the problematic machine. I still don't know the root cause but I will have to stay with this flag until you have a better solution. Is it possible for you to pinpoint the root cause from the logs? Let me know if you want my application logs; I have the logs with and without long polling enabled on the said machine.

Also, the doc says there would be performance degradation with the flag enabled. How bad is the degradation? Thanks!

wilhuff commented 4 years ago

@catchshyam in general, on a host that's experiencing the problem, all we see is that the client thinks it's offline because it hasn't gotten a response in time. The cause is that something in between the SDK and its server is buffering the HTTP response until it's finished. Nothing in our logs would show what that cause is.

Common causes are in this thread of discussion: invasive anti-virus software, HTTP proxies, or network hardware, all of which try to evaluate the whole payload of an HTTP response before allowing any of it through. Firestore's default mechanism for pushing updates from the server sends multiple logical updates in the same HTTP response. What experimentalForceLongPolling does is restrict the server to sending only a single update per HTTP response. The server then waits until it gets another request from the client to send the next update.

The performance penalty falls out of that description: it's one round trip per update. On fast networks proximate to the Firestore instance, this isn't something you'll notice. On slow networks or configurations traveling a long distance to the server it's more of an issue.

As noted elsewhere, we're working on adding code that auto-detects this condition.
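To make the "one round trip per update" penalty concrete, here is a back-of-the-envelope model. It is not a measurement and not anything from the SDK: the function name `estimateDeliveryMs` is invented, and it assumes that streaming delivers a batch of updates in roughly one round trip while long polling pays one full round trip for each update. How many documents the backend packs into a single update is not modeled.

```javascript
// Rough model only: delivery time for `updateCount` server updates,
// given a round-trip time of `rttMs` milliseconds.
function estimateDeliveryMs(updateCount, rttMs, forceLongPolling) {
  // Long polling: one request/response cycle per update.
  // Streaming: all updates arrive on the same open response.
  const roundTrips = forceLongPolling ? updateCount : Math.min(updateCount, 1);
  return roundTrips * rttMs;
}

console.log(estimateDeliveryMs(20, 30, true));   // 600  (fast network, long polling)
console.log(estimateDeliveryMs(20, 200, true));  // 4000 (slow network, long polling)
console.log(estimateDeliveryMs(20, 200, false)); // 200  (slow network, streaming)
```

Under these assumptions the gap is small on a 30 ms link and grows linearly with both the round-trip time and the number of updates, which matches the description above.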

catchshyam commented 4 years ago

@wilhuff thanks a lot for the explanation. After I wrote the comment, the error surfaced on the problematic machine again even with the flag turned on. However, the issue resolved itself mysteriously today even without the flag.

I am a worried man at the moment because I have another application that uses Firebase at its heart. I understand that this issue can randomly occur on any terminal, and I don't feel so confident anymore about offering my app to corporate clients with the assurance of 99.99% uptime, because it can fail on some terminals in spite of a healthy internet connection and I would not have a valid argument to present for why my application does not work!

Benny739 commented 4 years ago

Same problem here. Even with the flag enabled, clients sometimes cannot connect to Firestore.

ghost commented 4 years ago

It happens. One way to improve reliability is a service worker that periodically checks connectivity and presents UI feedback so the user can re-connect.

mh7777777 commented 4 years ago

Same problem here,

"react-native": "0.63.2", "firebase": "^7.17.2"

firebase.firestore().settings({ experimentalForceLongPolling: true });

This is working, but it's really slow. I guess I will need to replace it with react-native-firebase. If someone fixes this, please let us know. I would like to use the Firebase Web SDK in my next React Native projects.

rafikhan commented 4 years ago

Update from Firebase: We've been working on this issue for some time and have released an experimental update that should alleviate and hopefully fix this problem. If you update to 7.24.0 or newer and enable experimentalAutoDetectLongPolling it should be able to detect if long polling is needed and only enable it when required. Please try this out and let us know if this issue is resolved or you run into new issues.

Thank you
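Choosing between the two flags mentioned in this thread could look like the sketch below. The helper name `longPollingSettings` and its mode strings are hypothetical; only the two setting names come from the thread. The helper deliberately never sets both flags at once, on the assumption that they are alternatives rather than something to combine.

```javascript
// Returns a Firestore settings object for one of three hypothetical modes:
// 'auto' (detect when long polling is needed), 'force' (always long-poll),
// or 'default' (leave the transport alone).
function longPollingSettings(mode) {
  switch (mode) {
    case 'auto':
      return { experimentalAutoDetectLongPolling: true };
    case 'force':
      return { experimentalForceLongPolling: true };
    case 'default':
      return {};
    default:
      throw new Error(`Unknown long-polling mode: ${mode}`);
  }
}

// Usage, assuming the namespaced SDK at 7.24.0 or newer:
// firebase.firestore().settings(longPollingSettings('auto'));
```

Keeping the choice in one place makes it easy to switch a deployment from 'force' to 'auto' and compare behavior, as some commenters here have done.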

njvb commented 4 years ago

Thanks, @rafikhan! This has been a long standing issue, so thanks for the efforts to address this.

tgangso commented 4 years ago

I have used the experimentalAutoDetectLongPolling now for a week, running fine so far, no issues reported from users yet. Had the experimentalForceLongPolling enabled previously.

catchshyam commented 4 years ago

@rafikhan thanks for the update. The easiest way for me to reproduce the issue and test is with BullGuard antivirus. I just upgraded to firebase 7.24.0 and tested my app. My findings are the following:

With experimentalAutoDetectLongPolling flag on: I no longer see the error message "unable to reach firebase backend" on the browser console. However, the app still takes close to a minute to boot up.

With experimentalForceLongPolling flag on: No errors, as with the previous flag, and the app boots up within a matter of seconds.

So I guess the new flag is in the right direction but still there is scope for further optimization.

Thanks!

wenbozhu commented 4 years ago

@catchshyam Thanks for the report. Could you post the following info?

browser version
instruction to set up BullGuard and the OS you tried

One possible reason is that BullGuard only allows one connection to be made to the same server address, and as such the long polling can only start after the initial streaming request times out.
