firebase / extensions

Source code for official Firebase extensions
https://firebase.google.com/products/extensions
Apache License 2.0
892 stars 380 forks source link

šŸ› [firestore-bigquery-export] Server returned 502 when mirroring data to BQ #2133

Open boywijnmaalen opened 3 months ago

boywijnmaalen commented 3 months ago

Describe your configuration

Describe the problem

in the last 30 days I can find 4 occurrences of this error:

Error when mirroring data to BigQuery FirebaseFunctionsError: Unexpected response with status: 502

image

on/around these timestamps we have updates missing for Firestore documents in BQ.

fix: a manual edit of the FS doc in question resulted in a sync to BQ (as expected).

the error contains a HTML response;

image

stack trace mentioned in the prntscrn;

at FunctionsApiClient.toFirebaseError (/workspace/node_modules/firebase-admin/lib/functions/functions-api-client-internal.js:297:20) at FunctionsApiClient.enqueue (/workspace/node_modules/firebase-admin/lib/functions/functions-api-client-internal.js:146:32) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async /workspace/lib/index.js:108:9 { errorInfo: { code: 'functions/unknown-error', message: 'Unexpected response with status: 502 }

Steps to reproduce:

unsure how to reproduce as I do not fully understand the problem, however, as indicated by the error, a 502, it seems the error occurs in the function running the firestore-bigquery-export business logic.

Expected result

to have every update for relevant Firestore docs synced to BQ

Actual result

missing updates for Firestore docs in BQ

Nushio commented 3 months ago

I get these on a daily basis. We're beyond frustrated with this extension.

Screenshot 2024-07-24 at 9 34 30ā€ÆAM

cabljac commented 3 months ago

Hi, thanks for raising this. I'm looking into it. It may an intermittent error on the BQ side, which would be difficult to resolve from extensions. I'll keep you updated with my progress

cabljac commented 3 months ago

So the log is occurring here

When a cloud task fails to enqueue, the function makes this log. Then if more than 10 seconds have passed since the original firestore write event, it returns. Otherwise it will throw the error.

I'll have to look into the reasons for implementing it like this.

Maybe we should be retrying the enqueue at this point instead, to prevent data loss.

boywijnmaalen commented 3 months ago

appreciate your time looking into this.

the link you provided speaks of a '1mb limit' payload limit for cloud tasks.

I can confirm our data object is no where near 1mb in size.

cabljac commented 3 months ago

Yeah the code comment about payload size isn't relevant here, but is another existing issue with this extension for some users with large payload sizes.

The relevant link in the link is the catch block at line 143, if you're interested.

I'll see if we can get this issue prioritised.

Nushio commented 3 months ago

So the log is occurring here

When a cloud task fails to enqueue, the function makes this log. Then if more than 10 seconds have passed since the original firestore write event, it returns. Otherwise it will throw the error.

I'll have to look into the reasons for implementing it like this.

Maybe we should be retrying the enqueue at this point instead, to prevent data loss.

Speaking of dataloss, I found another huge issue with the cloud tasks and data loss and a way to replicate:

If there's any pending tasks on the cloud task queue when an upgrade or configuration change is made to the extension, the cloud tasks are cleared.

As a way to test this, set up the extension, and navigate to the extension's task list https://console.cloud.google.com/cloudtasks ending in "syncBigQuery",

  1. Pause the queue, and make a few modifications on firestore so that they're queued on the cloud task queue.
  2. Check on BigQuery that the changes aren't replicating (they shouldn't, queue is paused)
  3. Change the extension configuration, and check that the queue is now clear
  4. Check BigQuery to see that the tasks are indeed gone and the data never made it to BQ.
cabljac commented 3 months ago

thanks for bringing that up @Nushio

I think same might happen even if they're not paused as well, if the queues are backed up enough. I'll open a new issue to track this, and discuss it with the extensions team. I don't have an immediate viable solution for that in mind either, as the task queue name is pretty much fixed across reconfigurations (i think).

Nushio commented 3 months ago

Yes, our queue wasn't paused when I ran an update and lost data as a result.

(It's not lost, it's still on firestore, but now I need to ping each document to trigger a sync)

Pausing was a simple way to replicate the issue.

boywijnmaalen commented 2 months ago

@cabljac I was wondering if there is an update on this?

this weekend we had a super busy weekend (business wise) and it turned out several Firestore document updates did not land in BQ.

we've added alerting when errors happen in the extension, however the errors do not include the document ID in question, so we only know the sync mechanism failed, but nog for which document (we have to look that up manually).

our alerting is triggered on; resource.labels.function_name=~"ext-firestore-bigquery-export" severity=ERROR

image

as you can see the errors occured more or less at the same time, with one outlier.

I'm aware one error type is not in scope of this specific issue. however I wanted it share it with you anyway as the general issue we're facing is; missing/failing firestore>biguery events. Please let me know if I should raise a separate issue for this one

Error 1; (perhaps slightly off (issue) topic)

{ "textPayload": "Error when mirroring data to BigQuery FirebaseAppError: Credential implementation provided to initializeApp() via the \"credential\" property failed to fetch a valid Google OAuth2 access token with the following error: \"Error fetching access token: Could not fetch URI /computeMetadata/v1/instance/service-accounts/default/token\n\".\n at /workspace/node_modules/firebase-admin/lib/app/firebase-app.js:85:19\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async FunctionsApiClient.enqueue (/workspace/node_modules/firebase-admin/lib/functions/functions-api-client-internal.js:138:13)\n at async /workspace/lib/index.js:108:9 {\n errorInfo: {\n code: 'app/invalid-credential',\n message: 'Credential implementation provided to initializeApp() via the \"credential\" property failed to fetch a valid Google OAuth2 access token with the following error: \"Error fetching access token: Could not fetch URI /computeMetadata/v1/instance/service-accounts/default/token\\n' +\n '\".'\n },\n codePrefix: 'app'\n}", "resource": { "type": "cloud_function", "labels": { "project_id": "XXXX", "region": "europe-west2", "function_name": "ext-firestore-bigquery-export-XXXX-fsexportbigquery" } }, "timestamp": "2024-08-23T19:51:53.216910Z", "receiveTimestamp": "2024-08-23T19:51:53.338847621Z" }

Error 2; (same as original issue topic)

{ "textPayload": "Error when mirroring data to BigQuery FirebaseFunctionsError: Unexpected response with status: 502 and body: <!DOCTYPE html>\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n <title>Error 502 (Server Error)!!1</title>\n <style>\n *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n </style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>502.</b> <ins>Thatā€™s an error.</ins>\n <p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds. <ins>Thatā€™s all we know.</ins>\n\n at FunctionsApiClient.toFirebaseError (/workspace/node_modules/firebase-admin/lib/functions/functions-api-client-internal.js:297:20)\n at FunctionsApiClient.enqueue (/workspace/node_modules/firebase-admin/lib/functions/functions-api-client-internal.js:146:32)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async /workspace/lib/index.js:108:9 {\n errorInfo: {\n code: 'functions/unknown-error',\n message: 'Unexpected response with status: 502 and body: <!DOCTYPE html>\\n' +\n '<html lang=en>\\n' +\n ' <meta charset=utf-8>\\n' +\n ' <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\\n' +\n ' <title>Error 502 (Server Error)!!1</title>\\n' +\n ' <style>\\n' +\n ' *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\\n' +\n ' </style>\\n' +\n ' <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\\n' +\n ' <p><b>502.</b> <ins>Thatā€™s an error.</ins>\\n' +\n ' <p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds. <ins>Thatā€™s all we know.</ins>\\n'\n },\n codePrefix: 'functions'\n}", "resource": { "type": "cloud_function", "labels": { "region": "europe-west2", "project_id": "XXXX", "function_name": "ext-firestore-bigquery-export-XXXX-fsexportbigquery" } }, "timestamp": "2024-08-23T15:36:11.806726Z", "receiveTimestamp": "2024-08-23T15:36:12.078665762Z", }

boywijnmaalen commented 1 month ago

@cabljac

a kind reminder šŸ™šŸ¾

do you have an update on a potential fix?

cabljac commented 1 month ago

Hi @boywijnmaalen

Just looking to get prioritisation on this, which we may now have

cabljac commented 1 month ago

The issue with my fix idea is that it's not really going to fix it - it will reduce the frequency of this error but we're still going to get data loss...

cabljac commented 1 month ago

Out of interest @boywijnmaalen do you have EventArc events enabled for the extension? Trying to rule things out

boywijnmaalen commented 1 month ago

@cabljac

Unsure what you mean if we have EventArc enabled for the extension? (We use the Google EventArc API because of our (v2) Firebase functions/handlers, but probably this is not what you are referring too I think)

I just had a look at our firestore>bigquery extension properties, but from what I can tell there is an option to enable events if you want use custom event handlers. however we do not make use of custom event handlers.

as far as I can tell only the BigQuery API and Cloud Tasks API are required for this extension. I'm not aware of any specific event related APIs that need to be enabled as well.