firebase / extensions

Source code for official Firebase extensions
https://firebase.google.com/products/extensions
Apache License 2.0

🐛 [firestore-bigquery-export] backfill for 2.3M documents cost $400,000 #2021

Open williamkolean opened 3 months ago

williamkolean commented 3 months ago

[REQUIRED] Step 2: Describe your configuration

[REQUIRED] Step 3: Describe the problem

We originally ran the backfill at 200 docs per backfill; it finished quickly but did not include all of the documents. So we lowered it to 100 docs per backfill to match the maximum number of synced documents. This time the backfill was much slower than before, so we just let it run. Unfortunately, each iteration was a little slower than the previous one, until the tasks started timing out. Once that happened, resource use escalated rapidly: each failing task would retry 100 times while new tasks continued to be created. Because this escalated over the weekend, by the time we checked progress on Monday we had a $400,000 billing charge; a normal month is under $100.
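
The failure mode is worth spelling out: once tasks time out, uncapped retries multiply the work instead of draining it. As a sketch only (this is not the extension's code; the function name, limits, and cursor field are hypothetical), a Cloud Tasks worker written with firebase-functions v2 can bound both retries and concurrency, and carry an explicit cursor so a retry never re-reads earlier chunks:

```ts
import { onTaskDispatched } from "firebase-functions/v2/tasks";

// Hypothetical chunked-backfill worker. Capping retryConfig means a
// persistently failing chunk dies after a few attempts instead of
// compounding, and rateLimits bounds how many chunks run at once.
export const backfillChunk = onTaskDispatched(
  {
    retryConfig: { maxAttempts: 5, minBackoffSeconds: 60 },
    rateLimits: { maxConcurrentDispatches: 6 },
  },
  async (req) => {
    const { startAfterDocPath } = req.data as { startAfterDocPath?: string };
    // Process one bounded chunk starting at the cursor, then enqueue the
    // next chunk with its own cursor; a retry of this task re-reads only
    // this chunk, never the whole collection.
  }
);
```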

Steps to reproduce:

Try to import documents with settings similar to the above and time each iteration. Each iteration takes a little longer than the previous one.
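
We don't know the extension's internals from this thread, but one pattern that produces exactly this each-iteration-gets-slower symptom is offset-based pagination: Firestore's offset() skips documents but still reads (and bills) every one of them, so iteration N costs O(N) reads and the whole run is quadratic. For comparison, a sketch of the anti-pattern next to the cursor-based alternative (collection name hypothetical):

```ts
import { getFirestore, QueryDocumentSnapshot } from "firebase-admin/firestore";

const db = getFirestore();
const PAGE = 100;

// Anti-pattern: offset() still reads and bills every skipped document,
// so each successive page is slower and more expensive than the last.
async function pageWithOffset(iteration: number) {
  return db
    .collection("posts") // hypothetical collection
    .orderBy("__name__")
    .offset(iteration * PAGE)
    .limit(PAGE)
    .get();
}

// Cursor pagination: every page reads only its own PAGE documents, so
// iteration time stays flat no matter how deep into the backfill we are.
async function pageWithCursor(last: QueryDocumentSnapshot | null) {
  let q = db.collection("posts").orderBy("__name__").limit(PAGE);
  if (last) q = q.startAfter(last);
  const snap = await q.get();
  return { snap, cursor: snap.docs[snap.docs.length - 1] ?? null };
}
```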

Expected result

Redoing the import with fs-bq-import-collection (without wildcards, to be on the safe side), we were able to import 300 docs per second, and it finished in under 5 hours.

Actual result

By the time the extension was killed, there were 20k tasks in the queue, the Cloud Function logs were full of timeout errors, and the read count was 662,466,290,104.
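
For scale: 662,466,290,104 reads across 2.3M documents is roughly 288,000 reads per document. At Firestore's multi-region rate of roughly $0.06 per 100,000 document reads, the reads alone work out to about 6,624,663 × $0.06 ≈ $397,000, which accounts for essentially the entire bill.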

cabljac commented 3 months ago

Hi, we have raised this with the Firebase team and will get back to you ASAP.

filiocorp commented 1 month ago

@cabljac I know that you disabled the import feature, but the script has many issues; I keep getting errors and it does not import all the documents. I have run it multiple times and it only recorded 8,000 documents out of over 100,000:

{"severity":"WARNING","message":"Error when inserting data to table."}
An error has occurred on the following documents, please re-run or insert the following query documents manually...

If I run it with a batch size of one, I get another error:

{"severity":"WARNING","message":"Error when inserting data to table."}
An error has occurred on the following documents, please re-run or insert the following query documents manually... {}
{}
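
The empty {} in that report suggests the failed rows are being serialized without their contents, which makes manual re-insertion impossible. Assuming the script uses the @google-cloud/bigquery Node client (dataset and table names below are hypothetical, following the extension's _raw_changelog naming), per-row streaming-insert failures can be surfaced like this:

```ts
import { BigQuery } from "@google-cloud/bigquery";

const bq = new BigQuery();

async function insertAndReportRowErrors(rows: Record<string, unknown>[]) {
  try {
    await bq.dataset("firestore_export").table("posts_raw_changelog").insert(rows);
  } catch (err: any) {
    // The streaming-insert API reports per-row failures as a
    // PartialFailureError whose `errors` array pairs each rejected
    // row with the reasons BigQuery gave for rejecting it.
    if (err.name === "PartialFailureError") {
      for (const failure of err.errors ?? []) {
        console.warn("row failed:", JSON.stringify(failure.row), failure.errors);
      }
    } else {
      throw err;
    }
  }
}
```
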
filiocorp commented 1 month ago

Update: I turned off multithreading and now it works.