firebase / extensions

Source code for official Firebase extensions
https://firebase.google.com/products/extensions
Apache License 2.0

firestore-bigquery-export backfilling less than 800k docs took days and cost over 1k U$ #2000

Open MorenoMdz opened 8 months ago

MorenoMdz commented 8 months ago

[REQUIRED] Step 2: Describe your configuration

[REQUIRED] Step 3: Describe the problem

We have been heavy users of this extension for over two years, and it has provided great value and near-flawless replication from our Firestore data to BigQuery. On March 19th we set up the exporter on our smaller collection, jobs, which at the time had under 800k documents. The exporter ended up running for almost two days, in a pattern that looked recursive, at times reading over 9k documents per second, even though the "Docs per backfill" setting was set to 200. This caused a spike of over 1k USD in our billing for that one day.

In the Firestore key visualizer it is easy to see the read spikes bubbling up and slowly fading until they stopped, more than a day later.

I do have a ticket open about this issue, but I do think this should be reported here.

Note: this was the first time we used the "backfill" option from the Firebase console; before that we always backfilled by hand with the good old script.

I cannot provide more information here as it would be sensitive, but the internal GCP ticket is 50277733.

Expected result

Under 50 cents of billing for this export, with the export completing in a couple of minutes.

Actual result

Over one thousand dollars was charged, and the export took almost two days.

pr-Mais commented 8 months ago

@MorenoMdz I tried reproducing but without luck. Can you check the logs of the fsexportbigquery function? Do you see any errors?

MorenoMdz commented 8 months ago

> @MorenoMdz I tried reproducing but without luck, can you check the logs in the function fsexportbigquery, do you see any errors?

We do have a bunch of "Cannot partition an existing table firestore_export_jobs_raw_changelog" warnings, but that's very common with the Firestore-to-BigQuery exporter; no errors, though.

As I mentioned, we have been heavy users of the exporter since 2021, but this was the first collection where the extension itself did the backfill (previously we always used the backfill script), so I would point towards an issue in the backfill itself. If you check the Firestore key visualizer you will notice the reads per second bubbled up over the following hours in a pattern that looked like recursive/exponential calls.

pr-Mais commented 8 months ago

I think we have a clue as to why this might be happening. The backfilling function uses offset to enqueue the batches to import sequentially, which explains the burst in reads you saw. Thanks for the details you provided; we will disable this feature until we come up with a solution.

MorenoMdz commented 8 months ago

> I think we have a clue why this might be happening. The backfilling function uses offset to enqueue the batches to import sequentially, this explains the burst in reads you had. Thanks for the details you provided, we will disable this feature until we come up with a solution.

I see! Thanks for the quick response.

pr-Mais commented 8 months ago

I can also confirm I hit the same issue on a fresh project: 33k total docs, docs per backfill set to 200, and 2.7M total reads after the backfill finished.
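For context, a rough back-of-the-envelope model of the billing behavior described above (this is not the extension's actual code; it only assumes Firestore's documented behavior of billing a read for every document skipped by an offset query):

```python
def offset_backfill_reads(total_docs: int, batch_size: int) -> int:
    """Total billed reads when each batch re-runs the query with a
    growing offset: batch i skips i * batch_size documents, and the
    skipped documents are billed as reads too, so cost grows
    quadratically with collection size."""
    reads = 0
    offset = 0
    while offset < total_docs:
        batch = min(batch_size, total_docs - offset)
        reads += offset + batch  # skipped docs + returned docs
        offset += batch
    return reads


def cursor_backfill_reads(total_docs: int, batch_size: int) -> int:
    """Cursor-based batching (startAfter on the last document of the
    previous batch) only bills the documents actually returned."""
    return total_docs
```

Plugging in the numbers from this thread, the model lands close to the observed figures: 33,000 docs at 200 per batch gives 2,739,000 billed reads (the ~2.7M reported above), and 800,000 docs at 200 per batch gives roughly 1.6B reads, in the right ballpark for the billing spike in the original post. A cursor-based backfill would instead bill one read per document.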