firebase / extensions

Source code for official Firebase extensions
https://firebase.google.com/products/extensions
Apache License 2.0

🐛 [firestore-bigquery-export] backfilling less than 300k docs took days and cost ~$200 USD #2003

Open jjaklitsch opened 8 months ago

jjaklitsch commented 8 months ago

[REQUIRED] Step 2: Describe your configuration

[REQUIRED] Step 3: Describe the problem

Steps to reproduce:

We installed the extension and set the preference to import existing records. The Firestore database we imported from had <250K records. While importing, we saw a massive spike in Firestore reads, up to 45 million per hour. Our typical read volume is <10K per hour. We incurred a cost of ~$200 just from running this import.

Expected result

BigQuery database is created with minimal impact on read volumes.

Actual result

45 million Firestore reads per hour. 120 million reads total in a few hours.

jjaklitsch commented 8 months ago

Linking to the same bug someone else reported: https://github.com/firebase/extensions/issues/2000

cabljac commented 8 months ago

Hey, I'm looking into this now. Do you have any relevant Cloud Functions logs or errors?

cabljac commented 8 months ago

I believe this issue is caused by us using offset to paginate, I am working on an alternative approach.
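For readers wondering why offset pagination would inflate reads this much: Firestore bills a read for every document an offset skips, not only for the documents returned, so each page re-pays for all the pages before it. The sketch below is a rough cost model only. The function name and the batch size of 300 are illustrative assumptions, not the extension's actual code or configuration.

```python
def offset_backfill_reads(total_docs: int, batch_size: int) -> int:
    """Estimate billed reads when every page is fetched with offset = docs already processed.

    Hypothetical model: each query is billed for the skipped documents plus
    the documents it returns.
    """
    reads = 0
    offset = 0
    while offset < total_docs:
        returned = min(batch_size, total_docs - offset)
        reads += offset + returned  # skipped documents are billed as reads too
        offset += returned
    return reads


if __name__ == "__main__":
    docs = 250_000  # collection size reported in this issue
    batch = 300     # assumed batch size, for illustration only
    print(f"offset pagination: {offset_backfill_reads(docs, batch):,} billed reads")
    print(f"one read per doc:  {docs:,} billed reads")
```

Under these assumptions the total grows roughly with `total_docs² / (2 × batch_size)`, which for 250K documents lands around 104 million reads, the same order of magnitude as the 120 million reported above.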

jjaklitsch commented 8 months ago

Yes, see attached for the logs. firestore-export-logs.docx

When do you think you'll have a fix in? Also, what's the process for requesting a credit?

jjaklitsch commented 8 months ago

Hi - is there any update on this? Any other recommendations for streaming firestore data to bigquery?


pr-Mais commented 8 months ago

You can still use the extension for streaming; this issue only affects backfilling, which we have disabled for now. Another way to backfill your existing data is to use the import script, which you can run locally.

You can reach out to Firebase support on this link.

huangjeff5 commented 6 months ago

Hi, software engineer from Firebase here.

Just wanted to chime in on this issue. We have turned off backfill, so if you use the latest version you won't run into the issue. As Mais explained above, the import script is the temporary workaround.

That being said, we are actively reworking the backfill implementation so that offset isn't used. We will follow up when that is pushed out.
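One common way to paginate without offset is a cursor: order by document ID and start each page after the last ID seen, so each query touches only the documents it returns. The sketch below illustrates the idea against an in-memory stand-in for a collection. `FakeCollection`, `page_after`, and `backfill` are hypothetical names, and this is not the extension's planned implementation.

```python
from bisect import bisect_right
from typing import Iterator, List, Optional


class FakeCollection:
    """In-memory stand-in for a collection ordered by document ID (hypothetical)."""

    def __init__(self, doc_ids: List[str]) -> None:
        self.doc_ids = sorted(doc_ids)
        self.reads = 0  # documents returned so far; ignores any per-query minimum charge

    def page_after(self, last_id: Optional[str], limit: int) -> List[str]:
        # Seek directly to the position after the cursor instead of skipping rows.
        start = 0 if last_id is None else bisect_right(self.doc_ids, last_id)
        page = self.doc_ids[start:start + limit]
        self.reads += len(page)
        return page


def backfill(collection: FakeCollection, batch_size: int) -> Iterator[List[str]]:
    """Yield pages of document IDs, using the last ID of each page as the cursor."""
    last_id: Optional[str] = None
    while True:
        page = collection.page_after(last_id, batch_size)
        if not page:
            return
        yield page
        last_id = page[-1]


if __name__ == "__main__":
    col = FakeCollection([f"doc{i:04d}" for i in range(10)])
    for page in backfill(col, 3):
        print(page)
    print("reads counted:", col.reads)
```

In this model the read count stays equal to the number of documents regardless of batch size, which is the property an offset-free backfill is after.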