GoogleCloudPlatform / firebase-extensions

Apache License 2.0

bigquery-firestore-export - export large amount of data #427

Open sjerzak opened 7 months ago

sjerzak commented 7 months ago

Hi,

I have a few questions about the bigquery-firestore-export extension.

I tried to use the bigquery-firestore-export extension with a large amount of data (200M+ rows). IIRC, Firestore has a write limit for batch operations and doesn't allow more than 1,000 writes/second. The Cloud Function that exports data from the temp table created by the extension can run for at most 9 minutes = 540 seconds.

That gives us around 540,000 writes per run. Is there any way to insert more data? I also tried configuring the extension to use multiple function instances, but it didn't change anything. Even when I forced "Minimum function instances" to a higher value, the amount of data written to Firestore didn't change compared to runs where "Minimum function instances" wasn't set.
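A rough back-of-envelope check of the ceiling described above (the 1,000 writes/second and 540-second figures are this issue's assumptions, not confirmed limits of the extension):

```python
# Assumed limits from the discussion above (not from official docs):
FIRESTORE_WRITES_PER_SECOND = 1_000       # assumed sustained write rate
MAX_FUNCTION_RUNTIME_SECONDS = 9 * 60     # 540 s max Cloud Function runtime

# Maximum writes a single function execution could achieve at that rate.
max_writes_per_run = FIRESTORE_WRITES_PER_SECOND * MAX_FUNCTION_RUNTIME_SECONDS
print(max_writes_per_run)  # 540000

# Sequential runs needed to move 200M rows at this ceiling
# (ceiling division via negation trick).
rows = 200_000_000
runs_needed = -(-rows // max_writes_per_run)
print(runs_needed)  # 371
```

At 540,000 writes per run, a 200M-row export would need hundreds of sequential runs, which is why a single-execution design hits a wall here.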

Also, when "Minimum function instances" wasn't set and "Maximum function instances" was set to 10, the extension run didn't scale automatically and used only two instances.

Can I ask how setting "Min/Max function instances" affects function execution?

huangjeff5 commented 5 months ago

Hey there, can you share more about your use case? (What exactly is the data you're exporting to Firestore, and why?) That would help me understand whether there's another way to solve your problem well!

To answer some of your questions though:

First, Min/Max function instances doesn't impact execution in this case because, as you noticed, the export ultimately runs in a single function execution.

Second, I think there might be ways we can better support a big data import into Firestore with Dataflow (something we're exploring with another extension we're working on). That may be worth implementing eventually, if it makes sense for this use case.