Support dumping multiple Spanner databases to Avro

CAFxX commented 5 years ago

To use Cloud Scheduler effectively with the Spanner->Avro template, it would be ideal if https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/spanner/ExportPipeline.java allowed specifying multiple database IDs (instead of the single one it accepts currently).

The current template already creates a subdirectory for the exported database in the GCS output directory. If multiple databases were specified, multiple subdirectories would be created, one for each database.

As an extension, it would also be very useful to make the database ID optional, in which case the Dataflow job would have to enumerate the databases in the specified Spanner instance and then export all of them.

The goal is to be able to trigger an export of one, multiple, or all databases on a Spanner instance from a Cloud Scheduler job.
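Until the template supports this, one possible workaround is to launch one export job per database from whatever the scheduler triggers. The snippet below is only a rough sketch of that idea, not something the template provides: `build_launch_requests` is a hypothetical helper, the template path and the parameter names (`instanceId`, `databaseId`, `outputDir`) are my assumptions about the current Spanner->Avro template, and the request shape assumes the Dataflow `templates.launch` REST call. Please verify all of these before relying on it.

```python
from typing import Dict, Iterable, List

# Assumed location of the hosted Spanner->Avro template; verify for your setup.
TEMPLATE_GCS_PATH = "gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro"


def build_launch_requests(
    instance_id: str,
    database_ids: Iterable[str],
    output_dir: str,
) -> List[Dict[str, object]]:
    """Build one (hypothetical) template launch request per database ID.

    Nothing is sent anywhere: the caller decides how to submit each request.
    """
    requests = []
    for database_id in database_ids:
        requests.append(
            {
                "gcsPath": TEMPLATE_GCS_PATH,
                "body": {
                    # Assumption: job names may not contain underscores,
                    # so they are replaced with hyphens.
                    "jobName": "spanner-export-" + database_id.replace("_", "-"),
                    # Assumed template parameter names.
                    "parameters": {
                        "instanceId": instance_id,
                        "databaseId": database_id,
                        "outputDir": output_dir,
                    },
                },
            }
        )
    return requests
```

For example, `build_launch_requests("my-instance", ["orders", "user_profiles"], "gs://my-bucket/exports")` yields two requests that share the same output directory, so each export would land in its own per-database subdirectory as described above. To cover the "all databases" case, the list of IDs would first have to be fetched from the Spanner instance.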

azurezyq commented 5 years ago

Agreed that this would be very useful. However, due to the nature of Dataflow templates, the job graph cannot be changed once the template is built, which means that more work is needed on the template feature side to support this.

We are working on something that would remove the limitation mentioned above. We will revisit this soon.

CAFxX commented 5 years ago

@azurezyq thanks for the reply.

> However due to the nature of dataflow templates, job graph cannot be changed once the template is built.

Just to confirm: does this apply even if the databases are exported serially?

Just FTR I also filed the same request via enterprise support: https://console.cloud.google.com/support/cases/detail/19577487?folder&organizationId=956776603191

github-actions[bot] commented 5 months ago

This issue has been marked as stale due to 180 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the issue at any time. Thank you for your contributions.

CAFxX commented 5 months ago

The issue isn't solved yet.
