CAFxX opened this issue 5 years ago
Agreed that this would be very useful. However, due to the nature of Dataflow templates, the job graph cannot be changed once the template is built, which means more work would be needed on the template feature side to support this.
We are working on something that would remove the limitation mentioned above. Will revisit this soon.
@azurezyq thanks for the reply.
However, due to the nature of Dataflow templates, the job graph cannot be changed once the template is built.
Just to confirm: does this apply even if the databases are exported serially?
Just for the record, I also filed the same request via enterprise support: https://console.cloud.google.com/support/cases/detail/19577487?folder&organizationId=956776603191
This issue has been marked as stale due to 180 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the issue at any time. Thank you for your contributions.
Issue isn't solved yet
To be able to use Cloud Scheduler effectively with the Spanner->Avro template, it would be ideal if https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/spanner/ExportPipeline.java allowed specifying multiple database IDs (instead of a single one, as is currently the case).
The current template already creates a subdirectory for the exported database in the GCS output directory: if multiple databases were specified, multiple subdirectories would be created, one per database.
As an extension, it would also be very useful to make the database ID optional, in which case the pipeline would enumerate the databases in the specified Spanner instance and export all of them.
The goal is to be able to trigger an export of one, multiple, or all databases on a Spanner instance from a Cloud Scheduler job.
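To make the idea concrete, here is a minimal sketch (not the template's actual API) of how a hypothetical comma-separated `databaseIds` parameter could be resolved into the list of databases to export. A blank value would mean "export every database in the instance", resolved via the Spanner Database Admin client; the parameter name and the helper class are illustrative assumptions, not part of the existing template:

```java
import com.google.cloud.spanner.Database;
import com.google.cloud.spanner.DatabaseAdminClient;
import com.google.cloud.spanner.Spanner;
import com.google.cloud.spanner.SpannerOptions;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DatabaseIdResolver {

  /**
   * Returns the databases to export. {@code databaseIds} is a hypothetical
   * comma-separated template parameter; when blank, all databases in the
   * given instance are returned.
   */
  public static List<String> resolve(String projectId, String instanceId, String databaseIds) {
    if (databaseIds != null && !databaseIds.trim().isEmpty()) {
      // Explicit list: split on commas and drop empty entries.
      return Arrays.stream(databaseIds.split(","))
          .map(String::trim)
          .filter(id -> !id.isEmpty())
          .collect(Collectors.toList());
    }
    // No explicit IDs: enumerate every database on the instance.
    Spanner spanner = SpannerOptions.newBuilder().setProjectId(projectId).build().getService();
    try {
      DatabaseAdminClient adminClient = spanner.getDatabaseAdminClient();
      List<String> ids = new ArrayList<>();
      for (Database db : adminClient.listDatabases(instanceId).iterateAll()) {
        ids.add(db.getId().getDatabase());
      }
      return ids;
    } finally {
      spanner.close();
    }
  }
}
```

Each resolved database ID would then drive the existing single-database export, writing into its own subdirectory under the GCS output path, just as the current template already does for the one database it accepts.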