GoogleCloudPlatform / DataflowTemplates

Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
https://cloud.google.com/dataflow/docs/guides/templates/provided-templates
Apache License 2.0

CassandraToBigtable classic template feature: Cassandra Writetime replication #1653

Open georgecma opened 1 month ago

georgecma commented 1 month ago

Add Cassandra writetime replication feature to CassandraToBigtable classic template.

Cassandra writetime can now be replicated to Bigtable as the Bigtable cell timestamp. For this feature to work, users must first generate a JSON-like schema file for the table whose writetimes they want to replicate, then upload it to a GCS bucket location. The command to generate the schema file is:

cqlsh -e "select json * from system_schema.columns
                where keyspace_name='$CASSANDRA_KEYSPACE'
                and table_name='$CASSANDRA_TABLE'" \
    > column_schema.json
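Before uploading, it may be worth checking that the dump actually contains column rows. The following is a minimal sketch, not part of the template: the helper names `count_schema_rows` and `check_schema_file` are hypothetical, and the check assumes that each column row in the cqlsh output is a JSON object on its own line containing a `"column_name"` key.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the lines in a cqlsh "select json" dump that look
# like column rows, i.e. lines containing a "column_name" key (an assumption
# about the dump's layout).
count_schema_rows() {
  local schema_file="$1"
  # grep -c prints the number of matching lines and exits non-zero when there
  # are none, so "|| true" keeps the function usable under "set -e".
  grep -c '"column_name"' "$schema_file" || true
}

# Hypothetical helper: succeed only if the file exists, is non-empty, and
# contains at least one column row.
check_schema_file() {
  local schema_file="$1"
  if [ ! -s "$schema_file" ]; then
    echo "missing or empty: $schema_file" >&2
    return 1
  fi
  local rows
  rows="$(count_schema_rows "$schema_file")"
  if [ "$rows" -eq 0 ]; then
    echo "no column rows found in $schema_file" >&2
    return 1
  fi
  echo "$schema_file: $rows column row(s)"
}
```

Running `check_schema_file column_schema.json` before the upload step catches an empty dump, which can happen if the keyspace or table variables were unset when the query ran.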

Then, to upload the file:

gcloud storage cp column_schema.json "$CASSANDRA_COLUMN_SCHEMA"

The GCS path of the schema file should then be set as CASSANDRA_COLUMN_SCHEMA so that the template can parse it at run time.

The template now sets the Bigtable cell timestamp to the replication time (i.e. now) instead of the epoch start time used previously. SetZeroTimestamp is the backward-compatibility option that sets the timestamp to 0.