cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

Changefeed in Avro format does not include mvcc_timestamp when option is specified #123078

Closed jonstjohn closed 1 month ago

jonstjohn commented 5 months ago

Describe the problem

When a changefeed is created using WITH mvcc_timestamp and Avro format, the mvcc_timestamp is never emitted in the message. Using WITH updated works for emitting the updated timestamp, but mvcc_timestamp is required to report an accurate timestamp for rows emitted during an initial scan.

To Reproduce

Set up a Confluent Schema Registry.

Run cockroach demo.

Run the following sinkless changefeed:

CREATE CHANGEFEED FOR TABLE movr.users
WITH
        updated
        , full_table_name
        , mvcc_timestamp
        , format = avro
        , envelope = wrapped
        , confluent_schema_registry = 'http://127.0.0.1:8081/'
        , schema_change_events = column_changes
        , schema_change_policy = nobackfill
        , kafka_sink_config = '{"RequiredAcks": "ALL", "Compression": "GZIP"}'
        , initial_scan = 'no'
        ;

Insert a row into the movr.users table. Notice that the changefeed message does not have the mvcc_timestamp field.
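
For the insert step, a statement along these lines should work against the default movr dataset in cockroach demo. The column list assumes the standard movr.users schema (id, city, name, address, credit_card), and the values are made-up placeholders.

```sql
-- Assumes the default movr.users schema loaded by cockroach demo.
-- All values below are illustrative placeholders.
INSERT INTO movr.users (id, city, name, address, credit_card)
VALUES (gen_random_uuid(), 'new york', 'Test User', '1 Example St', '0000000000');
```

Run it in a second SQL session, since the sinkless changefeed holds the first session open while it streams.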

Expected behavior

The mvcc_timestamp field is included, similar to how it is included when using json format.
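
For comparison, a sinkless changefeed along these lines, with format = json in place of avro and the Avro-specific schema registry option dropped, can be used to see the mvcc_timestamp field in the output. This is a sketch of the comparison and not necessarily the exact statement used when filing this issue.

```sql
-- Same table and timestamp options as the Avro repro, but with JSON output.
-- No schema registry is needed for the json format.
CREATE CHANGEFEED FOR TABLE movr.users
WITH
        updated
        , full_table_name
        , mvcc_timestamp
        , format = json
        , envelope = wrapped
        , initial_scan = 'no'
        ;
```

After inserting a row, each emitted message should carry mvcc_timestamp alongside updated, which is the behavior expected from the Avro changefeed as well.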

Environment:

Additional context

The only other option is to use the updated field, which does not accurately reflect the mvcc_timestamp during initial scan or backfill.

Jira issue: CRDB-38190

Epic CRDB-41784

keljopap commented 5 months ago

Just noting that we are seeing this with the experimental changefeed on CockroachDB v23.1.17 as well.

blathers-crl[bot] commented 4 months ago

cc @cockroachdb/cdc