MaterializeInc / materialize

The data warehouse for operational workloads.
https://materialize.com
Other
5.68k stars 459 forks source link

COPY TO S3: Include mz logical timestamp in exported data #27357

Open chuck-alt-delete opened 2 months ago

chuck-alt-delete commented 2 months ago

Feature request

Including the mz logical timestamp in the data would allow analysis techniques like slowly changing dimension type 2, where analysts can track changes over time, e.g. computing an aggregation grouped by logical timestamp to show trends.

There is precedent for this with SUBSCRIBE; it includes mz logical timestamp by design.

benesch commented 1 month ago

This is meant to be supported already via subqueries!

COPY (SELECT mz_now(), * FROM t) FROM 's3://...'
chuck-alt-delete commented 1 month ago

I don’t think it would occur to most folks they can/should do this. I can make a note on the docs PR to include this as an example and encourage the technique. I wouldn’t mind a more opinionated stance where we do this by default just like with subscribe.

benesch commented 1 month ago

I can make a note on the docs PR to include this as an example and encourage the technique.

That would be great!

I wouldn’t mind a more opinionated stance where we do this by default just like with subscribe.

My guess is that a at least some subset of users don't care about the timestamp, though. SUBSCRIBE is a continual query, so we have to tell you the timestamps for you to be able to make sense of the result. But COPY TO S3 is point in time, and you can make sense of the result without the timestamps.

The main problem is somewhat mechanical. If we force mz_now() to be included by default, how do you opt out if you don't want to see the timestamps? We'd need to introduce a special dedicated ENABLE TIMESTAMP = FALSE option. It's not out of the question, but I'd want to wait until we have gathered more user feedback before committing to a dedicated option.