Open bkirwi opened 5 months ago
@danhhz and I chatted today about how to order the work:
persist-txn
that are fairly complex. While these changes are baking we can work on two things:
k, v, k_s, and v_s columns. A concern we have is that writing Codec and structured data will increase the size of blobs. We can alleviate this concern by fetching only the data we need.
Datums we currently encode as protobuf bytes.
Adding the adapter label just so it shows up on our team's board.
Product outcome
Persist has the ability to generate Parquet files that split out the columns in Row into distinct columns at the Parquet level, which comes with a host of benefits... but this has never been enabled in production.
Enabling this format will have immediate throughput and storage-usage benefits, and will unlock longer-term projects like schema migration and projection pushdown.
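As a rough illustration of why splitting a row into distinct columns helps with projection, here is a minimal sketch in plain Python. It is not Persist's implementation and does not use Parquet; the schema, the sample rows, and the helper names split_columns and project are all hypothetical and exist only for this example.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical schema and rows; these are illustrative, not Persist's types.
SCHEMA: Tuple[str, ...] = ("id", "name", "amount")
ROWS: List[Tuple[Any, ...]] = [(1, "a", 10.0), (2, "b", 20.5), (3, "c", 7.25)]


def split_columns(
    schema: Sequence[str], rows: Sequence[Tuple[Any, ...]]
) -> Dict[str, List[Any]]:
    """Turn row-oriented data into one list of values per column."""
    columns: Dict[str, List[Any]] = {name: [] for name in schema}
    for row in rows:
        if len(row) != len(schema):
            raise ValueError("row width does not match schema")
        for name, value in zip(schema, row):
            columns[name].append(value)
    return columns


def project(
    columns: Dict[str, List[Any]], wanted: Sequence[str]
) -> Dict[str, List[Any]]:
    """Return only the requested columns; the others are never read."""
    return {name: columns[name] for name in wanted}


columns = split_columns(SCHEMA, ROWS)
amounts = project(columns, ["amount"])
total = sum(amounts["amount"])
```

With one opaque blob per row, summing amount would require decoding every field of every row. With the columnar layout, the reader only touches the amount column, which is the idea behind projection pushdown.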
Discovery
This has been prototyped in main but never durably stored in S3.
Work items
Decision log