Open fanaticjo opened 9 months ago
@fanaticjo can you elaborate on what you mean by "2 snapshot ids or only latest data"?
In short, branches/tags support the Auditing use case and you might want to take a look at the docs in https://iceberg.apache.org/docs/latest/branching/.
I think what you're looking for is ALTER TABLE prod.db.sample CREATE TAG historical-tag AS OF VERSION <snapshot_id>. The snapshot_id in this case doesn't have to be the latest snapshot.
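As a minimal sketch of the suggestion above, the statement can be assembled in Python and passed to spark.sql. The helper name create_tag_sql is hypothetical, and the sketch assumes a SparkSession with the Iceberg SQL extensions enabled. The tag name is wrapped in backticks because it contains a hyphen, and the optional RETAIN clause may be useful for audit tags that should expire after a fixed period.

```python
def create_tag_sql(table, tag, snapshot_id, retain_days=None):
    """Build an ALTER TABLE ... CREATE TAG statement (hypothetical helper)."""
    # Backticks quote the tag name so a hyphen is accepted in the identifier.
    stmt = f"ALTER TABLE {table} CREATE TAG `{tag}` AS OF VERSION {int(snapshot_id)}"
    if retain_days is not None:
        # Optional retention period for the tag, in days.
        stmt += f" RETAIN {int(retain_days)} DAYS"
    return stmt


stmt = create_tag_sql("prod.db.sample", "historical-tag", 1234, retain_days=365)
print(stmt)
# With an Iceberg-enabled SparkSession available as `spark`:
# spark.sql(stmt)
```

Note that a tag created this way still exposes everything visible at that snapshot, including rows written by earlier snapshots.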
I want to create a branch/tag for only the latest data load, whereas AS OF VERSION includes the latest data and all previous data as well.
For example:
insert 1, 2, 3 (snapshot id 1)
If I create a branch AS OF VERSION 1, the branch will have 1, 2, 3.
In the next load, insert 4, 5, 6 (snapshot id 2)
If I create a branch AS OF VERSION 2, the branch will have 1, 2, 3, 4, 5, 6.
What I want to know is how I can create a branch containing only 4, 5, 6.
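The difference between the two behaviours in this example can be illustrated with a toy model in plain Python. This is only a sketch: the snapshots dictionary and both function names are made up for illustration and do not reflect Iceberg's actual metadata layout. It assumes append-only snapshots, each pointing to its parent.

```python
# Toy model: each snapshot records only the rows it added plus its parent id.
snapshots = {
    1: {"parent": None, "added": [1, 2, 3]},
    2: {"parent": 1, "added": [4, 5, 6]},
}


def rows_as_of(snapshot_id):
    """Rows visible at a snapshot: its own additions plus all ancestors'."""
    rows = []
    current = snapshot_id
    while current is not None:
        snap = snapshots[current]
        rows = snap["added"] + rows  # prepend so ancestor rows come first
        current = snap["parent"]
    return rows


def rows_between(start_id, end_id):
    """Rows added after start_id (exclusive) up to end_id (inclusive)."""
    rows = []
    current = end_id
    while current is not None and current != start_id:
        snap = snapshots[current]
        rows = snap["added"] + rows
        current = snap["parent"]
    return rows


print(rows_as_of(2))       # [1, 2, 3, 4, 5, 6]
print(rows_between(1, 2))  # [4, 5, 6]
```

A branch or tag behaves like rows_as_of, while the request here is for something closer to rows_between.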
Are you saying you want to create a branch/tag and refer to a snapshot without its history? I don't think this is possible today. What would be the use case of not keeping the ancestor history or is there a particular concern that the ancestor history is kept?
We just wanted to use a tag/branch to pull out only the data written during that period. I saw there is an incremental read available in the DataFrame API:
df = spark.read \
    .format("iceberg") \
    .option("start-snapshot-id", "360041659320668788") \
    .option("end-snapshot-id", "9170237062650942416") \
    .load("glue_catalog.playground.cash_report_iceberg")
Is there a way to do this through Spark SQL? That would also solve our requirement.
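One possible workaround, sketched here rather than confirmed by anyone in this thread, is to register the incremental DataFrame as a temporary view and query that view from Spark SQL. The helper name register_incremental_view and the view name are hypothetical. Per the Iceberg documentation, start-snapshot-id is exclusive, end-snapshot-id is inclusive, and incremental reads only return data from append operations.

```python
def register_incremental_view(spark, table, start_snapshot_id, end_snapshot_id, view_name):
    """Expose an Iceberg incremental read as a temp view for Spark SQL (hypothetical helper)."""
    df = (
        spark.read.format("iceberg")
        .option("start-snapshot-id", str(start_snapshot_id))  # exclusive
        .option("end-snapshot-id", str(end_snapshot_id))      # inclusive
        .load(table)
    )
    # Temp views are scoped to the current Spark session; nothing is written to the table.
    df.createOrReplaceTempView(view_name)
    return df


# Usage, assuming `spark` is a SparkSession with the Iceberg catalog configured:
# register_incremental_view(
#     spark,
#     "glue_catalog.playground.cash_report_iceberg",
#     360041659320668788,
#     9170237062650942416,
#     "cash_report_latest_load",  # hypothetical view name
# )
# spark.sql("SELECT * FROM cash_report_latest_load").show()
```

Unlike a tag, the view is not persisted, so it would have to be re-created in each session that needs it.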
Feature Request / Improvement
Is there a way to create a branch/tag based on 2 snapshot ids, or on only the latest data?
We have a use case where we write a monthly generated report to an Iceberg table, and every month we want to tag/branch that month's data for audit purposes.
Currently, a branch/tag includes all data from the referenced snapshot back to the first snapshot.
If this is possible, please let us know. We would also be happy to contribute.
Query engine
Spark