Open nikita-sheremet-flocktory opened 7 months ago
@nikita-sheremet-flocktory Iceberg snapshots have a summary that can be used to store all these details: https://iceberg.apache.org/spec/#snapshots. The summary already contains stats such as added files and removed files, but you can also store additional properties, such as job or user details, in that structure.
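To make the mapping concrete, here is a minimal Python sketch of how a custom summary property could be used to tie snapshots back to users. It assumes you have already fetched rows from the table's snapshots metadata table (which exposes a snapshot id and a summary map) into plain dicts. The `user` key and the sample values are hypothetical; nothing writes such a key unless your jobs set it themselves.

```python
from typing import Dict, Iterable, Mapping, Any


def snapshots_by_property(
    snapshots: Iterable[Mapping[str, Any]], key: str
) -> Dict[int, str]:
    """Return {snapshot_id: summary[key]} for snapshots whose summary has `key`.

    Each row is assumed to look like a row of the snapshots metadata table:
    {"snapshot_id": <int>, "summary": {<str>: <str>, ...}}.
    """
    result: Dict[int, str] = {}
    for row in snapshots:
        summary = row.get("summary") or {}  # tolerate a missing/empty summary
        if key in summary:
            result[row["snapshot_id"]] = summary[key]
    return result


# Hypothetical rows: only the first snapshot carries a custom "user" property.
rows = [
    {"snapshot_id": 1, "summary": {"operation": "append", "user": "alice"}},
    {"snapshot_id": 2, "summary": {"operation": "append"}},
]
print(snapshots_by_property(rows, "user"))
```

Snapshots written without the property simply do not show up in the result, so this only helps for commits made after the writers start setting it.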
Which compute engine are you using? Spark actions already have a snapshotProperty API you can use to set those details. You can also set a write option prefixed with snapshot-property. (for example, snapshot-property.custom-key) when performing DataFrame writes: https://iceberg.apache.org/docs/latest/spark-configuration/#write-options.
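As a rough illustration of the write-option route, the PySpark-oriented sketch below attaches custom properties to a writer by prefixing each key with `snapshot-property.`. The helper name `with_snapshot_properties`, the property keys, and the table name in the usage comment are all hypothetical; the helper only relies on the writer having an `option(key, value)` method that returns the writer, as Spark's DataFrame writers do.

```python
from typing import Any, Mapping

# Prefix documented under Iceberg's Spark write options.
SNAPSHOT_PROPERTY_PREFIX = "snapshot-property."


def with_snapshot_properties(writer: Any, props: Mapping[str, str]) -> Any:
    """Set each entry of `props` as a `snapshot-property.<key>` write option.

    `writer` is any object with a chainable `option(key, value)` method,
    e.g. the result of `df.writeTo(...)` or `df.write` in PySpark.
    """
    for key, value in props.items():
        writer = writer.option(SNAPSHOT_PROPERTY_PREFIX + key, str(value))
    return writer


# Usage with a real Spark session (not executed here; names are made up):
#   writer = df.writeTo("my_catalog.db.events")
#   with_snapshot_properties(writer, {"user": "alice", "job-id": "nightly"}).append()
```

Each property should then appear in the summary of the snapshot created by that write. This applies to Spark only; it does not help with writes issued through Trino.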
Sorry, I somehow missed that you specified Trino. I don't think Trino gives users a way to add snapshot summary properties beyond what it persists internally. You could try creating an issue in Trino itself to find the best way to enable this.
Is it possible to edit the summary / additional properties of a snapshot after it has been committed? Are there any pitfalls or reasons not to do so?
No, it's not really possible to safely update snapshot summaries after they are committed since snapshots are immutable.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Query engine
Trino
Question
Every change to an Iceberg table produces a new snapshot and metadata file. Is there a way to map the user who launched a query to the resulting snapshot in the Iceberg table? With that knowledge I could identify which user made which changes.
Thanks in advance!