apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

How to track authors of Iceberg snapshots? #9928

Open nikita-sheremet-flocktory opened 7 months ago

nikita-sheremet-flocktory commented 7 months ago

Query engine

Trino

Question

Every change to an Iceberg table produces a new snapshot and metadata file. Is there a way to map the user who launched a query to the snapshot it created in the Iceberg table? With that knowledge I could identify which user made which changes.

Thanks in advance!

amogh-jahagirdar commented 7 months ago

@nikita-sheremet-flocktory Iceberg snapshots have a summary that can be used to store these details: https://iceberg.apache.org/spec/#snapshots . The summary already contains stats such as added files, removed files, etc., but you can also store additional properties, such as job or user details, in that structure.
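To make the idea concrete, here is a minimal, self-contained sketch of how recorded authorship could be read back. It assumes each row mimics one entry of the table's snapshots metadata table (a snapshot id plus its string-to-string summary map), and it assumes the writing job stored a custom summary key named "user". That key is hypothetical, not a standard Iceberg summary field; "added-data-files" and "added-records" are ordinary summary stats.

```python
# Minimal sketch: map snapshot ids to the author stored in each snapshot summary.
# Assumption: rows mimic the snapshots metadata table (snapshot_id + summary map).
# Assumption: "user" is a HYPOTHETICAL custom key written by the job, not a standard field.
from typing import Any, Dict, List, Mapping

AUTHOR_KEY = "user"  # hypothetical custom summary key


def authors_by_snapshot(rows: List[Mapping[str, Any]], key: str = AUTHOR_KEY) -> Dict[int, str]:
    """Return {snapshot_id: author}, using 'unknown' when the summary lacks the key."""
    result: Dict[int, str] = {}
    for row in rows:
        summary = row.get("summary") or {}
        result[row["snapshot_id"]] = summary.get(key, "unknown")
    return result


snapshots = [
    {"snapshot_id": 1, "summary": {"added-data-files": "2", "added-records": "10", "user": "alice"}},
    {"snapshot_id": 2, "summary": {"added-data-files": "1", "added-records": "3"}},
]
print(authors_by_snapshot(snapshots))  # {1: 'alice', 2: 'unknown'}
```

Snapshots committed by writers that never set the custom key simply show up as "unknown", which is the situation the original question describes for engines that do not expose a way to add summary properties.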

Which compute engine are you using? Spark actions already have a snapshotProperty API you can use to set those details. You can also set the write option snapshot-property.key (where key is the property name) when performing DataFrame writes: https://iceberg.apache.org/docs/latest/spark-configuration/#write-options.
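As a rough illustration of the write-option approach, the sketch below only builds the option map: every custom key gets the snapshot-property. prefix so that it ends up in the new snapshot's summary. The property names ("user", "job-id") and the table name in the comment are hypothetical examples, and the actual Spark write shown in the comment assumes a Spark session with the Iceberg runtime, which this sketch does not need.

```python
# Minimal sketch: build Spark write options that carry custom snapshot summary properties.
# The "snapshot-property." prefix comes from the Iceberg Spark write-options docs;
# the keys/values used below are HYPOTHETICAL examples.
from typing import Dict

PREFIX = "snapshot-property."


def snapshot_property_options(props: Dict[str, str]) -> Dict[str, str]:
    """Prefix every key and stringify every value for use as Spark write options."""
    return {PREFIX + key: str(value) for key, value in props.items()}


opts = snapshot_property_options({"user": "alice", "job-id": "nightly-load-42"})
print(opts)
# {'snapshot-property.user': 'alice', 'snapshot-property.job-id': 'nightly-load-42'}

# With Spark + the Iceberg runtime (not required for this sketch), the map could be
# passed to a DataFrame write, for example:
#   df.write.format("iceberg").options(**opts).mode("append").save("db.table")
```

Keeping the prefixing in one helper means every job tags its commits the same way, so the resulting summaries can be queried consistently afterwards.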

amogh-jahagirdar commented 7 months ago

Sorry, I somehow missed that you specified Trino. I don't think it offers a way for users to add snapshot summary properties beyond what it persists internally. You could open an issue in the Trino project to discuss the best way to enable this.

jbouricius commented 6 months ago

Is it possible to edit the summary / additional properties of a snapshot after it has been committed? Are there any pitfalls or reasons not to do so?

amogh-jahagirdar commented 6 months ago

No, it's not really possible to safely update snapshot summaries after they are committed since snapshots are immutable.

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.