These attributes are remnants from before Delta was open-sourced. Now we generally don't want Databricks concepts like these as first-class citizens within Delta. Instead, you can save these sorts of attributes in userMetadata (likely formatted as JSON). Let me know what you think.
Hi @allisonport-db, that should work. However, the handling will not be as easy as it would be with dedicated attributes. Imagine I'd like to search the history for all changes that were induced by one specific notebook: since userMetadata contains a string, I need to parse the JSON string before filtering; but it is doable.
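For illustration, a minimal sketch of what that could look like with Delta Lake on Spark; the table path, the JSON keys, and the pre-existing `df` and `spark` values are placeholders I made up:

```scala
// Minimal sketch, not production code: table path, JSON keys ("notebook",
// "user"), and the pre-existing `df`/`spark` values are placeholders.
import io.delta.tables.DeltaTable
import org.apache.spark.sql.functions.{col, get_json_object}

// Attach custom commit metadata to a single write as a JSON string.
df.write
  .format("delta")
  .option("userMetadata", """{"notebook":"nightly_ingest","user":"keen85"}""")
  .mode("append")
  .save("/tmp/delta/events")

// Later: find all commits made from that notebook by parsing the JSON string
// stored in the userMetadata column of the table history.
val history = DeltaTable.forPath(spark, "/tmp/delta/events").history()
history
  .filter(get_json_object(col("userMetadata"), "$.notebook") === "nightly_ingest")
  .select("version", "timestamp", "operation", "userMetadata")
  .show(truncate = false)
```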
Out of curiosity: are there any plans to actually remove the "deprecated" attributes from the history schema at some point?
We don't plan to remove them as that would break compatibility. But we also don't plan to support more features on top of these deprecated attributes.
Closing this as we don't plan to support this.
I know this issue is closed; I'm only interested in the userName column, as I think it would be cool to have it for audit purposes. userMetadata can be used, but it requires extra configuration and can also be set to the "wrong" user for malicious purposes.
I believe the change to log the current user is pretty simple: https://github.com/delta-io/delta/blob/d7483ad5a5ad50cafbe74cbe9019be8f9389d8b4/core/src/main/scala/org/apache/spark/sql/delta/actions/actions.scala#L1032-L1035
Instead of returning None, returning Option(System.getProperty("user.name")) should do the trick.
Let me know what you think, I can provide a PR.
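For illustration only, a rough sketch of the kind of change I have in mind; the object and method names here are hypothetical, not the actual code in actions.scala:

```scala
// Rough sketch of the proposal, not the actual Delta source; names are made up.
object UserNameSketch {
  // Today the userName field of CommitInfo is effectively left as None.
  // The proposal: fall back to the JVM-level user running the Spark driver.
  def currentUserName: Option[String] =
    Option(System.getProperty("user.name")).filter(_.nonEmpty)
}
```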
@zsxwing @allisonport-db Not sure if you saw my previous message.
Feature request
Overview
The Delta history schema features some attributes that are always NULL for me (Delta Lake 1.1, Spark 3.1). I'd like to set these attributes manually for write operations.
Motivation
This information would help with keeping track of changes technically and would promote better data lineage.
Further details
I could imagine two ways to implement my feature request, for example via Spark configuration properties like these (see the sketch after the list):
spark.delta.userId
spark.delta.userName
spark.delta.notebook
spark.delta.clusterId
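Purely as an illustration of how I imagine using such configuration properties; these keys do not exist in Delta Lake today, and `spark` and `df` are assumed to already exist:

```scala
// Hypothetical usage of the proposed configuration keys; none of these
// properties exist in Delta Lake, they only illustrate the feature request.
spark.conf.set("spark.delta.userName", "keen85")
spark.conf.set("spark.delta.notebook", "nightly_ingest")
spark.conf.set("spark.delta.clusterId", "cluster-042")

// Subsequent writes would then record these values in the table history.
df.write.format("delta").mode("append").save("/tmp/delta/events")
```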
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?
Unfortunately I know nothing about Scala :(