Open casperdamen123 opened 1 year ago
@andrefurlan-db would appreciate your thoughts on this one.
@casperdamen123 did some research and one option is to specify your metadata as table properties: https://docs.databricks.com/en/delta/custom-metadata.html#store-custom-tags-in-table-properties
they won't be column scoped in your yml file but you could do something like this:
models:
  - name: model_name
    config:
      tblproperties:
        - "key": "value"
where key could be something like columnName.propertyName
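For example, a column-scoped naming convention could look like the following (model and key names here are illustrative, not from this thread; note this uses the plain mapping form of tblproperties, which later in the thread turns out to be the form that renders correctly):

    models:
      - name: locations
        config:
          tblproperties:
            # hypothetical column-scoped keys: <columnName>.<propertyName>
            "geom.type": "geometry"
            "geom.srid": "4326"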
@benc-db that looks promising. This is already implemented in dbt-databricks at the moment or do we need to build this?
Should work already. Let me know if you hit any issues.
@benc-db did some testing but it doesn't seem to work just yet.
version: 2
models:
  - name: test
    description: "This is a test"
    config:
      tblproperties:
        - "test.type": "geometry"
    columns:
      - name: test
        description: "test column"
Results in:
06:12:28 create or replace table `gima_test`.`test`.`test`
06:12:28 using delta
06:12:28 comment 'This is a test'
06:12:28 tblproperties ('{'test.type': 'geometry'}' = ''
06:12:28 --------------------------^^^
06:12:28 )
06:12:28 as
06:12:28 select "test" as test
When not using the table.column notation:
version: 2
models:
  - name: test
    description: "This is a test"
    config:
      tblproperties:
        - "some_key": "geometry"
    columns:
      - name: test
        description: "test column"
Results in:
06:14:30 create or replace table `gima_test`.`test`.`test`
06:14:30 using delta
06:14:30 comment 'This is a test'
06:14:30 tblproperties ('{'some_key': 'geometry'}' = ''
06:14:30 ------------------------------^^^
06:14:30 )
06:14:30 as
06:14:30 select "test" as test
hmm, well I know we're using it successfully in the model files: https://github.com/databricks/dbt-databricks/blob/3609a6ed5375848ea942036b2735fb6efc76101c/tests/integration/set_tblproperties/models/set_tblproperties.sql#L4. Could you try that and let me know if you get the same error?
Ok, so here's what's going on... it's yaml being weird. If you specify it like this:
tblproperties: { "column.meta": "value" }
it should work (as opposed to the yaml block expression of a dictionary).
I'll see if I can get to the bottom of this, but in the meantime, can you try that workaround?
- "some_key": "geometry"
I think this is the issue...sets the value of tblproperties to be an array with a dictionary as an entry, instead of:
tblproperties:
"some_key" : "geometry"
my mistake for giving you bad yaml.
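The difference between the two spellings is easy to see if you compare what they parse to. A minimal stdlib-only sketch (the structures below are what YAML rules produce for each spelling; the iteration comment is an assumption consistent with the malformed DDL in the logs above, not a quote of the dbt-databricks source):

```python
# The list spelling from the earlier yml:
#   tblproperties:
#     - "some_key": "geometry"
# parses to a list containing a single one-entry dict:
list_form = [{"some_key": "geometry"}]

# The corrected spelling:
#   tblproperties:
#     "some_key": "geometry"
# (or the inline form { "some_key": "geometry" }) parses to a plain dict:
dict_form = {"some_key": "geometry"}

# Presumably the adapter iterates tblproperties as a mapping; iterating the
# list instead yields the whole inner dict where a string key is expected,
# which matches the broken tblproperties ('{'some_key': 'geometry'}' = ''
# output seen in the logs.
for key in list_form:
    print(key)          # prints {'some_key': 'geometry'} - a dict, not a string
for key, value in dict_form.items():
    print(key, value)   # prints: some_key geometry
```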
I'm going to update the docs on dbt to include info on how to set tblproperties.
Tried to use the following as per your example
version: 2
models:
  - name: test
    description: "This is a test"
    config:
      tblproperties:
        "test.test": "geometry"
    columns:
      - name: test
        description: "test column"
The model run passes, but the metadata for the test column is not populated.
Hi @benc-db, the solution you proposed does not work. Using tblproperties: { "column.meta": "value" }, nothing is populated in the Data Explorer UI.
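One way to check whether the properties actually landed on the table, independent of what the Data Explorer UI shows, is Databricks SQL's SHOW TBLPROPERTIES. A sketch, reusing the table name from the logs above:

    -- list all properties set on the table
    SHOW TBLPROPERTIES `gima_test`.`test`.`test`;

    -- or fetch a single property by key
    SHOW TBLPROPERTIES `gima_test`.`test`.`test` ('test.test');

If the property shows up here but not in the UI, the tblproperties workaround is working at the table level; the remaining gap is that these are table properties, not column-scoped metadata.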
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.
keep alive
Describe the feature
The meta key/value pairs should be persisted and pushed to the Delta table when materializing dbt models.
Describe alternatives you've considered
The alternative is to add a custom Python model and leverage PySpark to add this metadata to a specific column: df.withMetadata("column_name", {"key_1": "value_1", "key_2": "value_2"}). However, I would like to keep this kind of detail in the .yml configuration files and avoid using additional Python models only to add metadata to columns.
Additional context
I've tried to add it to the description field, since I've noticed this is persisted as a comment. However, the key name of the metadata for the column will then always be comment, and all the information needs to be inputted under that same key. I would like to have several metadata keys holding different types of information.
Who will this benefit?
All dbt users that leverage the power of metadata on Delta tables in Databricks.
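For reference, dbt already accepts column-level meta in schema yml (it just isn't pushed to Delta by the adapter today). The requested behavior would be for something like the following, using this thread's test model, to end up as column metadata on the Delta table:

    version: 2
    models:
      - name: test
        columns:
          - name: test
            meta:
              # keys here are illustrative; today these stay in dbt's
              # manifest/docs and are not persisted to the Delta table
              type: "geometry"
              owner: "data-team"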