duckdb / duckdb

DuckDB is an analytical in-process SQL database management system
http://www.duckdb.org
MIT License
19.3k stars 1.59k forks source link

Add "Tags" support to catalog entries #12044

Closed Maxxen closed 1 month ago

Maxxen commented 1 month ago

This PR adds support for adding "tags", a map of string key-value pairs, to DuckDB's catalog entries.

Note that this PR does not actually add any syntax or functions to enable users to tag objects through SQL. As of now I am primarily motivated to add this so that extensions can tag the internal catalog entries they add at runtime with metadata that can be used to e.g. identify which extension added a catalog item or provide additional information for tooling such as automatic documentation/reference generation. So even though this does add serialization of tags to the storage, nothing will actually be serialized in practice (unless some naughty extension starts tagging non-internal user-created items)

Some thoughts:

  1. Right now tags are unordered_map<string, string>. You could make the case that they should store duckdb::Value's instead (like comment), but given how I expect tags to be commomly used I think it makes more sense to store them as strings. For example, anytime you want to query the information views (duckdb_functions(), duckdb_types(), ...) the tags are going to have to be returned as a string anyway (since DuckDB doesn't have a variant type), and I imagine that even if you run some sort of internal routine to search or compare for tags you don't actually want to deal with the comparison semantics of different value types when searching for matching tags (is 1 == '1'? do we just stringify or invoke duckdbs cast logic?). If you really want to store a list or struct or something else in a tag value you can always cast it to/from string yourself.

  2. I don't know what the eventual syntax should be for adding tags through SQL. PostgreSQL doesn't have tags so we can't just follow what they do. I imagine something like:

-- set one tag
ALTER TABLE t1 ADD TAG 'K' = 'V';

-- overwrite and set all tags
ALTER TABLE t1 SET TAGS { 'K': ' V' }

-- following the COMMENT ON syntax seems kind of unnatural...
TAG ON t1 IS k = 'v'

-- alternatively, BQ uses this syntax, but we don't really have other "options"
ALTER TABLE t1 SET OPTIONS ( tags = [('k', 'v')]);
Mytherin commented 1 month ago

Thanks! Looks good to me