blacksmithgu / obsidian-dataview

A data index and query language over Markdown files, for https://obsidian.md/.
https://blacksmithgu.github.io/obsidian-dataview/
MIT License
7.04k stars, 413 forks

Schema Detection/Enforcement #1002

Open therealfakemoot opened 2 years ago

therealfakemoot commented 2 years ago

Is your feature request related to a problem? Please describe.

One of the main sources of friction and overhead in maintaining and using an Obsidian vault is metadata. Keys must be spelled and capitalized correctly, metadata types must be understood, and values have to be within acceptable ranges. Unfortunately, the Obsidian ecosystem lacks any tool dedicated to metadata schema enforcement.

Dataview seems like the place for this to live because it's so directly concerned with consuming this metadata.

Describe the solution you'd like

I have a proposal for an extremely limited but effective schema enforcement system:

1) The schema specifies keys at the vault level
2) A key's value may have only one type
3) A key's value may have only one "input range"

If any metadata anywhere in the vault doesn't conform to the schema spec, Dataview should surface it, perhaps as a default query type you can use, something like this:

schema
where invalid

blacksmithgu commented 2 years ago

Where should a schema be defined? And how do I say "this data follows this schema"?

therealfakemoot commented 2 years ago

Since the schema would be per-vault, it could live in the Dataview settings dialog. The schema format would almost certainly end up being a set of newline-separated entries, something like this:

created: timestamp
priority: integer, [0, 1, 2]
quality: integer, [0..5]
status: string, ["good","bad","new"]
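
To make the proposed format concrete, here is a minimal sketch of how those four lines could be parsed. Everything in it is an assumption for illustration: the names `FieldSpec`, `parse_allowed`, and `parse_schema` are hypothetical, `[0..5]` is read as an inclusive integer range, and string values are assumed not to contain commas. It is not how Dataview works today.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Type names taken from the example schema; nothing else is recognized.
KNOWN_TYPES = {"integer", "string", "timestamp"}

@dataclass
class FieldSpec:
    type_name: str
    allowed: Optional[frozenset]  # None means any value of the type is accepted

def parse_allowed(text: str, type_name: str) -> frozenset:
    """Parse '[0, 1, 2]', '[0..5]' or '["good","bad"]' into a set of values."""
    inner = text.strip()[1:-1].strip()
    span = re.fullmatch(r"(-?\d+)\s*\.\.\s*(-?\d+)", inner)
    if span:
        # Assumption: a..b is an inclusive integer range.
        low, high = int(span.group(1)), int(span.group(2))
        return frozenset(range(low, high + 1))
    items = [item.strip() for item in inner.split(",") if item.strip()]
    if type_name == "integer":
        return frozenset(int(item) for item in items)
    return frozenset(item.strip('"') for item in items)

def parse_schema(text: str) -> dict:
    """Turn newline-separated 'key: type[, range]' entries into FieldSpecs."""
    schema = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, rest = line.partition(":")
        type_name, _, allowed_text = rest.partition(",")
        key, type_name = key.strip(), type_name.strip()
        if type_name not in KNOWN_TYPES:
            raise ValueError(f"unknown type {type_name!r} for key {key!r}")
        allowed = None
        if allowed_text.strip():
            allowed = parse_allowed(allowed_text, type_name)
        schema[key] = FieldSpec(type_name, allowed)
    return schema

SCHEMA_TEXT = """
created: timestamp
priority: integer, [0, 1, 2]
quality: integer, [0..5]
status: string, ["good","bad","new"]
"""

schema = parse_schema(SCHEMA_TEXT)
```

Keeping one type and one allowed set per key mirrors rules 2 and 3 of the proposal, and `created` ends up with no range restriction because its line lists only a type.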

how do I say "this data follows this schema"

I'm not sure I understand the question. Dataview already keeps a cache of note metadata. When metadata gets updated, Dataview can do something like:

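# schema_map: key -> (expected_type, allowed_values), e.g. {"priority": (int, {0, 1, 2})}
def matches_schema(key, value):
    # Assumption: a key missing from the schema (e.g. a misspelling) is a violation.
    if key not in schema_map:
        return False
    expected_type, allowed_values = schema_map[key]
    if not isinstance(value, expected_type):
        return False
    # allowed_values of None means the key has a type but no range restriction.
    if allowed_values is not None and value not in allowed_values:
        return False
    return True

# vault, note.name, note.metadata, and alert stand in for Dataview's cache and UI.
for note in vault:
    for key, value in note.metadata.items():
        if not matches_schema(key, value):
            alert(f"{note.name}: {key} doesn't match schema")

blacksmithgu commented 2 years ago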

Oh no, for the second question I'm wondering what the right way is for a user to indicate which schema the data follows. Do I indicate it in the query itself (TABLE FROM ... FOLLOWS SCHEMA), or on each individual page in the frontmatter (schema: thing)?
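
As a rough illustration of the frontmatter option, the sketch below lets each page opt into a named schema through a `schema` key and skips pages that declare none. The `SCHEMAS` registry, the function names, and the sample pages are all hypothetical; the thread settles on no such mechanism.

```python
# Hypothetical registry of named schemas: key -> tuple of allowed values.
SCHEMAS = {
    "thing": {"status": ("good", "bad", "new"), "priority": (0, 1, 2)},
}

def schema_for(frontmatter: dict):
    """Return the schema a page opts into via its 'schema' key, or None."""
    name = frontmatter.get("schema")
    return SCHEMAS.get(name) if name is not None else None

def invalid_keys(frontmatter: dict) -> list:
    """List keys whose values fall outside the page's declared schema."""
    schema = schema_for(frontmatter)
    if schema is None:
        return []  # assumption: pages that declare no schema are not checked
    return [
        key
        for key, allowed in schema.items()
        if key in frontmatter and frontmatter[key] not in allowed
    ]

# Sample frontmatter, written as plain dicts for illustration.
page_ok = {"schema": "thing", "status": "new", "priority": 1}
page_bad = {"schema": "thing", "status": "done", "priority": 7}
page_free = {"status": "done"}

for page in (page_ok, page_bad, page_free):
    print(invalid_keys(page))
```

The per-query option would instead pass the schema name in from the query, so a single page could be checked against different schemas depending on who is asking.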

AB1908 commented 2 years ago

I wonder if we could separate it into statements like:

USE SCHEMA sample;
Table from folder;

This is definitely cumbersome from a user perspective, as I imagine a lot of folks would run into type mismatches, so it'd be best as an optional thing if you do ever think of adding this.