dbt-labs / dbt-jsonschema

Add uniqueness validation for column #87

Closed syou6162 closed 10 months ago

syou6162 commented 10 months ago

When developers enter column information in YAML, they sometimes enter the same column name twice. For example:

version: 2
models:
  - name: my_model
    columns:
      - name: user_id
      - name: user_name
      - ...
      - name: user_id

I wanted dbt-jsonschema to check for such cases, so I added "uniqueItems": true.

joellabes commented 10 months ago

Good idea @syou6162! I have just read up on uniqueItems, and it looks like it checks the entire object for equality, not just the key: https://stackoverflow.com/a/57677429. This means that as soon as you start adding descriptions/tests/etc to the columns, the second column won't mark as duplicate anymore.
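As a rough illustration of that behavior, here is a minimal Python sketch. It is not the validator's actual code: `unique_items` is a hypothetical helper that approximates JSON Schema's deep-equality comparison by serializing each item to canonical JSON, and the `description` value is made up for the example.

```python
import json


def unique_items(items):
    """Approximate JSON Schema's uniqueItems: whole items are compared, not single keys."""
    seen = set()
    for item in items:
        # Canonical JSON text serves as a hashable stand-in for deep equality.
        # (Assumption: good enough for this sketch; real validators compare values directly.)
        key = json.dumps(item, sort_keys=True)
        if key in seen:
            return False
        seen.add(key)
    return True


# Columns with only a name: the two user_id entries are identical objects.
bare = [{"name": "user_id"}, {"name": "user_name"}, {"name": "user_id"}]

# One user_id entry has a description, so the two objects are no longer equal.
described = [
    {"name": "user_id", "description": "Primary key"},
    {"name": "user_name"},
    {"name": "user_id"},
]

print(unique_items(bare))       # False: the duplicate is caught
print(unique_items(described))  # True: the duplicate slips through
```

So the check only fires when the duplicated entries match in every field, which is the limitation described above.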

It looks like there is a setting which behaves the way you want, discussed here: https://github.com/json-schema-org/json-schema-vocabularies/issues/22

However, that doesn't seem to be implemented by many (if any) validators, which isn't very helpful. Because of this, I think we should leave this unmerged. wdyt?
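For comparison, a key-based check of the kind discussed in that issue could be done outside the schema. The following is only a sketch under stated assumptions: `duplicate_column_names` is a hypothetical helper, not part of dbt-jsonschema or any validator, and it expects the `columns` list to have already been parsed from YAML into Python dictionaries.

```python
from collections import Counter


def duplicate_column_names(columns):
    """Return the column names that appear more than once, ignoring all other keys."""
    # Skip entries without a "name" key (for example, placeholders).
    names = [c["name"] for c in columns if isinstance(c, dict) and "name" in c]
    return sorted(name for name, count in Counter(names).items() if count > 1)


columns = [
    {"name": "user_id", "description": "Primary key"},
    {"name": "user_name"},
    {"name": "user_id"},
]

print(duplicate_column_names(columns))  # ['user_id']
```

Unlike `uniqueItems`, this flags the duplicate even when one entry carries a description or tests, but it has to run as a separate script instead of inside the JSON schema.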

syou6162 commented 10 months ago

> This means that as soon as you start adding descriptions/tests/etc to the columns, the second column won't mark as duplicate anymore.

@joellabes You are right.

However, I would still be happy if this PR could be merged. Checking uniqueItems is better than not checking at all, since it catches invalid YAML files in cases like the one I wrote above.

Of course, if the policy of this repository is not to include half-baked implementations, I will close this PR without merging it (I don't mind!).

joellabes commented 10 months ago

Yeah, I can get on board with it as "better than nothing". Let's do it!