frictionlessdata / goodtables.io

Data validation as a service. Project retired; go to the current one at frictionless/repository
https://goodtables.io
GNU Affero General Public License v3.0

Are foreign keys validated at goodtables.io? #314

Closed: afuetterer closed this issue 4 years ago

afuetterer commented 6 years ago

I have a very minimal data package with two CSV files: datasets.csv and variables.csv. The field "dataset" in variables.csv is a foreign key that references the field "name" in datasets.csv. The datapackage.json looks like this:

{
  "profile": "tabular-data-package",
  "resources": [
    {
      "path": "data/datasets.csv",
      "profile": "tabular-data-resource",
      "name": "datasets",
      "format": "csv",
      "mediatype": "text/csv",
      "encoding": "utf-8",
      "schema": {
        "fields": [
          {
            "name": "name",
            "type": "string",
            "format": "default"
          }
        ],
        "missingValues": [
          ""
        ]
      }
    },
    {
      "path": "data/variables.csv",
      "profile": "tabular-data-resource",
      "name": "variables",
      "format": "csv",
      "mediatype": "text/csv",
      "encoding": "utf-8",
      "schema": {
        "fields": [
          {
            "name": "name",
            "type": "string",
            "format": "default"
          },
          {
            "name": "dataset",
            "type": "string",
            "format": "default"
          }
        ],
        "foreignKeys": [
          {
            "fields": "dataset",
            "reference": {
              "resource": "datasets",
              "fields": "name"
            }
          }
        ],
        "missingValues": [
          ""
        ]
      }
    }
  ]
}
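The Table Schema specification allows the `fields` of a foreign key and of its `reference` to be given either as a single field name (as in the descriptor above) or as an array of names. The following is a minimal standard-library sketch of how one might list the foreign keys declared in such a descriptor. The function name `foreign_keys` and the tuple it returns are hypothetical, not part of any Frictionless library, and the sketch assumes each schema is an inline object rather than a path or URL.

```python
import json


def foreign_keys(descriptor):
    """Hypothetical helper: list (resource, fields, ref_resource, ref_fields) per foreign key.

    Table Schema allows "fields" to be a string or a list of strings,
    so both forms are normalised to lists here. Assumes inline schemas.
    """
    def as_list(value):
        return [value] if isinstance(value, str) else list(value)

    keys = []
    for resource in descriptor.get("resources", []):
        schema = resource.get("schema", {})
        for fk in schema.get("foreignKeys", []):
            ref = fk["reference"]
            keys.append((
                resource["name"],
                as_list(fk["fields"]),
                ref["resource"],
                as_list(ref["fields"]),
            ))
    return keys


# A trimmed-down version of the descriptor from this issue
descriptor = json.loads("""
{"resources": [
  {"name": "datasets", "schema": {"fields": [{"name": "name"}]}},
  {"name": "variables", "schema": {
     "fields": [{"name": "name"}, {"name": "dataset"}],
     "foreignKeys": [{"fields": "dataset",
                      "reference": {"resource": "datasets", "fields": "name"}}]}}
]}
""")
print(foreign_keys(descriptor))
# [('variables', ['dataset'], 'datasets', ['name'])]
```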

datasets.csv

name
ah

variables.csv

name,dataset
ah06,ah
bh06,bh

The last row (bh06, bh) is the invalid foreign key entry: "bh" does not occur in the "name" column of datasets.csv.

If I run check_relations() on the variables resource, which contains a foreign key value with no matching entry in the referenced resource, I get a RelationError, which is what I expect:

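from datapackage import Package

# Load the data package descriptor and select the resource that declares the foreign key
package = Package('datapackage.json')
resource = package.get_resource('variables')
resource.check_relations()
# tableschema.exceptions.RelationError: Foreign key "['dataset']" violation in row "3"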
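For illustration, the condition that `check_relations()` reports on can be approximated as: every non-missing value of variables.dataset must occur among the values of datasets.name. Below is a minimal standard-library sketch with the two CSV files inlined as strings. It is not the datapackage-py implementation; the function name `find_fk_violations` is hypothetical. It numbers rows the way the error message above appears to, counting the header line as row 1.

```python
import csv
import io

DATASETS_CSV = "name\nah\n"
VARIABLES_CSV = "name,dataset\nah06,ah\nbh06,bh\n"


def find_fk_violations(child_csv, child_field, parent_csv, parent_field, missing_values=("",)):
    """Hypothetical helper: return (row_number, value) pairs whose key is absent from the parent.

    Row numbers count the header line as row 1, so the first data row is row 2.
    Values listed in missing_values are treated as null and are not checked.
    """
    parent_keys = {row[parent_field] for row in csv.DictReader(io.StringIO(parent_csv))}
    violations = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(child_csv)), start=2):
        value = row[child_field]
        if value in missing_values:
            continue  # a null foreign key is not a violation
        if value not in parent_keys:
            violations.append((row_number, value))
    return violations


print(find_fk_violations(VARIABLES_CSV, "dataset", DATASETS_CSV, "name"))
# [(3, 'bh')]
```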

The same data package passes validation on goodtables.io. Are foreign key relations not validated by goodtables.io?

vdubya commented 5 years ago

I'm also interested to know whether goodtables.io validates foreign key relations.