frictionlessdata / frictionless-py

Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data
https://framework.frictionlessdata.io
MIT License

--skip-errors doesn't work for packages #1639

Open diego-oncoramedical opened 7 months ago

diego-oncoramedical commented 7 months ago

Overview

Edit: In my case, I only tried foreign key checks, but as @fjuniorr noted below, --skip-errors appears to be broken for all errors when checking a package.

When validating a package using the CLI, --skip-errors does not appear to disable foreign key checks. Validation passes if and only if the foreign keys are commented out in each table schema file.

I'm running the following command:

frictionless validate --trusted --limit-errors 50 --skip-errors [see below] $OUTPUT_DIR/package.json

For the error slug, I've tried:

I've also tried all four at the same time, separated by commas with no intervening spaces.

Sample output:

                                              dataset                                              
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ name                 ┃ type  ┃ path                                                  ┃ status  ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ medical_patient      │ table │ /var/data-pkg/output/Patient_20240112150044.csv       │ VALID   │
│ medical_encounter    │ table │ /var/data-pkg/output/Encounter_20240112150042.csv     │ VALID   │
│ medical_medications  │ table │ /var/data-pkg/output/Medications_20240112150809.csv   │ VALID   │
│ medical_problem      │ table │ /var/data-pkg/output/Problem_20240112150453.csv       │ VALID   │
│ medical_toxicity     │ table │ /var/data-pkg/output/Toxicity_20240112151505.csv      │ INVALID │
│ medical_observations │ table │ /var/data-pkg/output/Observations_20240112155005.csv  │ VALID   │
│ medical_vitals       │ table │ /var/data-pkg/output/Vitals_20240112150819.csv        │ VALID   │
└──────────────────────┴───────┴───────────────────────────────────────────────────────┴─────────┘

                                                                                       medical_toxicity
┏━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Row ┃ Field ┃ Type        ┃ Message                                                                                                                                                     ┃
┡━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 2   │ None  │ foreign-key │ Row at position "2" violates the foreign key: for "EMPI": values "......" not found in the lookup table "medical_patient" as "EMPI"                         │
│ 2   │ None  │ foreign-key │ Row at position "2" violates the foreign key: for "MRN": values "......" not found in the lookup table "medical_patient" as "MRN"                           │
│ 2   │ None  │ foreign-key │ Row at position "2" violates the foreign key: for "EncounterNumber": values "......" not found in the lookup table "medical_encounter" as "EncounterNumber" │
│ 3   │ None  │ foreign-key │ Row at position "3" violates the foreign key: for "EMPI": values "......" not found in the lookup table "medical_patient" as "EMPI"                         │

...etc

Info

Environment:

App is running inside the official Python 3.12.1 Alpine Linux Docker image.

The requirements.txt file, in its entirety:

chardet==5.2.0          # Character encoding detection
click==8.1.7            # CLI
frictionless==5.16.1    # Validation
pandas==2.2.0           # CSV loading and cleaning
pyyaml==6.0.1           # Configuration file loading

Package

The package consists of a few unremarkable CSVs:

Package JSON, presented as YAML for readability:

resources:
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_patient
  path: /var/data-pkg/output/Patient_20240112150044.csv
  schema: /app/schemas/medical/patient.yaml
  type: table
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_encounter
  path: /var/data-pkg/output/Encounter_20240112150042.csv
  schema: /app/schemas/medical/encounter.yaml
  type: table
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_medications
  path: /var/data-pkg/output/Medications_20240112150809.csv
  schema: /app/schemas/medical/medications.yaml
  type: table
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_problem
  path: /var/data-pkg/output/Problem_20240112150453.csv
  schema: /app/schemas/medical/problem.yaml
  type: table
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_toxicity
  path: /var/data-pkg/output/Toxicity_20240112151505.csv
  schema: /app/schemas/medical/toxicity.yaml
  type: table
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_observations
  path: /var/data-pkg/output/Observations_20240112155005.csv
  schema: /app/schemas/medical/observations.yaml
  type: table
- encoding: utf-8
  format: csv
  mediatype: text/csv
  name: medical_vitals
  path: /var/data-pkg/output/Vitals_20240112150819.csv
  schema: /app/schemas/medical/vitals.yaml
  type: table

fjuniorr commented 4 months ago

It looks like this is a more general problem: no error can be skipped in the CLI when validating a package. In frictionless 5.17.0 with this reprex I get:

frictionless validate --skip-errors "blank-label" https://raw.githubusercontent.com/splor-mg/reprex/main/reprex/20231228T143527/datapackage.json
────────────────────────────────────────────────────────────── Dataset ───────────────────────────────────────────────────────────────
               dataset               
┏━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
┃ name ┃ type  ┃ path     ┃ status  ┃
┡━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
│ data │ table │ data.csv │ INVALID │
└──────┴───────┴──────────┴─────────┘
─────────────────────────────────────────────────────────────── Tables ───────────────────────────────────────────────────────────────
                                         data                                         
┏━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Row  ┃ Field ┃ Type        ┃ Message                                               ┃
┡━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ None │ 2     │ blank-label │ Label in the header in field at position "2" is blank │
└──────┴───────┴─────────────┴───────────────────────────────────────────────────────┘

When I validate the data file (or a standalone resource) the check is properly skipped:

frictionless validate --skip-errors "blank-label" https://raw.githubusercontent.com/splor-mg/reprex/main/reprex/20231228T143527/data.csv
─────────────────────────────────────────────────────── Dataset ────────────────────────────────────────────────────────
              dataset               
┏━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┓
┃ name ┃ type  ┃ path     ┃ status ┃
┡━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━┩
│ data │ table │ data.csv │ VALID  │
└──────┴───────┴──────────┴────────┘
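
To make the expectation explicit: in both runs above, --skip-errors should amount to dropping every reported error whose type is in the skipped list. Below is a minimal sketch of that expectation in plain Python. It does not call frictionless at all; the function name `drop_skipped` and the dict layout (mirroring the Row / Field / Type / Message columns of the tables above) are illustrative assumptions, not the library's internals.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical error record: one dict per table row, with the same
# columns the CLI prints (row, field, type, message).
ErrorRow = Dict[str, Any]


def drop_skipped(rows: Iterable[ErrorRow], skipped_types: Iterable[str]) -> List[ErrorRow]:
    """Return only the rows whose "type" is not in skipped_types."""
    skipped = set(skipped_types)
    return [row for row in rows if row["type"] not in skipped]


# Sample rows hand-copied from the tables in this thread (not real API output).
rows: List[ErrorRow] = [
    {
        "row": None,
        "field": 2,
        "type": "blank-label",
        "message": 'Label in the header in field at position "2" is blank',
    },
    {
        "row": 2,
        "field": None,
        "type": "foreign-key",
        "message": 'Row at position "2" violates the foreign key',
    },
]

# Skipping "blank-label" should leave only the foreign-key row.
remaining = drop_skipped(rows, ["blank-label"])
```

With the reprex, the data file run behaves as if this filter were applied (the table comes back VALID), while the package run behaves as if the skipped list were empty.
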

diego-oncoramedical commented 4 months ago

Good catch. I'll change the title of the ticket to reflect this.