frictionlessdata / frictionless-py

Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data
https://framework.frictionlessdata.io
MIT License

Confusion about why this file is not reported invalid #1646

Open thbar opened 5 months ago

thbar commented 5 months ago

Hello,

I'm using frictionless as a CLI at the moment (version 5.16.1).

I have files which do not respect a schema, for which the CLI reports no errors, and I'm a bit puzzled about why.

frictionless validate https://www.data.gouv.fr/fr/datasets/r/fde557ec-b96e-49a5-9282-31407296282c \
--schema https://schema.data.gouv.fr/schemas/etalab/schema-irve-dynamique/latest/schema-dynamique.json \
--json

Returns:

{
  "valid": true,
  "stats": {
    "tasks": 1,
    "errors": 0,
    "warnings": 0,
    "seconds": 0.692
  },
  "warnings": [],
  "errors": [],
  "tasks": [
    {
      "name": "fde557ec-b96e-49a5-9282-31407296282c",
      "type": "file",
      "valid": true,
      "place": "https://www.data.gouv.fr/fr/datasets/r/fde557ec-b96e-49a5-9282-31407296282c",
      "labels": [],
      "stats": {
        "errors": 0,
        "warnings": 0,
        "seconds": 0.692
      },
      "warnings": [],
      "errors": []
    }
  ]
}

I get the same output if I download the files first, following the required redirect (both are attached here to freeze them).

local-data-redirected.csv local-schema.json

I am very puzzled. Am I using the CLI incorrectly? Is the schema too permissive for some reason I'm missing?

Thanks for any hint!

AntoineAugusti commented 5 months ago

Hi Thibaut,

This seems similar to an issue reported here.

Frictionless does not identify that the URL points to a CSV.
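A quick way to see what is probably going on: the data URL path ends in a bare identifier with no file extension, and the report above shows "type": "file" with empty "labels", which is consistent with the resource never being read as a table. The sketch below only inspects the URLs with the standard library. The helper name guess_format_from_url is hypothetical, and it is an assumption that extension-based guessing resembles what frictionless does internally.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def guess_format_from_url(url: str) -> str:
    """Return the lowercase file extension of the URL path, or '' if none.

    Hypothetical helper for illustration; not part of frictionless.
    """
    path = urlparse(url).path
    return PurePosixPath(path).suffix.lstrip(".").lower()


data_url = (
    "https://www.data.gouv.fr/fr/datasets/r/"
    "fde557ec-b96e-49a5-9282-31407296282c"
)
schema_url = (
    "https://schema.data.gouv.fr/schemas/etalab/"
    "schema-irve-dynamique/latest/schema-dynamique.json"
)

# The data URL carries no extension, so nothing hints at CSV.
print(repr(guess_format_from_url(data_url)))    # ''
# The schema URL does end in .json.
print(repr(guess_format_from_url(schema_url)))  # 'json'
```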

I updated the command (argument order) and added --format csv to make it work:

frictionless validate \
--schema https://schema.data.gouv.fr/schemas/etalab/schema-irve-dynamique/latest/schema-dynamique.json \
https://www.data.gouv.fr/fr/datasets/r/fde557ec-b96e-49a5-9282-31407296282c \
--json --format csv
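
For anyone calling frictionless from Python rather than the CLI, the same idea can be sketched as follows. This is a minimal sketch assuming frictionless v5, where Resource accepts path, format, and schema, and validate() returns a report with a valid attribute. The function name validate_as_csv is hypothetical and the snippet has not been run against these exact files.

```python
def validate_as_csv(data_url: str, schema_url: str):
    """Validate a remote file against a Table Schema, forcing CSV parsing.

    Hypothetical wrapper; assumes the frictionless v5 Python API.
    """
    # Imported lazily so this module loads even if frictionless is missing.
    from frictionless import Resource

    # format="csv" plays the role of the CLI's --format csv flag: it tells
    # frictionless how to parse a URL that has no file extension.
    resource = Resource(path=data_url, format="csv", schema=schema_url)
    return resource.validate()
```

Calling validate_as_csv with the two URLs from the command above and reading .valid on the result should then reflect the schema checks instead of returning an empty report.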