Closed pierrecamilleri closed 7 months ago
After investigating:
field_info is constructed from resource.schema.fields (here).
In the case of schema-sync, this property is mutated: the fields are reordered according to the order of the labels in the table, and then the required fields are added to the end if missing from the labels (here). (field_info["mapping"])
Naively removing missing columns from field_info at creation breaks approx. 60 tests. We will try doing the same thing further down the processing chain to see if we can make it work, e.g. in the __process method.
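To make the schema-sync behavior described above concrete, here is a minimal pure-Python sketch. It is not frictionless's actual code: sync_fields is a hypothetical helper, fields are plain dicts, and giving unknown labels a placeholder field of type "any" is an assumption of this sketch.

```python
def sync_fields(schema_fields, labels):
    """Reorder schema fields to follow the table labels, then append
    required fields that have no matching label (hypothetical helper)."""
    by_name = {field["name"]: field for field in schema_fields}
    # Fields in label order; labels unknown to the schema get a
    # placeholder field (an assumption of this sketch).
    synced = [by_name.get(label, {"name": label, "type": "any"}) for label in labels]
    # Required fields that are missing from the labels go to the end.
    for field in schema_fields:
        required = field.get("constraints", {}).get("required", False)
        if required and field["name"] not in labels:
            synced.append(field)
    return synced


schema_fields = [
    {"name": "A", "type": "string", "constraints": {"required": True}},
    {"name": "B", "type": "string"},
    {"name": "C", "type": "string"},
]
labels = ["C", "B"]  # the table has no column A

synced = sync_fields(schema_fields, labels)
print([field["name"] for field in synced])  # ['C', 'B', 'A']
```

In this sketch the synced list contains a field A even though the table has no such column, which is the situation a later loop over the fields has to cope with.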
As a side note, and as feedback from our exploration: the design choice to loop on field_info for the validation was a bit unsettling to us, since field_info is directly derived from resource.schema.fields, which in turn can be mutated during the process (at least for schema_sync = true). We would have expected the loop to run on the table labels instead, as columns may be missing, and we would have expected resource.schema.fields to contain the fields as they are declared in the schema.
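The difference between the two loop styles can be sketched in plain Python. This is a simplified illustration, not the frictionless validator: both functions are hypothetical, the error codes are plain strings, and only the required constraint is checked (the Missing Cell error is not modeled).

```python
def is_required(field):
    return field.get("constraints", {}).get("required", False)


def validate_by_fields(fields, labels, row):
    """Field-driven loop: every schema field is checked, present or not."""
    cells = dict(zip(labels, row))
    errors = []
    for field in fields:
        name = field["name"]
        if is_required(field) and name not in labels:
            errors.append(("missing-label", name))
        # A missing column reads as None, so the required check fires too.
        if is_required(field) and cells.get(name) is None:
            errors.append(("constraint-error", name))
    return errors


def validate_by_labels(fields, labels, row):
    """Label-driven loop: only columns that exist are checked cell by cell."""
    by_name = {field["name"]: field for field in fields}
    # Missing required columns are reported once, at header level.
    errors = [("missing-label", f["name"]) for f in fields
              if is_required(f) and f["name"] not in labels]
    for label, cell in zip(labels, row):
        field = by_name.get(label)
        if field is not None and is_required(field) and cell is None:
            errors.append(("constraint-error", label))
    return errors


fields = [
    {"name": "A", "constraints": {"required": True}},
    {"name": "B"},
]
labels, row = ["B"], ["b"]

print(validate_by_fields(fields, labels, row))
# [('missing-label', 'A'), ('constraint-error', 'A')]
print(validate_by_labels(fields, labels, row))
# [('missing-label', 'A')]
```

Under these assumptions, the field-driven loop reports a constraint error on a cell that does not exist, while the label-driven loop reports the missing column once and nothing else.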
Overview
While migrating from v4 to v5 in validata, we ran into some incorrect errors when a required column is missing.
Here is some Python code to reproduce:
Output:
Observed behavior
There are three errors, among which:
A Constraint Error that suggests that a field A has a missing value (None), although there is no column A at all.
A Missing Cell error that suggests that the input data is malformed, although the input data is perfectly fine.
Expected behavior
I would expect to only get the first missing-label error.
Other details and experimentations
Frictionless version 5.16.0
Same result with command line validation. I have set "schema-sync" to reproduce our use case more closely, but it does not seem to be related to the actual issue.
Inspecting "row" in "validator/validator.py", line 151:
returns an artificially added A property: