datopian / datahub-qa

Bugs, issues and suggestions for datahub.io
https://datahub.io/

DataHub validating booleans against wrong schema #239

Closed zaneselvans closed 6 months ago

zaneselvans commented 6 years ago

When uploading a tabular data resource containing boolean fields whose trueValues and falseValues are set but do not include the values True and False respectively, DataHub view generation fails with errors like:

ERROR :Failed to cast row: Field "FIELD_NAME_HERE" can't cast value "True" for type "boolean" with format "default"

It appears that the boolean values in the tabular data package are converted internally to True and False during processing, but that those converted values are then validated or cast again using the trueValues and falseValues attributes of the tabular data resource, which may not include True or False.
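The suspected failure mode can be illustrated with a minimal sketch in plain Python. This is not DataHub's or any library's actual code: `cast_boolean` is a hypothetical helper that mimics casting a string against a field's trueValues and falseValues, and the field name `has_permit` and the Y/N values are made up for illustration.

```python
def cast_boolean(value, true_values, false_values):
    """Cast a raw string to bool using the field's declared true/false values."""
    if value in true_values:
        return True
    if value in false_values:
        return False
    raise ValueError(f"can't cast value \"{value}\" for type \"boolean\"")


# Hypothetical field descriptor: the packaged data uses Y and N.
field = {
    "name": "has_permit",
    "type": "boolean",
    "trueValues": ["Y"],
    "falseValues": ["N"],
}

# First pass: the raw value is cast correctly against the descriptor.
first_pass = cast_boolean("Y", field["trueValues"], field["falseValues"])

# Assumed intermediate step: the Python bool is written back out as text.
serialized = str(first_pass)

# Second pass: the serialized text is cast against the same descriptor,
# which does not list "True", so the cast fails.
try:
    cast_boolean(serialized, field["trueValues"], field["falseValues"])
    second_pass_error = None
except ValueError as err:
    second_pass_error = str(err)
```

If this is what happens, the second cast raises an error of the same shape as the one reported above, even though the original data and descriptor agree with each other.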

How to reproduce

Expected behavior

Validation and view generation should proceed normally when the tabular data resource defines trueValues and falseValues that match the packaged data, with casting and validation based on those declared values.

zaneselvans commented 6 years ago

It is possible to work around this issue by including True and False in the trueValues and falseValues arrays associated with the boolean fields, but this may mean introducing invalid metadata if True and False are not in fact valid within the packaged data (e.g. if the data uses Y and N instead).
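As a rough sketch of that workaround and its cost, assuming a hypothetical field named `has_permit` whose data uses Y and N, the descriptor below adds True and False to both arrays. The helper `is_castable` is illustrative only and is not part of any library.

```python
# Hypothetical field descriptor with the workaround applied:
# "True" and "False" are added alongside the values the data really uses.
field = {
    "name": "has_permit",
    "type": "boolean",
    "trueValues": ["Y", "True"],
    "falseValues": ["N", "False"],
}


def is_castable(value, field):
    """Return True if the value is listed as a true or false value for the field."""
    return value in field["trueValues"] or value in field["falseValues"]


# The internally re-serialized value is now accepted.
workaround_ok = is_castable("True", field)

# Side effect: a stray literal "False" in the source data is also accepted,
# even though only Y and N are meant to be valid there.
side_effect = is_castable("False", field)

# Values outside both arrays are still rejected.
still_rejected = not is_castable("maybe", field)
```

The metadata then describes values that never legitimately occur in the packaged data, which is the concern raised above.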

rufuspollock commented 6 years ago

@akariv do you have a sense of what is going on here?