Closed ghost closed 6 years ago
It's implemented on the Table level (test example):
Could you provide a data example where it doesn't work?
Using the library via Data Curator v0.8.0 I get...
Could you share the descriptor created by Data Curator?
const {Table} = require('tableschema')

const data = [
  ['a', 'b', 'c'],
  ['1', '2', '3'],
  ['1', '3', '2'],
  ['2', '4', '1'],
]
const schema = {
  fields: [
    {name: 'a', type: 'integer', constraints: {unique: true}},
    {name: 'b', type: 'integer'},
    {name: 'c', type: 'integer'},
  ]
}
const table = await Table.load(data, {schema})
await table.read()
$ node tmp/unique.js
TableSchemaError: Row 3 has an unique constraint violation in column "a"
Sorry, got to run for the bus, but here's an ugly version:
{"profile":"tabular-data-package","resources":[{"profile":"tabular-data-resource","encoding":"utf-8","schema":{"fields":[{"name":"a","type":"integer","format":"default","constraints":{"unique":true}},{"name":"b","type":"integer","format":"default"},{"name":"c","type":"integer","format":"default"}],"missingValues":[""]},"format":"csv","mediatype":"text/csv","sources":[{"title":"","path":"","email":""}],"name":"test","path":"data/test.csv"}],"sources":[{"title":"","path":"","email":""}],"name":"test"}
I can't reproduce:
const {Package} = require('datapackage')

const descriptor = {
  "profile": "tabular-data-package",
  "resources": [
    {
      "profile": "tabular-data-resource",
      "encoding": "utf-8",
      "schema": {
        "fields": [
          {
            "name": "a",
            "type": "integer",
            "format": "default",
            "constraints": {"unique": true}
          },
          {"name": "b", "type": "integer", "format": "default"},
          {"name": "c", "type": "integer", "format": "default"}
        ],
        "missingValues": [""]
      },
      "format": "csv",
      "mediatype": "text/csv",
      "name": "test",
      "path": "tmp/unique.csv"
    }
  ],
  "name": "test"
}
const dp = await Package.load(descriptor, {basePath: '.'})
await dp.getResource('test').read()
$ node tmp/unique.js
TableSchemaError: Row 3 has an unique constraint violation in column "a"
@mattredbox can you check on the Data Curator side to make sure we're not dropping the returned error?
Thanks @roll
Hi @Stephen-Gates and @roll
I noticed in your example @roll from the unit test that you use the function:
await table.read()
Similarly, when I use this in Data Curator, I can see the unique constraint violation error.
However, in order to interrogate errors in rows and build an error message for each, following your doco, we're creating a stream iterator:
const stream = await table.iter({extended: true, stream: true, cast: false})
and then using the castRow function to interrogate each row:
schema.castRow(row)
Can you confirm whether we should be able to get unique constraint violations from this (as we can for other constraints such as minimum value or required)? Perhaps we're not using these methods correctly, or is there a reason unique constraint violations aren't available to us here?
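To illustrate the question above, here is a minimal plain-JavaScript sketch (not the tableschema-js API itself; `castRowErrors` and its behaviour are hypothetical stand-ins for `schema.castRow(row)`). A stateless per-row check can catch constraints that depend on the row alone, such as `required` or `minimum`, but it has no way to see a `unique` violation:

```javascript
// Hypothetical sketch: a stateless per-row constraint check.
// Constraints that only look at the current row can be validated here;
// a unique constraint cannot, because this function keeps no memory of
// previously cast rows.
function castRowErrors(fields, row) {
  const errors = []
  fields.forEach((field, i) => {
    const value = row[i]
    const c = field.constraints || {}
    if (c.required && (value === null || value === '')) {
      errors.push(`Field "${field.name}" is required`)
    }
    if (c.minimum !== undefined && Number(value) < c.minimum) {
      errors.push(`Field "${field.name}" is below minimum ${c.minimum}`)
    }
  })
  return errors
}

const fields = [
  {name: 'a', constraints: {required: true, minimum: 2}},
  {name: 'b'},
]
console.log(castRowErrors(fields, ['1', 'x']))  // minimum violated
console.log(castRowErrors(fields, ['3', 'x']))  // no per-row errors
```

A duplicate value in column "a" would pass this check every time, which is the behaviour the thread goes on to explain.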
@mattRedBox
I see - that's the gotcha: schema.castRow(row) knows nothing about previously cast rows. So there is one constraint which exists only on the Table level - the unique constraint.
We need to think about how to resolve it. At a high level, the right way is here - https://github.com/frictionlessdata/tableschema-js/issues/112 - but maybe there is a quick fix.
I am not sure this is a gotcha exactly - unique validation requires reading the table to check uniqueness.
@roll I guess one kind of quick option would be to allow some configuration where UniqueFieldsCache can be a global variable, with the gotcha that the user of the library needs to manage the state of this variable.
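The idea of a cache shared across rows can be sketched in plain JavaScript (this models the role of something like UniqueFieldsCache; `makeUniqueChecker` and its shape are hypothetical, not tableschema-js code). The checker closes over a set of seen values per unique field, which is exactly the state a stateless per-row cast lacks:

```javascript
// Hypothetical sketch of a table-level unique check: one Set of seen
// values per unique field, carried across rows by a closure.
function makeUniqueChecker(uniqueFieldIndexes) {
  const seen = uniqueFieldIndexes.map(() => new Set())
  return function check(row, rowNumber) {
    const errors = []
    uniqueFieldIndexes.forEach((fieldIndex, i) => {
      const value = row[fieldIndex]
      if (seen[i].has(value)) {
        errors.push(
          `Row ${rowNumber} has a unique constraint violation in column ${fieldIndex}`)
      } else {
        seen[i].add(value)
      }
    })
    return errors
  }
}

const check = makeUniqueChecker([0])  // column 0 must be unique
console.log(check(['1', '2', '3'], 2))  // no errors yet
console.log(check(['1', '3', '2'], 3))  // '1' seen before: violation
```

The gotcha mentioned above is visible here: the caller has to create, share, and eventually reset this state for each table read.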
Thanks @roll and @pwalsh
Understood - and yep - all makes sense.
We first started casting rows when we couldn't see how to collect all errors from table.read() - I like the idea of something like a 'force parse' option 👍
Obviously, for us, we can't use both table.read() and schema.castRow(row). While we could collect unique errors as a quick workaround for now (the messages seem to be consistent across both methods), I'm not sure how to guarantee that we always get unique constraint errors, and not some other error, from table.read()
@pwalsh I thought about it a little, and for now I don't think trying to hack it somehow would be the best strategy. It seems the proper solution could take the same amount of time.
Here - https://github.com/frictionlessdata/tableschema-js/blob/master/src/table.js#L62-L168 - we just need:
- a forceCast: true flag for table.iter (I think it should be on the field.castValue level)
- to return an Error object (with a list of error.errors) instead of the row if there are cast errors

In this mode, a client should just check the type of each returned row. If it's not an array but an Error, it means there are cast errors in the row. To get the errors, the client just accesses error.errors. And that's exactly what Data Curator needs.
tabulator returns an empty row - https://github.com/frictionlessdata/tabulator-py#force-parse - but that's because there is no need to collect errors there.
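The proposed forceCast behaviour can be sketched as follows (assumed semantics only, not the shipped tableschema-js API; `iterWithForceCast` and the toy `castRow` are hypothetical). Instead of throwing on a bad row, the iterator yields an Error carrying error.errors, and the client branches on the type of each yielded item:

```javascript
// Sketch of forceCast-style iteration: yield an Error (with .errors)
// instead of throwing, so iteration continues past bad rows.
function* iterWithForceCast(rows, castRow) {
  for (const row of rows) {
    try {
      yield castRow(row)
    } catch (errs) {
      const error = new Error('Row has cast errors')
      error.errors = Array.isArray(errs) ? errs : [errs]
      yield error
    }
  }
}

// Toy cast: field 0 must be an integer string.
function castRow(row) {
  if (!/^\d+$/.test(String(row[0]))) {
    throw [new Error(`Not an integer: ${row[0]}`)]
  }
  return row
}

for (const item of iterWithForceCast([['1', 'a'], ['x', 'b']], castRow)) {
  if (item instanceof Error) {
    console.log('cast errors:', item.errors.map(e => e.message))
  } else {
    console.log('row:', item)
  }
}
```

This matches the client-side contract described above: if the yielded item is not an array but an Error, read error.errors.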
@mattRedBox Would you like to try to PR this? It seems it's just a matter of moving a small chunk of the Data Curator code into the tableschema lib.
thanks @roll - yes, happy to. Just need to check with @Stephen-Gates first on where we fit this into current development
@mattRedBox Data Curator Issue #311 is in milestone v0.9.0
Validation for unique constraints doesn't appear to be implemented. Could this be added to an upcoming release, or is the validation performed elsewhere?