digital-preservation / csv-validator

CSV Validation Tool and API (CSV Schema RI)
http://digital-preservation.github.io/csv-validator
Mozilla Public License 2.0

Create "identical" test #67

Closed. DavidUnderdown closed this issue 8 years ago

DavidUnderdown commented 9 years ago

Some data should be the same in every row. In the standard TNA use case this would be, for example, batch_code. It differs from batch to batch, so we cannot use an "is" test without updating the schema for every batch, but we do know it should not change within the metadata received for a single batch. This is, in a sense, the converse of the "unique" test.
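A minimal sketch of the two semantics in Python. Python is chosen only for illustration: csv-validator itself is a Scala tool, and the helper names below are hypothetical, not its API. "unique" passes when no value repeats in the column, while "identical" passes when at most one distinct value occurs.

```python
# Hypothetical helpers illustrating the two column-level checks;
# they are not part of csv-validator's API.

def is_unique(values):
    """True if no value occurs more than once in the column."""
    return len(set(values)) == len(values)

def is_identical(values):
    """True if every value in the column is the same (an empty column passes)."""
    return len(set(values)) <= 1

batch_codes = ["WO95Y14B012", "WO95Y14B012", "WO95Y14B002"]
print(is_unique(batch_codes))             # False: WO95Y14B012 repeats
print(is_identical(batch_codes))          # False: WO95Y14B002 differs
print(is_identical(["WO95Y14B012"] * 3))  # True
```

Note that a single-row column passes both checks, which is where the two tests meet.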

adamretter commented 9 years ago

Can you provide an example please?

DavidUnderdown commented 9 years ago

A likely case is batch_code. Each batch has a different batch code, so while you can use something like regex("^WO95Y14B[0-9]{3}$") to ensure the batch_code has the correct form, there is currently no way to check that the value is consistent within a batch, short of changing the schema for each batch.

We recently had a case where images had been rejected from a batch. When these were resupplied in a later batch, the supplier left the original batch_code in the CSV file on the lines relating to the resubmitted images, while material in the same batch that was being supplied for the first time had the batch_code for that actual batch, e.g.

    batch_code,.......
    WO95Y14B012,.......
    WO95Y14B012,.......
    WO95Y14B012,.......
    WO95Y14B002,.......
    WO95Y14B012,.......

and so on, whereas they should all be identical.

If we had the identical test (or whatever we choose to call it), we would have:

    batch_code: regex("^WO95Y14B[0-9]{3}$") and identical

Or perhaps it should be a column directive?
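To illustrate how the proposed combination of a regex and an identical test would behave on the sample rows above, here is a hypothetical Python sketch. It is not csv-validator's implementation: it applies the pattern to each value, then compares each value with the first row's value. It assumes the pattern allows three trailing digits so that it matches sample codes such as WO95Y14B012.

```python
import re

# Assumed pattern: three trailing digits, matching sample codes like WO95Y14B012.
BATCH_CODE = re.compile(r"^WO95Y14B[0-9]{3}$")

def validate_batch_codes(values):
    """Return (row_number, message) pairs for values that fail the regex or
    differ from the first row's value. Row numbers count data rows from 1."""
    errors = []
    reference = values[0] if values else None  # first row is the reference value
    for row, value in enumerate(values, start=1):
        if not BATCH_CODE.match(value):
            errors.append((row, f"{value!r} does not match {BATCH_CODE.pattern}"))
        if value != reference:
            errors.append((row, f"{value!r} is not identical to {reference!r}"))
    return errors

rows = ["WO95Y14B012", "WO95Y14B012", "WO95Y14B012", "WO95Y14B002", "WO95Y14B012"]
for row, message in validate_batch_codes(rows):
    print(f"row {row}: {message}")
```

On the sample column only row 4 (WO95Y14B002) is reported: it passes the regex but fails the identical check, which is exactly the case the regex alone cannot catch. Using the first row as the reference is a design choice of this sketch; if the first row were the odd one out, every other row would be flagged instead, so a real implementation might prefer to report the set of distinct values found.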

valydia commented 8 years ago

It has been implemented in PR #86.

DavidUnderdown commented 8 years ago

Closed with the 1.1 release, commit 104ee5e.