Data-Liberation-Front / csvlint.rb

The gem behind http://csvlint.io
MIT License

Validation failing when some columns are not named and others are #247

Open josepajay opened 3 years ago

josepajay commented 3 years ago

Expected Behaviour

What should happen?

Columns referenced by primaryKey, foreignKeys and rowTitles in the schema should be validated against the correct column.

Current Behaviour

If we define a column with no name before a column with a name, and then use the named column as the primaryKey, validation fails because the unique constraint is applied to the wrong column. The same could happen if the named column is used in a foreignKey or rowTitles definition.

Steps to Reproduce

csvlint -s reproduce_bug.csv-metadata.json

csv: reproduce_bug.csv

metadata: (pasted here since GitHub does not support attaching this file type)

{
  "@context": [
    "http://www.w3.org/ns/csvw",
    {
      "@language": "en"
    }
  ],
  "tables": [{
    "url": "reproduce_bug.csv",
    "tableSchema": {
      "columns": [{
        "titles": "A",
        "datatype": "string"
      }, {
        "name": "b",
        "titles": "B",
        "datatype": "string"
      }],
      "primaryKey": ["b"]
    }
  }]
}

Result

csvlint -s ~/Desktop/reproduce_bug.csv-metadata.json
Resolving dependencies...
..!!!
/Users/ajayjoseph/Desktop/reproduce_bug.csv is INVALID
1. duplicate_key. Row: 3,1. I am not a primary key
2. duplicate_key. Row: 4,1. I am not a primary key
3. duplicate_key. Row: 5,1. I am not a primary key

I think the code here is the problem. columns contains every column, but column_names is only populated for columns that have a name. So if a column without a name comes before a column with a name, the two get out of alignment. Both variables are arrays, which means you can't look up a column's index in one and expect it to be at the same position in the other.
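
To illustrate the misalignment, here is a minimal Ruby sketch. It is not csvlint's actual code: the Column struct and the local variables are hypothetical stand-ins that only mirror the two arrays described above, using the columns from the metadata in this issue.

```ruby
# Hypothetical stand-in for a schema column; not a csvlint class.
Column = Struct.new(:name, :title)

# Same shape as the metadata above: the first column has no "name".
columns = [
  Column.new(nil, "A"),
  Column.new("b", "B")
]

# Assumption: names are collected only when present, so this array
# is shorter than `columns` and its indices no longer line up.
column_names = columns.map(&:name).compact

# Looking up "b" in column_names gives index 0, which is column A in `columns`.
name_index = column_names.index("b")
misaligned = columns[name_index]

# Searching `columns` directly by name keeps the real position (index 1).
correct_index = columns.index { |c| c.name == "b" }
aligned = columns[correct_index]

puts "via column_names: index #{name_index}, title #{misaligned.title}"
puts "via columns:      index #{correct_index}, title #{aligned.title}"
```

If the real code behaves like the first lookup, the primaryKey check would read values from column A instead of column B, which would match the duplicate_key errors shown in the result above.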