Data-Liberation-Front / csvlint.rb

The gem behind http://csvlint.io
MIT License
283 stars 86 forks source link

CSVW-based validation! #142

Closed JeniT closed 8 years ago

JeniT commented 9 years ago

There's a lot here, and it could do with some code review before merging please!

When the cucumber tests are first run, a script will create tests based on the latest version of the CSV on the Web test suite, including creating a local cache of the test files. This requires an internet connection and some patience while they download.

fixes #141

pezholio commented 9 years ago

Just a few minor(ish) style points, so far, but looking good so far :+1:

I am a bit worried that some of the classes are looking a bit big, this could probably be fixed by putting the constants in separate files. Happy to have a bit of a stab at a refactor if you want?

I've got two failing tests locally, so will try and fix those.

JeniT commented 9 years ago

@pezholio more than happy for you to do some refactoring!

pezholio commented 9 years ago

Hmmmm. We now have two failing tests (which were failing before the refactor too). Unless I'm misunderstanding something, the files have been munged somehow (probably at the W3C end). Test 122 says If the metadata file found at this location does not explicitly include a reference to the requested tabular data file then it MUST be ignored., but there is a file referenced in tables:

{
  "@context": "http://www.w3.org/ns/csvw",
  "rdfs:label": "This metadata is used",
  "rdfs:comment": "If the metadata file found at this location does not explicitly include a reference to the requested tabular data file then it MUST be ignored.",
  "tables": [{
    "url": "test122.csv",
    "tableSchema": {
      "columns": [{
        "titles": "countryCode"
      }, {
        "name": "latitude",
        "titles": "latitude",
        "datatype": "number"
      }, {
        "name": "longitude",
        "titles": "longitude",
        "datatype": "number"
      }, {
        "name": "name",
        "titles": "name",
        "datatype": "string"
      }]
    }
  }]
}

And a similar issue occurs in test123/csv-metadata:

{
  "@context": "http://www.w3.org/ns/csvw",
  "rdfs:label": "This metadata is used",
  "rdfs:comment": "If the metadata file found at this location does not explicitly include a reference to the requested tabular data file then it MUST be ignored.",
  "tables": [{
    "url": "action.csv",
    "tableSchema": {
      "columns": [{
        "titles": "countryCode"
      }, {
        "name": "latitude",
        "titles": "latitude",
        "datatype": "number"
      }, {
        "name": "longitude",
        "titles": "longitude",
        "datatype": "number"
      }, {
        "name": "name",
        "titles": "name",
        "datatype": "string"
      }]
    }
  }]
}

Any ideas @JeniT, or am I misunderstanding something?

pezholio commented 9 years ago

Right. OK, I've got this sorted now. Both of the failing tests said And there should not be warnings (but there should be) when the tests were passing, and this has now changed to And there should be warnings. I also noticed two lines that were commented out, which should have been generating schema warnings. I removed the comments and the tests pass as expected. Does this seem legit @JeniT? If so, then I'm happy to merge!

JeniT commented 9 years ago

@pezholio depends on whether the warnings that are generated by those tests match the warnings that are expected by those tests! Run the command line and look at the warnings. (I'll try to find some time over the weekend if you want to pass it on to me.)

pezholio commented 9 years ago

Oh yeah, sorry, I checked and they're both generating schema warnings as expected. Would be nice if we could test for warning types, but I understand that it might not be possible using this automated approach from the W3C test suite.

JeniT commented 9 years ago

Can you double check whether the warn_if_unsuccessful variable is needed any more? there were some inconsistencies about when warnings were generated which I think were fixed with changes to those tests, which I think might mean that warn_if_unsuccessful is now not a good way of monitoring when warnings need to occur.

pezholio commented 9 years ago

Just had a look, and relying on that variable means that we won't get a warning if the schema is gleaned using another method (which I guess is what the tests are looking for). In the case of test 122, we don't get a schema from the link header, but we do get a schema from guessing via that paths, so https://github.com/theodi/csvlint.rb/blob/feature-csvw-validation/lib/csvlint/validate.rb#L338 returns from the method (before we get to the code that builds the warnings).

I can get around this in one of two ways - either by removing the warn_if_successful variables and building the warnings in place as we are currently, or setting warn_if_successful and removing the explicit return when setting the schema

pezholio commented 8 years ago

I'm happy with this, so I'm going to merge! :+1: