Data-Liberation-Front / csvlint.rb

The gem behind http://csvlint.io
MIT License

Improve validation of URIs #60

Open ldodds opened 10 years ago

pezholio commented 10 years ago

https://github.com/sporkmonger/addressable looks promising

Floppy commented 10 years ago

For https://github.com/theodi/shared/issues/160

ldodds commented 10 years ago

The goal here was to try and improve the validation around URIs.

Currently the code uses URI.parse. This will catch some errors but also lets through some values which probably shouldn't be treated as URIs. For example, it parses almost any string without whitespace as a valid relative URI. Looking again at the definition of xsd:anyURI, that might be fine.

We also check whether it's an http or https URI. This was an attempt to improve things, but may be overly limiting.
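To make the behaviour described above concrete, here is a minimal sketch using only Ruby's standard `uri` library. The `classify_uri` helper is hypothetical and is not csvlint's actual code; it just reports how URI.parse treats a few sample values.

```ruby
require "uri"

# Hypothetical helper for illustration only (not csvlint's implementation).
# Reports how URI.parse classifies a value.
def classify_uri(value)
  uri = URI.parse(value)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  return :absolute_http if uri.is_a?(URI::HTTP)
  # absolute? is true whenever the parsed URI has a scheme.
  uri.absolute? ? :absolute_other : :relative
rescue URI::InvalidURIError
  :invalid
end

classify_uri("http://example.org/data.csv") # => :absolute_http
classify_uri("mailto:someone@example.org")  # => :absolute_other
classify_uri("banana")                      # => :relative
classify_uri("two words")                   # => :invalid
```

A plain word such as "banana" parses without error as a relative URI, while a value containing a space is rejected. The extra http/https check is what currently stops plain words from being accepted, at the cost of also rejecting schemes such as mailto.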

So the issue was to decide whether we wanted to keep what we are doing or improve things based on expected use cases for URIs in CSV data.

pezholio commented 10 years ago

Hmmm... Yeah, I see what you mean now. Looking at the spec, I think you're right: xsd:anyURI allows a URI to be relative or absolute, so I think what we have is actually fine. We should probably get rid of the checking for http or https too.

pezholio commented 10 years ago

I've been giving this a bit more thought, and I think we should leave it as is. In most (if not all) instances, people are going to be using absolute URIs. If we change it to include relative URIs, it'll match pretty much everything as a URI, which means a CSV with columns that are mainly URIs, but with the odd line of (unspaced) text, will still pass validation.
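The trade-off can be sketched with a small made-up column. Both predicates below are hypothetical illustrations built on the standard `uri` library, not csvlint's code: one accepts anything URI.parse accepts, the other also requires an http or https scheme.

```ruby
require "uri"

# Hypothetical lenient check: anything URI.parse accepts, relative or absolute.
def lenient_uri?(value)
  URI.parse(value)
  true
rescue URI::InvalidURIError
  false
end

# Hypothetical strict check: must parse and be an http or https URI.
def strict_http_uri?(value)
  URI.parse(value).is_a?(URI::HTTP)
rescue URI::InvalidURIError
  false
end

# Made-up sample column: mostly URIs, with the odd line of unspaced text.
column = ["http://example.org/a", "https://example.org/b", "n/a", "unknown"]

lenient_failures = column.reject { |v| lenient_uri?(v) }
strict_failures  = column.reject { |v| strict_http_uri?(v) }

puts lenient_failures.inspect # => []
puts strict_failures.inspect  # => ["n/a", "unknown"]
```

Under the lenient check the stray "n/a" and "unknown" cells go unreported, which is the outcome this comment argues against.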