datahub-v2 / frontend

DataHub frontend
https://datahub.io
MIT License

Some validation errors are not being identified #191

Closed Mikanebu closed 6 years ago

Mikanebu commented 6 years ago

Reported here: datahq/datahub-qa#68

We want to improve the showcase page so that it shows error messages as described in the README of the data package. For example, when pushing the following data packages, some errors are not caught in the frontend:

https://github.com/frictionlessdata/test-data/tree/master/packages/types-formats-and-constraints

Acceptance criteria

The dataset page should show error messages on the showcase page as per the README of the data package.

Tasks

Analysis

How to reproduce

  1. Run data validate and note the errors.
  2. Run data push and check the errors on the dataset page.

AcckiyGerman commented 6 years ago

WONTFIX: String validator accepts the http:/datahub.io URI

The difference between the JS and Python versions of tableschema in validating URIs is:

AND

When I dug deeper, I realized that there is an ANCIENT & GREAT HOLY WAR called: whether or not to accept a URI with fewer or more than two slashes (//) after the scheme:

https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/

The WHATWG spec says it has to be one slash and that a parser must accept an indefinite amount of slashes. “http:/example.com” and “http:////////////////////////////////////example.com” are both equally fine. RFC 3986 and many others would disagree.

Also, most browsers accept the http:/datahub.io URI; I have just checked.

So the PyPI implementation of rfc3986 DOES accept this kind of invalid URI, and I doubt that @roll will accept any PR in which we implement a URI validator ourselves just to fix this particular edge case.
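A minimal sketch of why a parser following the generic URI syntax lets this string through. It uses Python's standard urllib.parse as a stand-in; that choice is an assumption for illustration, not the call tableschema-py or the rfc3986 package actually makes. With a single slash there is no authority component at all, and datahub.io simply lands in the path:

```python
from urllib.parse import urlsplit

# One slash: read as scheme + absolute path, with no authority (host).
single = urlsplit("http:/datahub.io")
# Two slashes: "datahub.io" becomes the authority (host).
double = urlsplit("http://datahub.io")

for label, parts in (("http:/datahub.io", single), ("http://datahub.io", double)):
    print(label, "-> scheme:", repr(parts.scheme),
          "netloc:", repr(parts.netloc), "path:", repr(parts.path))
```

Both strings split cleanly into components, which is why a purely syntactic check has nothing to complain about.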

Also, the URI validating REGEX is so f**king complex:

/
# protocol user host-ip port path path path querystring fragment
^
#protocol
(?:(?<scheme>[a-zA-Z][a-zA-Z\d+-.]*):)?
(?:
  (?:
    (?:
        \/\/
        (?:
            #userinfo
            (?:((?:[a-zA-Z\d\-._~\!$&'()*+,;=%]*)(?::(?:[a-zA-Z\d\-._~\!$&'()*+,;=:%]*))?)@)?
            #host-ip
            ((?:[a-zA-Z\d-.%]+)|(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})|(?:\[(?:[a-fA-F\d.:]+)\]))?
            #port
            (?::(\d*))?
        )
    )
    #slash-path
    (
        (?:\/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*
    )
  )
 #slash-path
 |(\/(?:(?:[a-zA-Z\d\-._~\!$&'()*+,;=:@%]+(?:\/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*))?)
 #path
 |([a-zA-Z\d\-._~\!$&'()*+,;=:@%]+(?:\/[a-zA-Z\d\-._~\!$&'()*+,;=:@%]*)*)
)?
#querystring
(?:\?([a-zA-Z\d\-._~\!$&'()*+,;=:@%\/?]*))?
#fragment
(?:\#([a-zA-Z\d\-._~\!$&'()*+,;=:@%\/?]*))?
$
/x

So I'm leaving this URI validator as it is.
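For reference only: if a stricter check were ever wanted without maintaining a regex like the one above, one option would be to layer a single rule on top of an existing parser. The sketch below is hypothetical. The name has_http_authority is illustrative and not part of tableschema, and Python's standard urllib.parse is used as a stand-in parser.

```python
from urllib.parse import urlsplit


def has_http_authority(uri: str) -> bool:
    """Hypothetical helper: True only for http(s) URIs that carry a host."""
    parts = urlsplit(uri)
    # hostname is None when the URI has no (or an empty) authority component.
    return parts.scheme in ("http", "https") and bool(parts.hostname)


for candidate in ("http://datahub.io", "http:/datahub.io", "http:////datahub.io"):
    print(candidate, "->", has_http_authority(candidate))
```

This rejects the single-slash and many-slash forms while leaving the actual syntax parsing to the library.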

AcckiyGerman commented 6 years ago

WONTFIX: tableschema-py validator accepts datetime 2018/01/02T00:00:00 format: Any

The same situation here: the py and js versions of tableschema use different libraries to validate datetimes.

This 2018/01/02T00:00:00 is a

So, in my opinion, we should not restrict tableschema-py just because tableschema-js cannot recognize some valid dates.
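A small sketch of how two parsers can disagree on the same value. It uses only Python's standard datetime module; these are not the libraries tableschema-py or tableschema-js actually use, and the function names are illustrative. A strict ISO 8601 parser rejects the slash-separated date, while a parser given a matching pattern accepts it:

```python
from datetime import datetime

VALUE = "2018/01/02T00:00:00"


def parse_strict_iso(text):
    """Strict parser: returns None unless the text is in ISO 8601 style."""
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return None


def parse_with_pattern(text, pattern="%Y/%m/%dT%H:%M:%S"):
    """Lenient parser: accepts the slash-separated date via an explicit pattern."""
    try:
        return datetime.strptime(text, pattern)
    except ValueError:
        return None


print("strict ISO:", parse_strict_iso(VALUE))
print("with pattern:", parse_with_pattern(VALUE))
```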

AcckiyGerman commented 6 years ago

FIXED: The JS version of tableschema gives some validation errors, while the PY version doesn't:

So, everything is good with our Pipeline.

And I'm going to change the corresponding QA test.