frictionlessdata / goodtables.io

Data validation as a service. Project retired, go to the current one at frictionsless/repository
https://goodtables.io
GNU Affero General Public License v3.0

Invalid CSV file intermittently being considered valid #295

Open vitorbaptista opened 6 years ago

vitorbaptista commented 6 years ago

If we keep enabling and disabling "Ignore blank rows" and/or "Ignore duplicate rows", an invalid CSV file is sometimes identified as valid. It seems like there is a race condition somewhere.

How to reproduce

  1. Go to https://try.goodtables.io/?source=https%3A%2F%2Fraw.githubusercontent.com%2Ffrictionlessdata%2Fgoodtables-py%2Fbc6470a970aacf65f20a3ddb7f71eb05a2a31c70%2Fdata%2Finvalid-on-structure.csv
  2. Click on "Validate"
    • You should see some validation errors
  3. Enable "Ignore blank rows" and/or "Ignore duplicate rows" and validate again
  4. If there are still some errors, go back to the previous step and change the ignore options
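
The manual loop above could also be scripted. This is only a rough sketch, not how try.goodtables.io works internally: `find_inconsistent_runs` and the `validate_fn` callable (with its `ignore_blank_rows` and `ignore_duplicate_rows` keyword arguments) are hypothetical names made up for illustration. The idea is that the same source with the same options should always produce the same verdict, so any option combination that yields both "valid" and "invalid" across repeated runs points at the intermittent behaviour.

```python
from itertools import product
from typing import Callable, Dict, Set, Tuple

# (ignore_blank_rows, ignore_duplicate_rows)
Options = Tuple[bool, bool]


def find_inconsistent_runs(
    validate_fn: Callable[..., bool],
    source: str,
    rounds: int = 5,
) -> Dict[Options, Set[bool]]:
    """Validate `source` repeatedly under every option combination.

    `validate_fn` is a hypothetical callable that returns True when the
    data is reported as valid. The result maps each option combination
    whose verdict changed between rounds to the set of verdicts seen.
    """
    verdicts: Dict[Options, Set[bool]] = {}
    for _ in range(rounds):
        # Toggle both options through all four combinations each round,
        # mimicking the enable/disable clicks in the UI.
        for opts in product([False, True], repeat=2):
            valid = validate_fn(
                source,
                ignore_blank_rows=opts[0],
                ignore_duplicate_rows=opts[1],
            )
            verdicts.setdefault(opts, set()).add(bool(valid))
    # A deterministic validator leaves exactly one verdict per combination.
    return {opts: seen for opts, seen in verdicts.items() if len(seen) > 1}


if __name__ == "__main__":
    # Stand-in validator that always reports the file as invalid.
    def always_invalid(source, ignore_blank_rows, ignore_duplicate_rows):
        return False

    print(find_inconsistent_runs(always_invalid, "invalid-on-structure.csv"))
```

In practice `validate_fn` would wrap whatever actually submits the job and reads back the report; an empty result means no flip-flopping was observed in that many rounds, which does not prove the race is absent.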

Here's an example of a job that displayed the errors correctly:

https://try.goodtables.io/?source=https%3A%2F%2Fraw.githubusercontent.com%2Ffrictionlessdata%2Fgoodtables-py%2Fbc6470a970aacf65f20a3ddb7f71eb05a2a31c70%2Fdata%2Finvalid-on-structure.csv&apiJobId=45cedf3e-1706-11e8-9203-0242ac110008

And here is another one that incorrectly reports the data as valid:

https://try.goodtables.io/?source=https%3A%2F%2Fraw.githubusercontent.com%2Ffrictionlessdata%2Fgoodtables-py%2Fbc6470a970aacf65f20a3ddb7f71eb05a2a31c70%2Fdata%2Finvalid-on-structure.csv&apiJobId=be4a592c-1706-11e8-b944-0242ac110008