datahubio / datahub-v2-pm

Project management (issues only)

[Epic] Data validation #63

Closed zelima closed 6 years ago

zelima commented 6 years ago

As a Publisher, when I push a dataset (either new or already existing), I get a page that shows me the processing status and a preview (as much as possible) straight away

... and

As a Publisher I want to push a dataset, see the error, then fix it and re-push, with that re-push going to the same dataset, i.e. without creating a new dataset (and without re-uploading the data!), so that the obvious thing happens (I was re-pushing the same dataset!) and I don't use more of my storage

As a Publisher I want a failed push not to "break" the existing working dataset page, so that my consumers don't suddenly get a broken dataset if I make a mistake

As a Consumer I want a permanent url for a given revision of a dataset so that I can always get the data for that dataset

As a Publisher I want to know if data validation failed and details of what was wrong and how to fix it so that I can quickly correct the errors

Acceptance Criteria

Tasks

Analysis

Collect errors for pipelines, e.g.:

{
  "pipelines": {
      "<pipeline-id>": {
          "title": "Create a ZIP file",
          "status": "FAILED",
          "stats": null,
          "error_log": [
            "error line 1", ...
          ]
      },
      "<pipeline-id-2>": {
          "title": "Validate package contents",
          "status": "SUCCESS",
          "stats": { 
            "bytes": 1234,
            ...
          },
          "error_log": []
      }
  }
}
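To show how a status payload in the shape above might be consumed (for example, to render the error details the Publisher stories ask for), here is a minimal Python sketch; the field names follow the example above, and the helper name `failed_pipelines` is illustrative, not part of the actual API:

```python
import json

# Example status payload in the shape sketched above
# (pipeline ids and values are placeholders).
STATUS_JSON = """
{
  "pipelines": {
    "pipeline-1": {
      "title": "Create a ZIP file",
      "status": "FAILED",
      "stats": null,
      "error_log": ["error line 1"]
    },
    "pipeline-2": {
      "title": "Validate package contents",
      "status": "SUCCESS",
      "stats": {"bytes": 1234},
      "error_log": []
    }
  }
}
"""

def failed_pipelines(status):
    """Return (pipeline_id, title, error_log) for every FAILED pipeline."""
    return [
        (pid, p["title"], p["error_log"])
        for pid, p in status["pipelines"].items()
        if p["status"] == "FAILED"
    ]

status = json.loads(STATUS_JSON)
for pid, title, errors in failed_pipelines(status):
    print(f"{pid}: {title}")
    for line in errors:
        print(f"  {line}")
```

A page rendering the push status could loop over `failed_pipelines(status)` to show each failing step with its error lines, while successful pipelines surface their `stats` instead.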
zelima commented 6 years ago

Mostly FIXED. A small bug remains in pipeline statuses; moving it to a separate issue together with the blog post.