As a Publisher I want to upload / store / import data / publish data on the DataHub without installing an app so that I can test it out and start publishing data
Specifically so that:
I can get my data online quickly and easily
I can check my data along the way and correct it
As a Publisher I want to validate my data prior to publishing so I know that my data is high quality.
As a Publisher I want to be able to transform my data so it is exactly how I want it to be.
As a Publisher I want to infer and edit the types of my data so that it is exactly as I want it to be
I can try it out (would this service be useful to me?)
I can discover some of the features
I can quickly create a derivative of existing data
I can add a visualization ...
As a Publisher I want to add a visualization of my data so that it looks good
Summary:
I want to be guided through import
Move fast
Get things right
I want to have something working and live at the end of it
I want to do a trial that works in 30s
User flow (under discussion):
Select source
File upload
URL
DataHub
Paste table (from a spreadsheet)
Import
Preview and row count
Check
OK => continue
Not ok: Need to correct => wrangle
Wrangle [optional]
filter etc
Validate [optional]
Implicitly guess schema
Publish ...
Journey through site (under discussion)
A visitor arrives and they click on "Get started" => /publish
Upload a file (stores in bitstore ...)
(or use one of our examples)
--> /import - publish-cont (temp name)
Other options
npm install cli
Download all in one
Desktop app
/import
Wrangle - as per above
Publish:
For:
multiple formats
visualisations
sharing
collaboration
...
(Sign up => login now (and we'll keep your place here => use child windows etc))
Acceptance criteria
[ ] I can upload and publish my data file under my account using only website UI
[ ] I can validate my data prior to publishing
[ ] I can transform my data prior to publishing
Tasks
Phase 0 - boot repo and stub (3)
[x] Create repo data-import-ui (0.5)
[x] Setup the project (webpack, vue) (2)
[x] Decide between React, Angular and Vue => React or Vue. @anu: I would take a quick look at Vue here (it looks so much nicer). Outcome: using Vue.js ~anu
[x] Use typescript or not => not
[ ] Stubbed index page live at datahq.github.io/data-import-ui (0.5)
Phase 1 - UI and core classes and file selector (19)
[x] PipelineDescriptor + PipelineRunner (3h)
[ ] Tests ...
[x] Stubbed UI (1h)
[x] CSS (if needed)
[ ] React base class
[x] Implement file loader. Start just with url - and an option to fill with default example (5h)
[x] UI: Source step: URL selector
[ ] Bindings and events (via redux?)
[x] preview data (with sampling: first 10 rows, then every tenth etc) (5h)
[x] Stub the pipeline runner to send back fake events ...
[ ] Renderer (?)
Jump to phase 4 (?) - probably not
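The preview sampling from Phase 1 ("first 10 rows, then every tenth") could be sketched roughly as below. `sampleRows` and its parameters are hypothetical names, and the reading of "every tenth" as rows 20, 30, ... (1-based) is an assumption; the task list does not pin this down.

```javascript
// Hypothetical helper, not part of any existing codebase.
// Keeps the first `head` rows, then every `step`-th row (1-based) after that.
// Each sampled row keeps its original index so the preview can show row numbers.
function sampleRows(rows, head = 10, step = 10) {
  const sample = [];
  rows.forEach((data, idx) => {
    if (idx < head || (idx + 1) % step === 0) {
      sample.push({ idx, data });
    }
  });
  return sample;
}

// Usage: 35 fake rows; the sample holds rows 0..9 plus rows 19 and 29.
const fakeRows = Array.from({ length: 35 }, (_, i) => ['row ' + i]);
const preview = sampleRows(fakeRows);
```

Returning `{ idx, data }` pairs mirrors the `idx` / `data` fields of the row events in the outline of code, so the same table viewer could presumably render both.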
Phase 2
[ ] implement validation
[ ] preview of validation report
[ ] infer and edit the types of my data
Phase 3
[ ] implement transforms
[ ] preview of transformed data
Phase 4
[ ] implement publish - upload flow spec to specstore
Phase 5 - Integrate into frontend
[ ] Complete analysis
[ ] Create /publish page and do upload (?)
[ ] ...
Analysis
Questions and answers
2 choices
Run entirely in browser and then hand off flow to dpp
Run in browser with live interaction with dpp backend
Questions
implement in js
run against python
Questions
row or field cards => choose row
Do we consider future potential uses in the CLI and desktop app? Maybe not for now, but keep in mind
Comments: Package centric (rather than file centric)
Lessons from previous efforts:
start with a file (rather than a package - come to multiple files later)
people understand the table metaphor (not the cards for fields metaphor aka goodtables-ui)
Outline of code
var pipelineDescriptor = [
{
// step
}
]
var pipelineEvents = [
  {
    uuid: 'global',        // step id, or 'global' in case of a general error
    e: 'err',              // event type: 'start', 'done', 'err', 'rs' (schema), 'r' (row), 've' (validation error)
    idx: null,             // row index (where applicable: 'r', 've' events)
    data: null,            // row data (where applicable)
    schema: null,          // schema (where applicable: 'rs' event)
    msg: 'error message',  // error message
    field: null            // failed field (where applicable: 've' event)
  }
]
class UIState / UIWizardState
class DataHubApi
class DataPackagePipelineRunner encapsulates the pipeline system
// misc extra things you want
TableViewer
StepEditor
// get a pipeline from the UI
// probably in react this is stored in redux
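listenToUi(function (pipelineDescriptor) {
  var dpp = dataPackagePipeline()
  // configure returns a prepared-pipeline-id
  var preparedPipelineId = dpp.configure(pipelineDescriptor)
  // callback receives the status, result and error events
  dpp.run(preparedPipelineId, callback)
})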
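For the Phase 1 task "stub the pipeline runner to send back fake events", a stand-in with the same `configure` / `run` shape might look like the sketch below. `stubDataPackagePipeline`, the id format and the example URL are all made up for illustration, and the stub is synchronous, which a real runner talking to a backend almost certainly would not be.

```javascript
// Hypothetical stub mimicking the runner interface sketched above:
// configure(pipeline) -> prepared pipeline id, run(id, callback) -> events.
function stubDataPackagePipeline(fakeRows) {
  const prepared = new Map();
  let nextId = 1;
  return {
    configure(pipeline) {
      const id = 'prepared-' + nextId++;
      prepared.set(id, pipeline);
      return id;
    },
    run(preparedPipelineId, callback) {
      const pipeline = prepared.get(preparedPipelineId);
      if (!pipeline) {
        // General errors are reported against the 'global' uuid.
        callback({ uuid: 'global', e: 'err', msg: 'unknown pipeline: ' + preparedPipelineId });
        return;
      }
      // For every step: a 'start' event, one 'r' event per fake row, then 'done'.
      pipeline.forEach((step) => {
        callback({ uuid: step.uuid, e: 'start' });
        fakeRows.forEach((data, idx) => callback({ uuid: step.uuid, e: 'r', idx, data }));
        callback({ uuid: step.uuid, e: 'done' });
      });
    },
  };
}

// Usage: one source step, two fake rows, events collected into an array.
const events = [];
const dpp = stubDataPackagePipeline([['a', 1], ['b', 2]]);
const preparedId = dpp.configure([
  { verb: 'source', uuid: 'step-1', options: { url: 'https://example.com/data.csv' } },
]);
dpp.run(preparedId, (event) => events.push(event));
```

A stub like this would let the UI be built and tested before the choice between in-browser and backend execution is settled.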
---
// pipeline object
Array<StepModel>
StepModel: {
verb: string,
uuid: string,
options?: any
};
verbs:
- `source`
- options: `url`
- `skip`
- options: `kind` (rows/columns), `amount` (number of kind to skip)
- `headers`
- take the current first row and treat it as the header row
- `mutate`
- options: `kind` (schema/validate), `field` (to mutate), `options` (to set in schema)
- `filter`
-
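To make the verbs concrete, here is a small sketch of a pipeline using `source`, `skip` and `headers`, plus one possible in-browser interpretation of the last two on an array-of-arrays table. The `applyStep` function, the `{ headers, rows }` table shape, the uuid values and the URL are assumptions for illustration only; `mutate` and `filter` are left out because their options are not fully specified above.

```javascript
// Illustrative pipeline; uuids and URL are placeholders.
const examplePipeline = [
  { verb: 'source', uuid: 'step-1', options: { url: 'https://example.com/data.csv' } },
  { verb: 'skip', uuid: 'step-2', options: { kind: 'rows', amount: 1 } },
  { verb: 'headers', uuid: 'step-3' },
];

// Hypothetical interpreter for a single step. Returns a new table, never mutates.
function applyStep(table, step) {
  switch (step.verb) {
    case 'skip':
      if (step.options.kind === 'rows') {
        return { headers: table.headers, rows: table.rows.slice(step.options.amount) };
      }
      // kind === 'columns': drop the leading columns from headers and every row.
      return {
        headers: table.headers && table.headers.slice(step.options.amount),
        rows: table.rows.map((row) => row.slice(step.options.amount)),
      };
    case 'headers':
      // Take the current first row and treat it as the header row.
      return { headers: table.rows[0], rows: table.rows.slice(1) };
    default:
      // 'source', 'mutate' and 'filter' are passed through in this sketch.
      return table;
  }
}

// Usage: a raw table with one junk line above the real header row.
const raw = { headers: null, rows: [['exported 2017'], ['name', 'age'], ['ann', '34']] };
const result = examplePipeline.reduce(applyStep, raw);
```

Here `result.headers` ends up as `['name', 'age']` and `result.rows` as `[['ann', '34']]`.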
dpp callback is called on:
- status (pending/started/progress/done)
- result of a step
- errors
- validation errors
callback data object
{
uuid: step id / 'global' in case of general error
e: event type ('start', 'done', 'err', 'rs' (schema), 'r' (row), 've' (validation error)),
idx: row index (where applicable 'r', 've' events)
data: row data (where applicable)
schema: schema (where applicable: 'rs' event)
msg: error message
field: failed field (where applicable, 've' event)
}
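One way the UI side might consume these callback objects is to fold them into a single state object keyed by event type. The sketch below assumes exactly the fields listed above; `createEventCollector` and the state layout are hypothetical, and the example schema is only a placeholder value.

```javascript
// Hypothetical collector: folds callback events into a state the UI could render.
function createEventCollector() {
  const state = { status: {}, schema: null, rows: [], validationErrors: [], errors: [] };
  function onEvent(ev) {
    switch (ev.e) {
      case 'start':
        state.status[ev.uuid] = 'started';
        break;
      case 'done':
        state.status[ev.uuid] = 'done';
        break;
      case 'rs':
        state.schema = ev.schema;
        break;
      case 'r':
        state.rows[ev.idx] = ev.data;
        break;
      case 've':
        state.validationErrors.push({ idx: ev.idx, field: ev.field, msg: ev.msg });
        break;
      case 'err':
        state.errors.push({ uuid: ev.uuid, msg: ev.msg });
        break;
      default:
        throw new Error('Unknown event type: ' + ev.e);
    }
  }
  return { state, onEvent };
}

// Usage: feed a handful of fake events through the collector.
const collector = createEventCollector();
const sampleEvents = [
  { uuid: 'step-1', e: 'start' },
  { uuid: 'step-1', e: 'rs', schema: { fields: [{ name: 'age', type: 'integer' }] } },
  { uuid: 'step-1', e: 'r', idx: 0, data: ['34'] },
  { uuid: 'step-1', e: 've', idx: 1, field: 'age', msg: 'not an integer' },
  { uuid: 'step-1', e: 'done' },
];
sampleEvents.forEach(collector.onEvent);
```

If Redux (or a Vue store) ends up holding this state, `onEvent` maps fairly directly onto a reducer, with each event acting as an action.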
Existing work and lessons from it
Existing work:
https://github.com/akariv/dpp.ui-client/blob/master/src/app/server-events.service.ts
https://github.com/akariv/dpp.ui-client/blob/master/src/app/step-model.ts
Anu's old analysis
System:
Components:
Redux:
Validation:
Transforms: