zelima closed this 6 years ago
FIXED. Closing this, as most of the job is done: all major features are implemented and live. Automating the datasets themselves turned out to be much harder to accomplish and needs more time and analysis. Follow up here: #85
Dataset automation - Dec 2017
As a user, I want to unpivot and normalize my remote data, so that I can package it easily and possibly create graphs from it.
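Unpivoting turns a wide table (one column per year, say) into a long table with one row per observation. The following is a minimal plain-Python sketch of that step. It assumes each row has already been read into a dict keyed by header name. The function name `unpivot` and the sample rows are hypothetical and for illustration only. This is not DataHub's or tabulator-py's API.

```python
from typing import Dict, List

Row = Dict[str, str]


def unpivot(rows: List[Row], id_columns: List[str],
            var_name: str = "variable", value_name: str = "value") -> List[Row]:
    """Turn a wide table into a long one: every non-id column becomes a (variable, value) pair."""
    long_rows: List[Row] = []
    for row in rows:
        for column, value in row.items():
            # Identifier columns are copied onto every output row, not melted.
            if column in id_columns:
                continue
            new_row = {col: row[col] for col in id_columns}
            new_row[var_name] = column
            new_row[value_name] = value
            long_rows.append(new_row)
    return long_rows


# Hypothetical wide data: one column per year.
wide = [
    {"country": "Estonia", "2016": "10", "2017": "12"},
    {"country": "Latvia", "2016": "8", "2017": "9"},
]
long_table = unpivot(wide, ["country"], var_name="year")
for r in long_table:
    print(r)
```

In pandas, the same reshaping is available as `DataFrame.melt`, which takes `id_vars`, `var_name` and `value_name` parameters.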
As a user, I want to remove the columns that I do not want in DataHub, so that I publish only the data relevant to my needs.
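Dropping unwanted columns is a per-row filter on the header names. The sketch below assumes rows are dicts keyed by header. The helper `drop_columns` and the `internal_id` column are hypothetical examples, not part of any DataHub tool.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def drop_columns(rows: Iterable[Row], unwanted: Iterable[str]) -> List[Row]:
    """Return copies of the rows without the unwanted columns; the input rows are left untouched."""
    unwanted_set = set(unwanted)
    return [{k: v for k, v in row.items() if k not in unwanted_set} for row in rows]


# Hypothetical source rows with a column we do not want to publish.
rows = [
    {"country": "Estonia", "internal_id": "x1", "spend": "100"},
    {"country": "Latvia", "internal_id": "x2", "spend": "80"},
]
cleaned = drop_columns(rows, ["internal_id"])
print(cleaned)
```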
As a user, I want to add a new column computed from two or more other columns, so that I can calculate, e.g., the total amount of money spent in a country.
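A derived column can be modelled as a function applied to each row. The sketch below assumes dict rows and uses `Decimal` so that money values are multiplied without binary floating-point rounding. The names `add_derived_column` and `product_of`, the column names, and the numbers are hypothetical illustrations.

```python
from decimal import Decimal
from typing import Callable, Dict, List

Row = Dict[str, str]


def add_derived_column(rows: List[Row], name: str, compute: Callable[[Row], str]) -> List[Row]:
    """Append a column whose value is computed from the other columns of the same row."""
    return [{**row, name: compute(row)} for row in rows]


def product_of(*columns: str) -> Callable[[Row], str]:
    """Build a compute function that multiplies the given numeric columns."""
    def compute(row: Row) -> str:
        result = Decimal(1)
        for column in columns:
            result *= Decimal(row[column])
        return str(result)
    return compute


# Hypothetical figures: number of purchases and average price per purchase.
rows = [
    {"country": "Estonia", "purchases": "4", "avg_price": "2.50"},
    {"country": "Latvia", "purchases": "3", "avg_price": "1.20"},
]
with_total = add_derived_column(rows, "total_spend", product_of("purchases", "avg_price"))
print(with_total)
```

Passing a function instead of a fixed formula lets the same helper handle sums, ratios, or any other combination of columns.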
As a user, I want to remove the last row from an Excel file, so that it becomes valid tabular data.
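Spreadsheets often carry a title block above the header and a footnote below the data. The task list notes that removing leading rows is already supported by tabulator-py. The sketch below does not use that library. It is a plain-Python illustration, assuming the sheet has already been read into a list of rows. `trim_rows` and the sample sheet are hypothetical.

```python
from typing import List, Sequence


def trim_rows(rows: Sequence[List[str]], skip_first: int = 0, skip_last: int = 0) -> List[List[str]]:
    """Drop `skip_first` leading rows (e.g. a title) and `skip_last` trailing rows (e.g. a footnote)."""
    if skip_first < 0 or skip_last < 0:
        raise ValueError("row counts must be non-negative")
    # Clamp at 0 so that a large skip_last yields an empty table instead of a wrong slice.
    end = max(len(rows) - skip_last, 0)
    return list(rows[skip_first:end])


# Hypothetical sheet with a title row and a trailing footnote row.
sheet = [
    ["Spending report 2017"],
    ["country", "spend"],
    ["Estonia", "100"],
    ["Latvia", "80"],
    ["Source: national statistics"],
]
table = trim_rows(sheet, skip_first=1, skip_last=1)
print(table)
```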
As a user, I want my data to be clean. I want to find and replace specific strings with values of my choice, so that I can build a graph. For example, replace "2017-q2" with "2017-04-01".
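Find/replace can be done either with a literal mapping of old to new values or with a rule that covers a whole family of values. The sketch below shows both, assuming dict rows. The quarter rule maps "YYYY-qN" to the first day of that quarter, which matches the "2017-q2" to "2017-04-01" example above. All function names are hypothetical.

```python
import re
from typing import Callable, Dict, List

Row = Dict[str, str]

QUARTER_PATTERN = re.compile(r"(\d{4})-[qQ]([1-4])")
QUARTER_START_MONTH = {"1": "01", "2": "04", "3": "07", "4": "10"}


def quarter_to_date(value: str) -> str:
    """Replace a 'YYYY-qN' label with the first day of that quarter; leave other values unchanged."""
    match = QUARTER_PATTERN.fullmatch(value.strip())
    if not match:
        return value
    year, quarter = match.groups()
    return f"{year}-{QUARTER_START_MONTH[quarter]}-01"


def replace_in_column(rows: List[Row], column: str, fn: Callable[[str], str]) -> List[Row]:
    """Apply a replacement rule to every value in one column."""
    return [{**row, column: fn(row[column])} for row in rows]


def find_replace(rows: List[Row], column: str, replacements: Dict[str, str]) -> List[Row]:
    """Replace exact cell values in one column using an {old: new} mapping."""
    return [{**row, column: replacements.get(row[column], row[column])} for row in rows]


rows = [{"period": "2017-q1", "value": "5"}, {"period": "2017-q2", "value": "7"}]
by_rule = replace_in_column(rows, "period", quarter_to_date)
by_mapping = find_replace(rows, "period", {"2017-q2": "2017-04-01"})
print(by_rule)
print(by_mapping)
```

A literal mapping is explicit but must list every value. A rule such as `quarter_to_date` handles every quarter in every year at once.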
As a user, I have compressed remote data that I want to publish on DataHub without downloading and decompressing it myself.
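The issue does not describe how DataHub's tooling reads zipped sources. One way a service can avoid saving or unpacking an archive on disk is to hold the downloaded bytes in memory and read the CSV member straight from the archive. The sketch below shows that approach with the standard library only. It assumes the archive contains exactly one CSV file encoded as UTF-8. The function names are hypothetical, and `read_remote_zipped_csv` is not exercised in the demo.

```python
import csv
import io
import urllib.request
import zipfile
from typing import List, Optional


def read_csv_from_zip_bytes(data: bytes, member: Optional[str] = None) -> List[List[str]]:
    """Parse a CSV file stored inside a zip archive held entirely in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        if member is None:
            # Assumption: the archive holds exactly one CSV file of interest.
            csv_members = [n for n in archive.namelist() if n.lower().endswith(".csv")]
            if len(csv_members) != 1:
                raise ValueError(f"expected one CSV in archive, found {csv_members}")
            member = csv_members[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))


def read_remote_zipped_csv(url: str) -> List[List[str]]:
    """Fetch a remote zip into memory (nothing is written to disk) and parse its CSV."""
    with urllib.request.urlopen(url) as response:
        return read_csv_from_zip_bytes(response.read())


# Demo with an archive built in memory instead of a real remote source.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "country,spend\nEstonia,100\n")
rows = read_csv_from_zip_bytes(buffer.getvalue())
print(rows)
```

The bytes still travel over the network. The point is that the user never has to fetch and unpack the file locally before publishing.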
Acceptance Criteria
70% of the given datasets are automated (see Analysis)
Tasks
Major new features needed:
- Remove first rows (11 datasets): supported by tabulator-py
- Remove an empty column: tabulator-py (not needed for now)
- Find/replace (3 datasets): https://github.com/AcckiyGerman/tabulator-py/issues/2
Analysis
@acckiygerman It seems the zip issue is fixed; at least I was able to automate a dataset with a zip source. @zelima?