GSS-Cogs / databaker

Command line tool to convert spreadsheets to databases, made for the UK's Office for National Statistics.

Databaker - update datamarker handling #1

Open mikeAdamss opened 4 years ago

mikeAdamss commented 4 years ago

I thought databaker was supposed to separate datamarkers from observation values (though it might only recognise one marker or the other). Should it? Do we need to add that, or at least a "mixed type" warning or some such?

Example here where the datamarkers (definitely -, possibly ..) have been left attached to the observation value.

example: https://ci.floop.org.uk/job/GSS_data/job/Trade/job/ONS-Quarterly-National-Accounts/lastSuccessfulBuild/artifact/datasets/ONS-Quarterly-National-Accounts/out/quarterly-national-accounts-gdp-data-tables-income-indicators.csv

from this task: https://github.com/GSS-Cogs/family-trade/issues/13
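As a rough illustration of the "mixed type" warning idea, the sketch below flags any observation value that is not a plain number, which would catch markers such as - or .. that were left attached. The helper name `find_unsplit_markers` and the sample values are hypothetical and are not part of databaker; it assumes observations arrive as strings.

```python
import re

# Hypothetical helper for illustration only; not part of databaker.
# A "plain number" here is an optional sign, digits, and optional decimals.
PLAIN_NUMBER = re.compile(r"^[+-]?\d+(\.\d+)?$")


def find_unsplit_markers(values):
    """Return (index, value) pairs for values that are not plain numbers."""
    return [
        (i, v)
        for i, v in enumerate(values)
        if not PLAIN_NUMBER.match(str(v).strip())
    ]


# Made-up sample column mixing numbers with bare and attached markers.
sample = ["101.5", "-", "..", "98-", "42"]
flagged = find_unsplit_markers(sample)
for index, value in flagged:
    print(f"mixed type at row {index}: {value!r}")
```

A check like this could run over the output column before writing the CSV, so that a transform fails loudly instead of silently publishing mixed values.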

mikeAdamss commented 3 years ago

I've renamed this since I just got another example of this not working very well in a different scenario.

Given obs cell values of "2,815 (estimate)" and "2,460 (estimate)", the datamarker logic is splitting them as follows:

Value   Marker
2.0     ,814 (estimate)
2.0     ,460 (estimate)

This needs a proper look. The databaker code is here: https://github.com/GSS-Cogs/databaker/blob/13929bb866e910290af6e55a36eb1080cb43d12e/databaker/jupybakeutils.py#L349-L356

example transform: https://github.com/GSS-Cogs/family-trade/blob/santhosh/issue67/datasets/HMRC-alcohol-bulletin/main.py
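One possible direction, sketched under assumptions: treat commas as thousands separators when reading the leading number, and take whatever follows as the marker. This is not databaker's current logic or a proposed patch against the linked lines; the function name `split_value_and_marker` is hypothetical, and it assumes English-style number formatting (comma for thousands, point for decimals).

```python
import re

# Hypothetical sketch; not databaker's implementation.
# Leading number: optional sign, then either comma-grouped thousands
# (e.g. 2,815) or plain digits, then optional decimals. The rest is the marker.
NUMBER_PREFIX = re.compile(
    r"^\s*([+-]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?)\s*(.*)$",
    re.DOTALL,
)


def split_value_and_marker(cell):
    """Split a cell into (numeric value or None, marker text)."""
    text = str(cell)
    match = NUMBER_PREFIX.match(text)
    if match is None:
        # No leading number: the whole cell is treated as a marker.
        return None, text.strip()
    number_text, marker = match.groups()
    return float(number_text.replace(",", "")), marker.strip()


for cell in ["2,815 (estimate)", "2,460 (estimate)", "..", "-", "12.5"]:
    print(repr(cell), "->", split_value_and_marker(cell))
```

With this approach the two cells above come out as 2815.0 and 2460.0, each with the marker "(estimate)", while a bare - or .. yields no value and the marker on its own.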