GSS-Cogs / family-trade

ONS-Quarterly-National-Accounts #13

Closed ajtucker closed 3 years ago

ajtucker commented 4 years ago

https://github.com/GSS-Cogs/family-trade/tree/master/datasets/ONS-Quarterly-National-Accounts

mikeAdamss commented 4 years ago

Didn't take CDID (we could with some work, but it was quite a lot of work and I didn't know if there was any point).

Extracted tables A-H as per the data supplier's breakdowns (everything else they class as "other tables"). There are tons of ridiculous layered column headers throughout, so I gave it a best guess.

grace-spitzer-wong commented 4 years ago
mikeAdamss commented 4 years ago

Has an issue regarding data markings not being stripped from observations. Can fix manually for rework, but it may (or may not) be something that should happen in databaker, so I have raised an issue here: https://github.com/GSS-Cogs/databaker/issues/1 and added it to the backlog.
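As a rough illustration of the manual fix, a minimal sketch of separating a data marking from an observation value is shown here. The cell shape (a number optionally followed by a trailing marker) and the function name `split_observation` are assumptions for illustration; this is not databaker's API and not the code used in the pipeline.

```python
import re
from typing import Optional, Tuple

# Assumed cell shape: an optional number (thousands separators allowed)
# followed by an optional trailing marker such as "p", "*" or "..".
_OBSERVATION = re.compile(r"^\s*(-?\d[\d,]*(?:\.\d+)?)?\s*([^\d\s].*?)?\s*$")


def split_observation(raw: str) -> Tuple[Optional[float], str]:
    """Split a raw cell into (value, marker); value is None if no number is present."""
    match = _OBSERVATION.match(raw)
    if match is None:
        raise ValueError(f"Unrecognised observation cell: {raw!r}")
    number, marker = match.groups()
    value = float(number.replace(",", "")) if number else None
    return value, marker or ""


# Hypothetical cells, not taken from the real spreadsheet.
for cell in ["1,234.5 p", "-12", "..", "-5.2*"]:
    print(repr(cell), "->", split_observation(cell))
```

Keeping the marker in its own column, rather than discarding it, means nothing is lost if it later turns out to carry meaning.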

grace-spitzer-wong commented 4 years ago

Asked R for clarification around 4QR terminology and I'm looking for detailed economic info to link to.

grace-spitzer-wong commented 4 years ago

A quick write-up on the first four cubes:

Questions for further discussion @mikeAdamss - give me a shout if any of this needs clarifying as I'm popping a bunch of notes on here in case anyone else needs to pick it up.

All cubes

QA checklist:

Future work required:

Decision:

Notes:

JasonHowell commented 4 years ago

BA have reviewed one outstanding issue: series still shows "1Q GR" type columns rather than the correct "Quarter on Quarter" type values

MartynBSpooner commented 4 years ago

Need to summarise all the QA concerns into one list.

mikeAdamss commented 4 years ago

I've made the concrete changes I could on this one and documented them as best I can below. This is a complicated one, and the "multiple reviewers and no final reviewer" approach hasn't worked particularly well, so I'm happy to take any further steers.

Changes made:

Didn't do:

JasonHowell commented 4 years ago

Swirrl pulling through updates onto PMDv4.

grace-spitzer-wong commented 4 years ago

The measure values and declared measure types issue needs to be discussed before this can be closed. There is also a potential problem with CDID code duplications.

rossbowen commented 3 years ago

When I run main.py from the command line, the script returns Killed.

mikeAdamss commented 3 years ago

@rossbowen - that'll be hitting a system resource threshold (it's a big one); the operating system will kill the process if it goes over certain limits, typically memory. You'll need to try shutting down everything you don't need (possibly worth restarting first as well) before running. It'll still take a while, so it's one to kick off before going to lunch or somesuch.
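To find which stage of a long transform is the memory-hungry one, a sketch along these lines may help. It uses the standard-library `tracemalloc` module, which only tracks allocations made through Python's allocator, so treat the figures as indicative. `run_with_peak_memory` and `build_rows` are hypothetical names, not part of main.py.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def run_with_peak_memory(step: Callable[[], Any]) -> Tuple[Any, float]:
    """Run `step` and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = step()
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes / (1024 * 1024)


def build_rows() -> int:
    # Stand-in for one stage of the transform (hypothetical workload).
    rows = [[float(i)] * 10 for i in range(50_000)]
    return len(rows)


count, peak_mib = run_with_peak_memory(build_rows)
print(f"rows built: {count}, peak traced memory: {peak_mib:.1f} MiB")
```

Wrapping each table's extraction separately would show whether one sheet dominates or whether memory simply accumulates across the run.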

mikeAdamss commented 3 years ago

Moved this one back over. Confirmed it runs to completion on my machine, so it's a laptop resource issue.

rossbowen commented 3 years ago

@mikeAdamss unsure how to approach this one. Looks like there are lots of .csv files being output with differing structures.

I'm guessing each one will need its own info.json?

mikeAdamss commented 3 years ago

@rossbowen - missed this ping, sorry.

iirc there's a lot going on here, but most datacubes should have at least some dimensions in common, so I think it's one column mapping/info.json (if it works like I think it does).

It's an important output we've never gotten our heads around, so it might be worth pairing up? We can make code tweaks as we squeeze some sense out of it.

LPerryman commented 3 years ago

Sorry to be a pain, but this needs too much fiddling to get right and will be a complete mess in the end. Can we start over and just pull in each sheet as it is, without adding anything, and add each sheet/table to a list rather than output to a cube at the moment? So the first item in the list will be table A1, the second A2, the third B1, and so on. I can then go through and see whether things can be joined or output as they are.
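The "list of raw tables" idea could be sketched roughly as follows. Reading the workbook is left out: `sheets` is assumed to be a mapping of sheet name to untouched rows, however it gets loaded, and `RawTable`, `table_sort_key` and `collect_tables` are hypothetical names. The sort key is there so that A10 lands after A2 rather than before it.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = List[str]


@dataclass
class RawTable:
    """One sheet, kept exactly as extracted, with no added columns."""
    name: str
    rows: List[Row] = field(default_factory=list)


def table_sort_key(name: str) -> Tuple[str, int]:
    """Order names like A1, A2, A10, B1 by letter, then by number."""
    match = re.fullmatch(r"\s*([A-Za-z]+)\s*(\d+)\s*", name)
    if match is None:
        # Unrecognised names sort after the lettered tables.
        return ("~" + name, 0)
    return (match.group(1).upper(), int(match.group(2)))


def collect_tables(sheets: Dict[str, List[Row]]) -> List[RawTable]:
    """Return the sheets as an ordered list instead of writing cubes."""
    ordered = sorted(sheets.items(), key=lambda item: table_sort_key(item[0]))
    return [RawTable(name, rows) for name, rows in ordered]


# Hypothetical sheet contents, for illustration only.
sheets = {
    "B1": [["period", "value"], ["1948 Q1", "1"]],
    "A2": [["period", "value"]],
    "A10": [["period", "value"]],
    "A1": [["period", "value"]],
}
tables = collect_tables(sheets)
print([table.name for table in tables])
```

Each list item can then be inspected on its own before deciding whether tables can be joined.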

LPerryman commented 3 years ago

Have published on PMD4 as multiple datasets, but some periods (1948 to 1959) are still showing as URIs. The periods have been picked up by the ref_periods pipeline, and a periods codelist is being created when the quarterly national accounts pipeline runs, but labels for some of the periods are not being created properly.
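For reference, deriving a human-readable label from a period URI might look like the sketch below. The URI shapes (`.../id/quarter/1948-Q1` and `.../id/year/1948`) and the label format are assumptions for illustration; this is not the ref_periods pipeline's actual code, and `period_label` is a hypothetical name.

```python
import re

# Assumed URI endings; anything else is passed through unchanged.
_QUARTER = re.compile(r"/id/quarter/(\d{4})-Q([1-4])$")
_YEAR = re.compile(r"/id/year/(\d{4})$")


def period_label(uri: str) -> str:
    """Return a label such as '1948 Q1' for a period URI, or the URI itself if unrecognised."""
    quarter = _QUARTER.search(uri)
    if quarter:
        return f"{quarter.group(1)} Q{quarter.group(2)}"
    year = _YEAR.search(uri)
    if year:
        return year.group(1)
    return uri


# Hypothetical URIs covering the affected range.
for uri in [
    "http://reference.data.gov.uk/id/quarter/1948-Q1",
    "http://reference.data.gov.uk/id/year/1959",
]:
    print(uri, "->", period_label(uri))
```

Falling back to the URI for unrecognised shapes makes missing labels easy to spot in the published output.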

ajtucker commented 3 years ago

Transform looks to have broken, see https://ci.floop.org.uk/job/GSS_data/job/Trade/job/ONS-Quarterly-National-Accounts/149/console#:~:text=----%3E%209%20e1%5B'coicop'%5D%20%3D%20'cp'%20%2B%20e1%5B'coicop'%5D.astype(str)

JasonHowell commented 3 years ago

This has been published and checked. Closing issue as gsscogs-bot issues will be dealt with separately.