ONS-Quarterly-National-Accounts

ajtucker commented 4 years ago

https://github.com/GSS-Cogs/family-trade/tree/master/datasets/ONS-Quarterly-National-Accounts

mikeAdamss commented 4 years ago

Didn't take CDID (we could with some work, but it was quite a lot of work and I didn't know if there was any point).

Extracted tables A-H as per the data supplier's breakdowns (everything else they class as "other tables"). Tons of ridiculous layered column headers throughout, so gave it a best guess.

grace-spitzer-wong commented 4 years ago

[x] Does the dataset show up or is there an error?
[x] Is it in the list of datasets from the main page?
[x] Is there descriptive metadata on the search page?
[ ] Does transformed info match the original?
[x] Does the tidydata download show all the data?
[ ] Note differences in column/row headers
[ ] Are there multiple cubes? List titles of cubes
[ ] What needs differentiating? Totals
[ ] Are there titles that need harmonising? World/Worldwide
[ ] Does the structure look sensible?
[ ] Does the hierarchy work?
[ ] What needs further investigation or context?
[ ] Any duplications?
[ ] List any detailed metadata/methodology to add retrospectively
[ ] Short metadata added

mikeAdamss commented 4 years ago

This has an issue with data markings not being stripped from observations. I can fix it manually for rework, but it may (or may not) be something that should happen in databaker, so I have raised an issue here: https://github.com/GSS-Cogs/databaker/issues/1 and added it to the backlog.
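For anyone picking up the manual fix, here is a minimal sketch of separating a marker from an observation cell. It assumes the markers are whole-cell values such as '-' and '..' (the two mentioned elsewhere in this thread). The helper name `split_marker` and its return shape are hypothetical, not part of databaker.

```python
# Whole-cell markers assumed for this sketch; extend if the source uses others.
MARKERS = ("..", "-")


def split_marker(cell):
    """Return (value, marker) for a raw observation cell.

    Hypothetical helper, not a databaker API. A cell that is only a marker
    gives (None, marker); a numeric cell gives (float, "").
    """
    text = str(cell).strip()
    if text in MARKERS:
        return None, text
    try:
        # Drop thousands separators before converting.
        return float(text.replace(",", "")), ""
    except ValueError:
        # Anything else is kept as a marker so nothing is silently lost.
        return None, text
```

Keeping the marker as a separate value means it can later be attached as an attribute rather than left in the observation column.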

grace-spitzer-wong commented 4 years ago

Asked R for clarification around 4QR terminology and I'm looking for detailed economic info to link to.

grace-spitzer-wong commented 4 years ago

A quick write-up on the first four cubes:

Questions for further discussion @mikeAdamss - give me a shout if any of this needs clarifying as I'm popping a bunch of notes on here in case anyone else needs to pick it up.

All cubes

[x] Add data markers
[x] Please add CDIDs as attributes unless a new decision is made
[x] Add footnotes against concepts
[ ] Add tab letter to all cubes, i.e. 'D Income'; see the list of cubes below for the information you need
[x] All cubes change 1Q GR to Quarterly Growth Rate, 1YR GR to Yearly Growth Rate - hold off this one as waiting for info and definitions from @RobThomlinson
[ ] Add seasonally adjusted metadata to ref period dimension at top of column to cubes where relevant
[x] Add data marker attributes for '-' and '..'; info for this methodology comes from the content page
[x] Add GDP per head - tab P Cube 3
[ ] Add Capital Good to Capital Formation - Capital formation / Capital goods, follow style of only first letter capitalised.
[ ] Add metadata to dimension for ref period to inc attribute of seasonally adjusted.

Cube 5 E1-E4 Quarterly National Accounts, GDP – data tables: Household expenditure indicators

[ ] Add COICOP codes under 'Expenditure categories for individuals'. Can't add this: only partial codes exist (some are blank), so it'd cause all kinds of issues. Chatted with Alex and added as a separate task, relating to the sub/super setting of codelists etc. -Mike-

Cube 4 B1/B2 Quarterly National Accounts, GDP – data tables: Output indicators

[x] Add the 2016 Weights as a new dimension or attribute?

QA checklist:

[x] Does the dataset show up or is there an error? Yes for all cubes
[x] Is it in the list of datasets from the main page? Yes for all cubes
[x] Is there descriptive metadata on the search page? Yes for all cubes
[ ] Does transformed info match the original?
- [x] Cube 1
- [x] Cube 2
- [ ] Cube 3
- [ ] Cube 4
- [ ] Cube 5
- [ ] Cube 6
- [ ] Cube 7
- [ ] Cube 8
[ ] Does the tidydata download show all the data?
- [x] Cube 1
- [x] Cube 2
- [x] Cube 3
- [x] Cube 4
- [ ] Cube 5
- [ ] Cube 6
- [ ] Cube 7
- [ ] Cube 8
[ ] Note differences in column/row headers/info
- [x] Cube 1
ONS geography code for UK added.
Compensation of employees and Gross operating surplus of corporations have been listed under income category.
Income indicator lists the information originally nested under income category ie 'Wages and salaries', 'Employer's social contributions'.
- [x] Cube 2
Current prices and chained volume measures have been added as an Estimate type dimension as datasets match up
Footnotes are consistent in this cube for adding metadata
- [ ] Cube 3
Analysis by asset and Analysis by sector have been grouped under Analysis column
Each tab - chained volume prices and current prices come under Estimate Type
Percentage change not included.
Seasonally adjusted and all four quarters pulled into transform
- [ ] Cube 4
Category of output tab B1 and Tab B2 Service industry grouped under industrial sector - Category of output has been listed as 'Not Specified' for Tab B1 info.
- [ ] Cube 5
- [ ] Cube 6
- [ ] Cube 7
- [ ] Cube 8
[x] Are there multiple cubes? List titles of cubes
- Cube 1 Quarterly National Accounts, GDP – data tables: Income indicators (Tab D - Income)
- Cube 2 Quarterly National Accounts, GDP – data tables: Expenditure indicators (Tab C1-C2- Expenditure)
- Cube 3 Quarterly National Accounts, GDP – data tables: Gross fixed capital formation (Tab F1/F2- GFCF)
- Cube 4 Quarterly National Accounts, GDP – data tables: Output indicators (Tab B1/B2 - CVM output)
- Cube 5 Quarterly National Accounts, GDP – data tables: Household expenditure indicators (Tab E1-E4)
- Cube 6 Quarterly National Accounts, GDP – data tables: National Accounts Aggregates (Tab A1/A2 - Aggregates)
- Cube 7 Quarterly National Accounts, GDP – data tables: Inventories (Tabs G1/G2 Inventories)
- Cube 8 Quarterly National Accounts, GDP – data tables: Trade (Tab H1 -H2 Trade)
[ ] What needs differentiating? Totals
[ ] Are there titles that need harmonising? World/Worldwide
[ ] Does the structure look sensible?
[ ] Does the hierarchy work?
[ ] What needs further investigation or context?
CUBE 3 Quarterly National Accounts, GDP – data tables: Gross fixed capital formation Missing Intellectual property products and Total - pulling in Public corporation dwellings and private sector dwellings from Analysis by sector instead of just Analysis by asset. This issue may affect other cubes so will need to go back and re-check.
All cubes:- Percentage change data taken out - assume as it's derivative? If data is to be included - we need to add a new dimension of seasonally adjusted or percentage change, latest year on previous year rather than metadata to ref period dimension.
Where does it make logical sense to add Seasonally adjusted Percentage increase etc? Would this work as a new dimension?
[ ] Any duplications?
[ ] List any detailed metadata/methodology to add retrospectively
[ ] Short metadata added

Future work required:

[ ] Revision notification that will need to be added to metadata retrospectively: Revision notification https://docs.google.com/document/d/1sWSncabHO7tGRHaQzUljOMM2KPNs1pf0Kn0mr_dFeds/edit?usp=sharing

Decision:

Annexes and later tabs weren't added as lower case tabs tend to be supporting information (Slack thread DE 3rd Apr).

Notes:

CVM meaning: Chained volume measure - series of GDP stats adjusted for inflation to give a real measure of GDP.
CP - current price https://www.economicshelp.org/blog/7397/economics/gdp-at-chained-volume-measure/
Definitions info https://www.gov.uk/government/statistics/final-gdp-cp-and-cvm-quarterly-and-annual-estimates-1997-2013
COICOP - categories of expenditure by individuals - The Classification of Individual Consumption related to cube 5 Quarterly National Accounts, GDP – data tables: Household expenditure indicators.

JasonHowell commented 4 years ago

BA have reviewed one outstanding issue: series still shows "1Q GR" type columns rather than the correct "Quarter on Quarter" type values
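A simple lookup is one way to apply the relabelling and to spot columns that were missed. This is only a sketch: the abbreviations come from this thread, but the target labels below are illustrative assumptions and should be swapped for whatever wording is agreed. The names `GROWTH_RATE_LABELS`, `relabel` and `unmapped` are hypothetical.

```python
# Assumed mapping: source abbreviations are from this thread, target labels
# are placeholders for the agreed wording.
GROWTH_RATE_LABELS = {
    "1Q GR": "Quarter on Quarter",
    "1YR GR": "Year on Year",
}


def relabel(measure):
    """Swap a known abbreviation for its label; leave anything else untouched."""
    return GROWTH_RATE_LABELS.get(measure.strip(), measure)


def unmapped(measures):
    """List values that still look like raw 'GR' codes after relabelling."""
    return sorted({m for m in measures if relabel(m).endswith(" GR")})
```

Running `unmapped` over the measure type column gives a quick list of any "GR" style values that have no agreed label yet.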

MartynBSpooner commented 4 years ago

Need to summarise all the QA concerns into one list.

mikeAdamss commented 4 years ago

I've made the concrete changes I could on this one. Documented as best I can below, this is a complicated one and the "multiple reviewers and no final reviewer" approach hasn't worked particularly well, so happy to take any further steers.

Changes made:

markers added
CDIDs added
measure types tidied up, labels updated as per chat with Rob + common sense where needed
added GDP per head datacube
added footnotes to dimension descriptions
added description for chained volume measure.

Didn't do:

adding a seasonal adjustment column, think that's a constant and we haven't been adding it in those circumstances.
didn't add a reference to percentage change as Year on year, Quarter on quarter etc. is the percentage change (if that's not clear enough with the new labelling we could maybe add a unit of measure column... if we really, really had to; the measures here are already complicated tbh).
COICOP, chatted with Alex and this will need to be a separate technical task as we're not really set up for handling these sorts of sub-codelist codelists yet.
didn't add revisions data as there's far too much to put in the comment/description fields we currently use. Open to ideas.
didn't add tab letters, the restructured data needs to make sense without reference to the old structure.

JasonHowell commented 4 years ago

Swirrl pulling through updates onto PMDv4.

grace-spitzer-wong commented 4 years ago

Measures values and declared measures type issue to be discussed before this can be closed. Potential problem with CDID code duplications.

rossbowen commented 3 years ago

When I run main.py from the commandline the script returns Killed.

mikeAdamss commented 3 years ago

@rossbowen - that'll be hitting a system resource threshold (it's a big one); the operating system will kill the process if it goes over its memory limits. You'll need to try shutting down everything you don't need (possibly worth restarting first as well) before running. It'll still take a while, so one to kick off before going to lunch or somesuch.
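If it helps to see which step of the transform is the hungriest, a rough sketch using the standard library's tracemalloc is below. The wrapper name `peak_memory_mib` is hypothetical, and tracemalloc only counts allocations that Python's own allocator sees, so treat the figure as a guide rather than the process total.

```python
import tracemalloc


def peak_memory_mib(step, *args, **kwargs):
    """Run `step` and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = step(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)
```

Wrapping one tab's transform at a time in this way should show whether a single sheet dominates or whether memory simply accumulates across all of them.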

mikeAdamss commented 3 years ago

moved this one back over, confirmed it runs to completion on my machine so it's a laptop resource issue.

rossbowen commented 3 years ago

@mikeAdamss unsure how to approach this one. Looks like there's lots of .csv being output with differing structures.

I'm guessing each one will need its own info.json?

mikeAdamss commented 3 years ago

@rossbowen - missed this ping sorry.

iirc there's a lot going on here but most datacubes should have at least some dimensions in common, so I think it's one column mapping/info.json (if it works like I think it does).

it's an important output we've never gotten our heads around, so might be worth pairing up maybe? can make code tweaks as we squeeze some sense out of it.

LPerryman commented 3 years ago

Sorry to be a pain, but this needs too much fiddling to get right and will be a complete mess in the end. Can we start over and just pull in each sheet as it is, without adding anything, and add each sheet/table to a list rather than outputting to a cube for the moment? So the first item in the list will be table A1, the second A2, the third B1, etc. I can then go through and see if things can be joined or output as they are.
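A minimal sketch of that "list of tables" idea, under the assumption that the sheets have already been read into a dict of tab name to raw rows (the `sheets` argument below is hypothetical and stands in for whatever the spreadsheet loader returns). It only handles upper-case letter tabs such as A1, B2 or D.

```python
import re


def tab_sort_key(name):
    """Order tab names like 'A1', 'A2', 'B1', 'D' by letter, then number."""
    match = re.fullmatch(r"([A-Z])(\d*)", name)
    if match is None:
        raise ValueError(f"unexpected tab name: {name!r}")
    letter, number = match.groups()
    # A bare letter such as 'D' sorts before 'D1'.
    return letter, int(number or 0)


def tables_as_list(sheets):
    """Return [(tab name, rows), ...] in tab order, with rows left untouched."""
    return [(name, sheets[name]) for name in sorted(sheets, key=tab_sort_key)]
```

Keeping the tab name next to each table makes it easy to decide later which ones can be joined and which should be output as they are.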

LPerryman commented 3 years ago

Have published on PMD4 as multiple datasets but some periods are still showing as URIs (1948 to 1959). The periods have been picked up by the ref_periods pipeline and a periods codelist is being created when the quarterly national accounts pipeline runs but labels for some of the periods are not being created properly.

ajtucker commented 3 years ago

Transform looks to have broken, see https://ci.floop.org.uk/job/GSS_data/job/Trade/job/ONS-Quarterly-National-Accounts/149/console#:~:text=----%3E%209%20e1%5B'coicop'%5D%20%3D%20'cp'%20%2B%20e1%5B'coicop'%5D.astype(str)

JasonHowell commented 3 years ago

This has been published and checked. Closing issue as gsscogs-bot issues will be dealt with separately.

GSS-Cogs / family-trade

ONS-Quarterly-National-Accounts #13
