GSS-Cogs / family-covid-19


ONS-Coronavirus-and-anxiety-estimates #135

Open ajtucker opened 4 years ago

LPerryman commented 4 years ago

Comments from Slack, 5th June 2020:

Vamshi
Hi @LPerryman, please check https://github.com/GSS-Cogs/family-covid-19/blob/master/datasets/ONS-Coronavirus-and-anxiety-estimates/info.json. I couldn't see any changes in the data earlier this week, but today I am getting a different dataset when I pull it into Jupyter. It has been a new dataset since this morning!

LPerryman
Hi Vamshi. This is the first time I have seen this dataset. I see the new version has completely changed from last week. The original uses output from a binary logistic regression model, which, I think, is the first time we have seen this. The new version has a lot less data and some things that need explanation (OPN Lite). The new output is from a model, but I don't think that will be clear in the metadata; it is going to need a link to the main report, https://www.ons.gov.uk/peoplepopulationandcommunity/wellbeing/articles/coronavirusandanxietygreatbritain/latest. I think we should put this on hold for a moment and ask the BAs to go back to the producers to find out about future publication formats.

LPerryman @RobT can this be investigated further please, i.e. possible format changes and adding some metadata to the landing page about methodology? This wasn't just a small change but a whole reformatting.

LPerryman
ok, so OPN is the new Opinions and Lifestyle Survey

Vamshi So I am confused here: I can't pull the same dataset each time, it varies. Yes, I will park this dataset for the time being!

LPerryman yep

RobT
There was an error in the publication of this data. Just having some lunch, will come back with more information afterwards

RobT
OK, the back story to this one. When it was first identified we found 2 datasets - the estimates and the regression analysis. The data producer had mistakenly published the regression analysis XLS file into both. We spoke with the producer and they subsequently replaced the estimates XLS with the correct version. Regulations mean they have to leave the superseded version on the page under "previous versions". So I guess this was first looked at between 15 and 17 June, prior to the correct version being loaded.

RobT The information about the Opinions and Lifestyle Survey (OPN) and how it was used in this OPN Lite based output is all in the bulletin. So, as with all PMD datasets, having a link back to the landing page allows users to get back to the source of the data and the associated bulletin.

VTula2000 commented 4 years ago

Further investigation is needed on the cube definitions for this survey. The survey outcomes (Average Value, upper CI and lower CI) are presented with a colour-coded coefficient of variation. The data also includes sample sizes broken down by the response options Never | Hardly Ever | Occasionally | Some of the time | Often/Always. This part may need to be separated from the other observations to get a clear cube.
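One possible way to do the separation described above, as a rough sketch only. It assumes the sheet has already been flattened into one row per value, and that each row carries a hypothetical `measure` field; the field names, the `Sample Size` label and the numbers are illustrative placeholders, not taken from the real spreadsheet or from the Stage 2 spec.

```python
# Measures that would stay in the main cube (names taken from the comment above).
ESTIMATE_MEASURES = {"Average Value", "Lower CI", "Upper CI"}

# Hypothetical label for the sample-size rows; the real sheet may differ.
SAMPLE_SIZE_MEASURE = "Sample Size"


def split_cube(rows):
    """Partition flattened rows into (estimates, sample_sizes).

    Each row is a dict with at least a "measure" key. Rows with an
    unrecognised measure raise an error rather than being silently dropped.
    """
    estimates, sample_sizes = [], []
    for row in rows:
        measure = row["measure"]
        if measure in ESTIMATE_MEASURES:
            estimates.append(row)
        elif measure == SAMPLE_SIZE_MEASURE:
            sample_sizes.append(row)
        else:
            raise ValueError(f"Unrecognised measure: {measure!r}")
    return estimates, sample_sizes


# Made-up rows, purely to show the shape of the input.
example_rows = [
    {"measure": "Average Value", "response": "Often/Always", "value": 1.0},
    {"measure": "Lower CI", "response": "Often/Always", "value": 0.5},
    {"measure": "Upper CI", "response": "Often/Always", "value": 1.5},
    {"measure": "Sample Size", "response": "Often/Always", "value": 100},
]
example_estimates, example_sample_sizes = split_cube(example_rows)
```

Keeping the sample sizes in their own list means they can be ignored, or published as a second cube, without touching the estimates.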

LPerryman commented 4 years ago

Sounds like it might need a proper spec, will probably discuss in morning meeting.

LPerryman commented 4 years ago

Vamshi In Jupyter I was getting the OPN Lite {1 to 7} data in the file until yesterday, but not today. Today I am getting the regression dataset from the estimates URL???
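A small sketch of how a silent swap like this could be spotted between pulls: fingerprint the downloaded bytes and compare against the digest recorded last time. The function names are hypothetical and nothing here is part of the existing pipeline; it only assumes the notebook has the raw file contents available as `bytes`.

```python
import hashlib
from typing import Optional


def fingerprint(payload: bytes) -> str:
    """Return the SHA-256 hex digest of the downloaded file contents."""
    return hashlib.sha256(payload).hexdigest()


def has_changed(payload: bytes, previous_digest: Optional[str]) -> bool:
    """Report whether the file differs from the one seen on the last pull.

    A missing previous digest (first ever pull) is treated as a change.
    """
    return previous_digest is None or fingerprint(payload) != previous_digest
```

Storing the digest next to info.json after each run would let the next run flag that the source file was replaced before any transformation is attempted.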

RobT
It doesn't specifically reference OPN Lite, but it does talk about "Pooling data from the Opinions and Lifestyle Survey": "Statistics in this release have been taken from five waves of the Opinions and Lifestyle Survey (OPN), a monthly omnibus survey. In response to the coronavirus (COVID-19) pandemic, we have adapted the OPN to become a weekly survey used to collect data on the impact of the coronavirus on day-to-day life in Great Britain."

RobT
So it refers to it not being standard OPN output. Not exactly a brilliant way to go, but I guess they considered that the "Lite" tag would be enough to be understood as not being the full normal OPN.

LPerryman
They have also used colour coding to express the coefficient of variation, adding another nightmare element to COVID-19.

RobT That is contrary to the ONS Style Guides, but some people love it too much to let go; the information just gets lost if it is printed or copied in black and white. I'm guessing we can't pull that info out of the spreadsheet, so we can't handle that information. The choice is do we ignore the CV and keep on with the rest of the information, or drop this dataset for now and look at others. My vote would be the latter (and I'll go back to the producer and challenge them to do it better next time)

LPerryman agreed

LPerryman commented 4 years ago

Stage 2 spec completed. Sample Size data ignored for now. Coefficient of variation (CV) has not been considered, as it is colour coded and would need manual processing for each new release.