ScotGovAnalysis / opendatascot

An R package to pull data from statistics.gov.scot into R
https://scotgovanalysis.github.io/opendatascot/
MIT License
47 stars 6 forks

Information for refPeriod uninformative for some datasets #101

Closed dc27 closed 1 year ago

dc27 commented 3 years ago

For Healthy Life Expectancy open data:

ods_structure("healthy-life-expectancy")$categories$refPeriod

yields:

"P3Y" "P3Y" "P3Y"

expected:

"2015-2017" "2016-2018" "2017-2019"

as on https://statistics.gov.scot/

For the example -

ods_structure("homelessness-applications")$categories$refPeriod

yields the expected result:

[1] "2007-2008" "2008-2009" "2009-2010" "2010-2011" "2011-2012" "2012-2013" "2013-2014" "2014-2015" "2015-2016"
[10] "2016-2017" "2017-2018" "2018-2019" "2019-2020"

More datasets also contain "PxY" instead of the actual refPeriod - e.g. Life-Expectancy

The resource is excellent and I look forward to using it more.

GordonBryden commented 3 years ago

This is probably some unintended behaviour from SPARQL's automatic date parsing. I may need to undo that feature in the query builder.

GordonBryden commented 3 years ago

I tested this again, and it seems to be working without any code changes on this end. I guess there have been some server-side changes which have improved the function. For now I'll mark this as fixed, but I'll keep an eye out for a recurrence.

dc27 commented 3 years ago

It's now working for the ods_structure() method:

ods_structure("healthy-life-expectancy")$categories$refPeriod yields the correct (expected) result:

[1] "2015-2017" "2016-2018" "2017-2019"

but

unique(ods_dataset("healthy-life-expectancy")$refPeriod) still yields the previous result:

[1] "P3Y"
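
To check other datasets for the same symptom, a small helper along these lines may be useful. It is only a sketch: `is_duration_code()` and `ref_period_is_uninformative()` are hypothetical names that are not part of opendatascot, and the pattern assumes the uninformative values always take the form "P" followed by a number of years and "Y" (such as "P3Y").

```r
# Hypothetical helper (not part of opendatascot): TRUE for values that look
# like a year-duration code such as "P3Y", FALSE for anything else,
# including labels such as "2015-2017".
is_duration_code <- function(x) {
  grepl("^P[0-9]+Y$", x)
}

# Hypothetical helper: TRUE when every non-missing refPeriod value is a
# duration code, i.e. the column carries no information about the period.
ref_period_is_uninformative <- function(ref_period) {
  values <- unique(ref_period[!is.na(ref_period)])
  length(values) > 0 && all(is_duration_code(values))
}

ref_period_is_uninformative(c("P3Y", "P3Y", "P3Y"))        # TRUE
ref_period_is_uninformative(c("2015-2017", "2016-2018"))   # FALSE
```

Against live data the call would be ref_period_is_uninformative(ods_dataset("healthy-life-expectancy")$refPeriod), which needs network access to statistics.gov.scot.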

GordonBryden commented 3 years ago

OK, I'll reopen until I can work this out

mairiskye commented 2 years ago

Hi there Gordon,

I'm also encountering the same issue. As per original post:
ods_structure("educational-attainment-of-school-leavers")$categories$refPeriod yields the correct years

[1] "14" "15" "16" "17" "18" "19"

But ods_dataset("educational-attainment-of-school-leavers") consistently returns "P3Y" for all records under refPeriod.

Do you have a workaround for this?

Many thanks!

GordonBryden commented 2 years ago

Hi,

It has taken some digging, but it looks like this is due to the hack Swirrl implemented to allow interval data in refPeriod. Instead of the value data that I had been expecting, an interval is returned as another triple. That's what shows as "P3Y" in the returned data. I should be able to hack in a solution by having refPeriod returned as the labelled value, which will give the expected date for both regular dates and interval dates. I'll update when this is implemented.
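
As a rough illustration of "returning the labelled value", the query could follow the refPeriod link to the period resource and select its rdfs:label rather than the raw value. The sketch below is not the package's actual query builder: `build_ref_period_query()` is a hypothetical name, the dataset URI shape is an assumption, and it assumes the data follows the RDF Data Cube and SDMX dimension vocabularies with an rdfs:label on each period.

```r
# Hypothetical sketch, not opendatascot's real query builder.
# Builds a SPARQL query that selects the label of each reference period
# instead of the raw period value.
build_ref_period_query <- function(dataset_uri) {
  paste0(
    "PREFIX qb: <http://purl.org/linked-data/cube#>\n",
    "PREFIX sdmxdim: <http://purl.org/linked-data/sdmx/2009/dimension#>\n",
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
    "SELECT DISTINCT ?refPeriod WHERE {\n",
    "  ?obs qb:dataSet <", dataset_uri, "> ;\n",
    "       sdmxdim:refPeriod ?period .\n",
    "  ?period rdfs:label ?refPeriod .\n",   # label, e.g. "2015-2017"
    "}"
  )
}

# Assumed URI shape for a dataset on statistics.gov.scot
query <- build_ref_period_query(
  "http://statistics.gov.scot/data/healthy-life-expectancy"
)
cat(query)
```

Selecting the label treats single years and multi-year intervals the same way, which matches the behaviour described above.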

GordonBryden commented 2 years ago

I believe I've fixed this. I still need to test that this hasn't broken any other features, but you can download the new version below:

devtools::install_github("datasciencescotland/opendatascot", ref = "refPeriod-fix")

GordonBryden commented 1 year ago

I've approved the pull request, so this should now be solved.