dfe-analytical-services / dfeR

Common R tasks in the Department for Education (DfE)
https://dfe-analytical-services.github.io/dfeR/
GNU General Public License v3.0
8 stars 2 forks source link

Add get_gias() function #38

Closed matt-dray closed 4 months ago

matt-dray commented 6 years ago

Background: Get Information About Schools (GIAS; formerly 'Edubase') is a published dataset containing school characteristics. We need frequently to join elements of this dataset to other datasets (e.g. exam results).

Problem: the typical workflow (at least for our team) is to physically navigate to the GIAS pages of GOV.UK, apply in-browser filters and download the subsequent CSV file. Alternatively, it's possible to download the whole dataset and filter it later. Joining GIAS to other datasets typically takes place in Excel and is unreproducible and prone to error.

Solution: write a function, e.g. get_gias(), that pulls the GIAS dataset into your workspace and cleans it, ready for matching within R. For my team, the latest cleaned GIAS dataset can uploaded to our team database to allow for matching to take place in SQL.

Notes

avisionh commented 6 years ago

This looks like a good idea @matt-dray!

Something to bear in mind in the future or now; if you have access to the Master Data Repository (MDR) or the RA_Datasets database in the VMT1PR-SQL1\SQLPROD server, then GIAS is already available. You can find it as: [Edubase].[vw_EdubaseMain_NS]

matt-dray commented 6 years ago

@avisionh Aha! I had no idea this existed. I'll message you separately.

peterrobertcurtis commented 6 years ago

I like the idea of having a function that I can call that brings in the current GIAS data. It doesn't matter to me whether this is through the API, the website or through SQL, although of course each pathway has it's own connection issues so I appreciate a packaged version may not be straightforward as it would have to work in conjunction with proxy setting, or odbc connections.

matt-dray commented 6 years ago

Yeah, from an R perspective I think it would be nice to type get_gias() and have it appear in your environment. I'm unable to find a table for the latest version of GIAS on any database I currently have access to (thanks for trying @avisionh!) and I'm pretty sure I'll develop that separately; this function will help put that into production.

adamrobinson361 commented 6 years ago

Great idea! We, as I’m sure most teams do, require latest published edubase cuts for various pieces of work so this would be really useful.

RE implementing it how are we going to deal with its address changing - the one at the moment looks a bit non official pointed at an azure site. Can we check with the devs if it is going to change soon ?

Also noticed it has a date parameter - can we get historical cuts?

matt-dray commented 6 years ago

I was developing this a little bit last week for our team so we can have the latest cut as a table in our database. I've now got access to GIAS's password-protected webservice, which will be great for me when getting the data for the database upload, but obviously not great for a generic function.

It's not 100% reliable of course, but there's been consistency in the URL in the date form: YYYYMMDD.

So something like:

gias_update <- data.table::fread(
  paste0(
    "http://ea-edubase-api-prod.azurewebsites.net/edubase/edubasealldata",
    stringr::str_replace_all(Sys.Date(), "-", ""),
    ".csv"
  )

Note that data.table will have to be a dependency. Or readr.

adamrobinson361 commented 6 years ago

Thanks Matt - what does the api offer on top of full dataset downloads? Would it be worth considering packaging that up instead?

matt-dray commented 6 years ago

I haven't looked properly into the documentation yet, but I think it's more for downloading bespoke cuts than the whole thing. Will look into it and check back.

matt-dray commented 5 years ago

Hullo, could someone unassign me from this issue?

peteryates commented 4 years ago

Hi, I did some work in a similar area (retrieving and making useful GIAS data) - it might be of interest.

https://github.com/DFE-Digital/gias-query-tool

cjrace commented 4 months ago

Closing this for now as it seems that the need might have been met through giasr, happy to reopen if there's anything more we should / could be doing in this package.