SocialFinanceDigitalLabs / sf-fons-platform

https://github.com/SocialFinanceDigitalLabs/sf-fons

Load publicly available ONS/Ofsted files onto data platform to be accessible by pipeline code #47

Open dotloadmovie opened 6 months ago

dotloadmovie commented 6 months ago

Business Case:

For the Dynamic Sufficiency tool produced for London, Commissioning Alliance hosted the Power BI tool and built the data model for it as an Azure data pipeline. This data model takes data from three sources:

For the East of England version of Dynamic Sufficiency, Hertfordshire will host the Power BI tool. For this, they will need the data modelling to have been performed on the data platform, which requires:

a) the publicly available ONS data to be imported to the data platform
b) a publicly available version of the Ofsted data to be imported to the data platform
c) the data modelling itself (that results in fact and dim tables) to be written at the end of the sufficiency-output pipeline code

Additional benefits to achieving all of this on the data platform are:

This ticket relates to a) and b): the addition of the publicly available ONS data and Ofsted data to the data platform.

Problem Statement:

In order to develop a full data model for Dynamic Sufficiency, we need the datasets that we are currently missing. Two of these are publicly available ONS tables and the other is the annual Ofsted file for providers. These need to be imported to the platform and made available to the SSDA903 pipeline code. Some data transformation steps should be performed on the postcode directory file before it is incorporated into the 903 pipeline code, as they are cumbersome and should run only when the input file is updated, rather than every time the 903 pipeline runs.
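One way to make the cumbersome transformation run only when the input file changes is to keep a digest of the last processed version and skip the work when it matches. This is a minimal sketch of that idea; the function and marker-file names are illustrative, not part of the existing pipeline code.

```python
# Hypothetical sketch: run the expensive postcode-directory transform only
# when the input file's contents have changed since the last run.
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 digest of the file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def transform_if_updated(input_file: Path, marker: Path, transform) -> bool:
    """Run `transform(input_file)` only if the file changed since last run.

    `marker` stores the digest of the last processed version.
    Returns True if the transform actually ran.
    """
    digest = file_digest(input_file)
    if marker.exists() and marker.read_text() == digest:
        return False  # unchanged input: skip the cumbersome transform
    transform(input_file)
    marker.write_text(digest)
    return True
```

On a scheduled pipeline this means the heavy step costs nothing on the (frequent) runs where the postcode directory has not been refreshed.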

Data Sets In Scope:

Within the Ofsted file, the tab that needs to be saved is "Provider_level_at_31_Aug_2023".
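Extracting just that tab could look like the sketch below, assuming pandas (with an Excel engine such as openpyxl) is available on the platform. The input and output paths are placeholders; only the sheet name comes from the ticket.

```python
# Hedged sketch: pull the provider-level tab out of the Ofsted workbook
# and persist it in a pipeline-friendly format. Paths are placeholders.
import pandas as pd

SHEET = "Provider_level_at_31_Aug_2023"  # tab named in the ticket


def save_provider_sheet(xlsx_path: str, out_csv: str) -> pd.DataFrame:
    """Read only the provider-level tab and write it out as CSV."""
    df = pd.read_excel(xlsx_path, sheet_name=SHEET)
    df.to_csv(out_csv, index=False)
    return df
```

Reading a single named sheet avoids loading the rest of the workbook into the pipeline at all.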

Use Case(s):

1 Social Finance

1. East of England

I need this so that I may:

5. IG Considerations:

Does the current IG cover this? - Yes

Other IG notes and/or actions:

6. Technical Proposal:

Steps:

Estimated cost to deliver:

Development time: X Sprints (Y weeks), Z Developers (agile deployment of different Developers as skills are needed)

END

dotloadmovie commented 6 months ago

I believe this is a relatively simple task - we need to run this as a separate pipeline with Michael's additional processing code at the core, wrapped in something that cURLs the dataset from the static URL provided by the public providers.
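The "cURL wrapper" part could be as simple as the stdlib-only sketch below. The URL and destination directory are placeholders, not the real ONS/Ofsted endpoints.

```python
# Sketch of the fetch step, assuming each dataset lives at a static public
# URL. The URL passed in is a placeholder supplied by the caller.
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def dataset_filename(url: str) -> str:
    """Derive a local file name from the URL's last path segment."""
    name = Path(urlparse(url).path).name
    return name or "dataset.bin"  # fall back if the URL has no file name


def fetch_dataset(url: str, dest_dir: str) -> Path:
    """Download the file at `url` into `dest_dir` and return the local path."""
    dest = Path(dest_dir) / dataset_filename(url)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    return dest
```

Because the providers publish at static URLs, the wrapper needs no authentication; scheduling and retry policy would sit in the surrounding pipeline.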

dotloadmovie commented 6 months ago

Branch created - secondary cURL functionality will be scripted here

MagicMiranda commented 6 months ago

Will run into next sprint, but it's a simple task. Dave is comfortable with the task: connecting to the external data set and making it available. Will talk to MH very soon. All happening at the FE of the solution; no security issues.

Is this step 1 of many or 1 and done? Good for any external data set in the future. Will be part of the EV effort. Instance determined etc... for now DT is responsible for timings and frequency for pulling new files in.

MagicMiranda commented 6 months ago

All 3 files are publicly available and the tickets have been merged. File refreshes will be logged in the log file, and the pipeline will check whether the expected file is present. These files are updated very infrequently. To be discussed by MH and DT. DR effort required when time and resources allow.
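The "look to see if the expected file is there" check, with the result written to the log, could be sketched like this. The directory and file names are illustrative only.

```python
# Sketch of the refresh check described above: verify the expected input
# file is present and log the outcome. Names/paths are placeholders.
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("file_refresh")


def check_expected_file(data_dir: str, expected_name: str) -> bool:
    """Return True if the expected input file exists, logging either way."""
    path = Path(data_dir) / expected_name
    if path.exists():
        log.info("Found expected file: %s", path)
        return True
    log.warning("Expected file missing: %s", path)
    return False
```

Since the source files change very infrequently, a warning in the log on a missing or stale file is likely enough; no alerting machinery is implied by the ticket.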