sdg-csv-data-filler
The SDG CSV data filler is the first script in a pipeline that converts SDG data from CSV format to CSVW format, a W3C standard.
The script will become part of a pipeline that may be integrated into the build scripts for the UK SDG site.
Later it may be integrated into the build scripts for the Open SDG platform, meaning that countries and cities which use the platform may choose to have the CSVW export function on their site.
Schematic diagrams of the script
Overview of csvdata-filler and CSVW system
![csvdata-filler and CSVW system](https://github.com/jwestw/sdg-csv-data-filler/blob/master/img_for_readme/CSVW%20process%20overview%203.jpg?raw=true)
Overview of csv-data-filler functional processes
![csv-data-filler functional processes](https://github.com/jwestw/sdg-csv-data-filler/blob/master/img_for_readme/CSVW%20process%20overview%202.jpg?raw=true)
Functions of the script
The script functions as follows:
- It scrapes the UK SDG site's data repository for links to CSV files.
- It downloads the CSV data from each URL.
- It checks the settings in the overrides YAML file and applies up to three data transformations, which can be configured per dataset and per column, as follows:
  - If the parameter 'fill_gaps' is True for the dataset, it fills any gaps, `nan`, `NaN` or `Null` values with the gap filler value for that column.
  - If the parameter 'fix_headers' is True, it standardises the headers by replacement. This is not currently used (it is set to False), but may be needed in the future.
  - If the parameter 'standardise_cells' is True, it replaces any specified non-standard values with a standard value, e.g. it may replace 'male', 'Males' and 'M' with the standard value 'Male'.
- It outputs the transformed data in CSV format to a folder called "out".
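
The gap-filling and cell-standardising steps above can be sketched roughly as follows. This is only an illustration, not the code in modules.py: the layout of the `OVERRIDES` dictionary, the key names `gap_fillers` and `cell_replacements`, the filler value 'Unknown', and the function names are all assumptions made for the example, and the real script may use a different library (such as pandas) and a different settings layout.

```python
import csv
import io

# Hypothetical override settings for one dataset. The real overrides YAML
# file may use a different layout and different key names.
OVERRIDES = {
    "fill_gaps": True,
    "standardise_cells": True,
    "gap_fillers": {"Sex": "Unknown"},  # made-up filler value
    "cell_replacements": {"Sex": {"male": "Male", "Males": "Male", "M": "Male"}},
}

# Cell contents treated as gaps (assumption, based on the list above).
GAP_MARKERS = {"", "nan", "NaN", "Null"}


def transform_rows(rows, overrides):
    """Return new rows with gaps filled and cell values standardised."""
    gap_fillers = overrides.get("gap_fillers", {})
    replacements = overrides.get("cell_replacements", {})
    out = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            # Fill a gap only if a filler is configured for this column.
            if overrides.get("fill_gaps") and value in GAP_MARKERS and column in gap_fillers:
                value = gap_fillers[column]
            # Map non-standard values to the standard one, if configured.
            if overrides.get("standardise_cells"):
                value = replacements.get(column, {}).get(value, value)
            new_row[column] = value
        out.append(new_row)
    return out


def transform_csv(text, overrides):
    """Parse CSV text, apply the transformations, and return CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = transform_rows(reader, overrides)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = "Year,Sex,Value\n2015,M,10\n2016,,12\n2017,NaN,nan\n"
print(transform_csv(sample, OVERRIDES))
```

In this sketch a gap is only filled when a filler is configured for its column, so the `nan` in the Value column is left untouched while the empty and `NaN` cells in the Sex column become 'Unknown'.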
To do:
- Write the code for a fix_headers function.
- Write unit tests for each function in modules.py.
- Use a Python GitHub library to get data from GitHub instead of scraping.
Related projects
Licensing
SDG CSV data filler is under an MIT licence.