Open demonno opened 5 years ago
Would be nice to be able to replace the WorldBank provider, which is very specific, with something like this that would generalize to other data sources. I think an SDMX provider would fit nicely into FSharp.Data.
Bumping this issue: this would make it much easier to create data science examples, since the amount of data provided has grown significantly since this was created. Any implementation tips would be appreciated.
A working prototype implementation is in the https://github.com/demonno/FSharp.Data fork. We'll try to finally create a pull request based on that work. SDMX protocol version 2.1 is supported. Some SDMX sources offer only the SDMX 2.0 protocol, and that part is not yet implemented. How the proposed solution works is described here: https://digikogu.taltech.ee/en/Item/47d2c178-2681-4aa5-9e25-23868a21c29b
@juhan no need to implement 2.0; SDMX 3.0 is being released this year as well. Most places will move to a more modern version shortly.
Since several SDMX standard-based data sources have emerged recently, it would be useful to have a type provider supporting such data sources. The following describes the current status of the effort to create an SDMX TypeProvider. It is open to ideas and suggestions. I am very much looking forward to feedback from the FSharp.Data community on whether an SDMX type provider implementation would be a good fit for FSharp.Data.
There are many details to cover, so the following lists only the simplest examples and provides references below for further details in case someone is interested.
Motivation
The amount of data available over SDMX is growing, and the standard is a good fit for the type provider approach.
The goal
Implement the SdmxProvider, which will support the simplest cases as a first step.
Background
SDMX (Statistical Data and Metadata eXchange) gives a standardized way of exposing statistical databases as a web service, which provides all necessary metadata and extensive ways of querying the data. Currently, there are multiple implementations of the SDMX standard which can be accessed publicly.
Specification and WorldBank example
For simplicity, let's recall the already familiar WorldBank TypeProvider from FSharp.Data and replicate the same scenario using SDMX. Let's say we want to query annual agricultural land data in Germany.
WorldBank Provider
SDMX Specification
The following steps describe how the same data can be queried using the SDMX REST API.
Everything starts from the wsEntryPoint of the data source, in this case WorldBank.
There are two major parts to this process, metadata and data retrieval.
Metadata
dataflows - https://api.worldbank.org/v2/sdmx/rest/dataflow/all/all/latest/WDI
related metadata and datastructure information - https://api.worldbank.org/v2/sdmx/rest/datastructure/WB/WDI/1.0/?references=children
Data
Dimension information is used to create a query (key). We are looking for annual agricultural land data in Germany. To create such a key we build a sequence of dimension identifiers separated by dots (ordering matters).
A - Annual
AG_LND_AGRI_K2 - Agricultural land (sq. km)
DEU - Germany
Data query (key): A.AG_LND_AGRI_K2.DEU
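As a minimal sketch of this step in F#, the key can be assembled by joining the dimension values with dots and then appending it to the REST data endpoint. The helper names buildKey and dataUrl are hypothetical and are not part of the prototype. The entry point string is inferred from the WorldBank URLs quoted in this issue.

```fsharp
// Dimension values in the order required by the data structure definition.
// Ordering matters: swapping two values produces a different (likely invalid) key.
let dimensionValues =
    [ "A"               // Annual
      "AG_LND_AGRI_K2"  // Agricultural land (sq. km)
      "DEU" ]           // Germany

// An SDMX key is the dimension values joined by dots.
let buildKey (values: string list) : string =
    String.concat "." values

// Hypothetical helper: compose the data URL from the entry point,
// the dataflow identifier and the key.
let dataUrl (entryPoint: string) (flow: string) (key: string) : string =
    sprintf "%s/data/%s/%s/" (entryPoint.TrimEnd('/')) flow key

// Assumption: entry point inferred from the URLs quoted in this issue.
let key = buildKey dimensionValues
let url = dataUrl "https://api.worldbank.org/v2/sdmx/rest" "WDI" key

printfn "%s" key
printfn "%s" url
```

Keeping the dimension values in an ordered list makes the ordering requirement explicit; a type provider would presumably derive that order from the datastructure metadata rather than hard-coding it.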
Finally, data is retrieved using the URL: https://api.worldbank.org/v2/sdmx/rest/data/WDI/A.AG_LND_AGRI_K2.DEU/
SDMX Provider
Querying the same data from WorldBank using SdmxProvider would look like the following
Navigation using . (dots) should allow interaction on multiple levels. The initialization of the TypeProvider will need initial configuration or static parameters.
Foreseen issues
?queryparams that is used for additional filtering in the type provider?
Additional features to be included:
References
2.1
Comments, ideas, and suggestions are welcome. Thanks.