fsprojects / FSharp.Data

F# Data: Library for Data Access
https://fsprojects.github.io/FSharp.Data

New SDMX TypeProvider #1203

Open demonno opened 5 years ago

demonno commented 5 years ago

Since several SDMX standard-based data sources have emerged recently, it would be useful to have a type provider supporting them. The following describes the current status of the effort to create an SDMX TypeProvider. It is open to ideas and suggestions. I am very much looking forward to feedback from the FSharp.Data community on whether an SDMX type provider implementation would be a good fit for FSharp.Data.

There are many details to cover, so the following lists only the simplest examples and provides references below for anyone interested in further details.

Motivation

The amount of data available over SDMX is growing, and the standard is a good fit for the type provider approach.

The goal

Implement the SdmxProvider, which will support the simplest cases as a first step.

Background

SDMX (Statistical Data and Metadata eXchange) provides a standardized way of exposing statistical databases as a web service, with all the necessary metadata and extensive ways of querying the data. Currently, there are multiple publicly accessible implementations of the SDMX standard.

Specification and WorldBank example

For simplicity, let's recall the already familiar WorldBank TypeProvider from FSharp.Data and replicate the same scenario using SDMX. Let's say we want to query annual agricultural land data for Germany.

WorldBank Provider

open FSharp.Data

let wb = WorldBankData.GetDataContext()
let data = wb.Countries.Germany.Indicators.``Agricultural land (sq. km)``

SDMX Specification

The following steps describe how the same data can be queried using the SDMX REST API.

Everything starts from wsEntryPoint, which in the case of WorldBank is

There are two major parts to this process, metadata and data retrieval.

Metadata

Data

Dimension information is used to create a query (key). We are looking for annual agricultural land data for Germany. To create such a key, we build a sequence of dimension identifiers separated by dots (ordering matters).

Data query (key): A.AG_LND_AGRI_K2.DEU

Finally, data is retrieved using the URL: https://api.worldbank.org/v2/sdmx/rest/data/WDI/A.AG_LND_AGRI_K2.DEU/
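The key and URL construction described above can be sketched as two small pure functions. This is only an illustration under stated assumptions: buildKey and buildDataUrl are hypothetical helper names, not part of FSharp.Data or of the prototype, and the reading of the three key parts as frequency, indicator, and country is inferred from the example rather than taken from the data structure definition.

```fsharp
// Hypothetical helpers for illustration only; not part of FSharp.Data.

/// Joins dimension values into an SDMX data key.
/// The caller must supply the values in the order required by the
/// dataflow, because the position of each value identifies its dimension.
let buildKey (dimensionValues: string list) : string =
    String.concat "." dimensionValues

/// Builds a data URL of the form <entryPoint>/data/<dataflowId>/<key>/
/// as in the WorldBank example above.
let buildDataUrl (entryPoint: string) (dataflowId: string) (dimensionValues: string list) : string =
    // Normalize the entry point so a trailing slash does not get doubled.
    let root = entryPoint.TrimEnd('/')
    sprintf "%s/data/%s/%s/" root dataflowId (buildKey dimensionValues)

// Assumed meaning of the parts: annual (A), agricultural land
// (AG_LND_AGRI_K2), Germany (DEU).
let dimensions = [ "A"; "AG_LND_AGRI_K2"; "DEU" ]

let key = buildKey dimensions
let url = buildDataUrl "https://api.worldbank.org/v2/sdmx/rest/" "WDI" dimensions
```

With the WorldBank entry point and the WDI dataflow, this reproduces the key and URL given above. A type provider would presumably derive the dimension order from the metadata instead of hard-coding it.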

SDMX Provider

Querying the same data from WorldBank using SdmxProvider would look like the following:

type wb = SdmxProvider<"https://api.worldbank.org/v2/sdmx/rest/">
let data = wb.``World Development Indicators``.Annual.``Agricultural land (sq. km)``.Germany

Navigation using . (dots) should allow interaction on multiple levels. Initializing the TypeProvider will require initial configuration or static parameters, which are

Foreseen issues

Additional features to be included:

References


Comments, ideas, and suggestions are welcome. Thanks.

ovatsus commented 5 years ago

It would be nice to be able to replace the WorldBank provider, which is very specific, with something like this that would generalize to other data sources. I think an SDMX provider would fit nicely into FSharp.Data.

ArmanAttaran commented 3 years ago

Bumping this issue. This would make it much easier to create data science examples, since the amount of data provided has grown significantly since this was created. Any implementation tips would be appreciated.

juhan commented 3 years ago

A working prototype implementation is in the https://github.com/demonno/FSharp.Data fork. We'll try to finally create a pull request based on that work. It supports SDMX protocol version 2.1. Some SDMX sources offer only the SDMX 2.0 protocol, and that part is not yet implemented. How the proposed solution works is described here: https://digikogu.taltech.ee/en/Item/47d2c178-2681-4aa5-9e25-23868a21c29b

ArmanAttaran commented 3 years ago

@juhan no need to implement 2.0; SDMX 3.0 is being released this year as well. Most places will move to a more modern version shortly.