BONSAMURAIS / correspondence_tables

Work space for the correspondence tables working group for BONSAI
BSD 3-Clause "New" or "Revised" License

Correspondence_tables

This is a work space for the correspondence tables working group for BONSAI

Background

BONSAI will draw data from multiple sources, e.g. national supply-use tables and statistical databases. Each source has its own native classification to define products, activities, elementary flows or, more generally, flow-objects and activities.

Integrating these data requires correspondence tables, which establish a correspondence between the different classifications of flow-objects, activities and properties. These correspondence tables are sometimes available from data providers (e.g. UN Stats); in other cases they are created by the working group.

This repository contains the data and code to transform a series of correspondence tables into RDF files using ontologies compatible with BONSAI. Where possible, the code will generate the RDF files directly from the raw data as published by the data provider.

Installation

With package managers

Available via pip:

pip install correspondence_tables

Manual

Clone the repository and run python setup.py install inside it:

git clone git@github.com:BONSAMURAIS/correspondence_tables.git
cd correspondence_tables
python setup.py install

Usage

This functionality is not working yet. Eventually, users will be able to regenerate the RDF files with the command-line tool correspondence_tables-cli, using something like:

correspondence_tables-cli regenerate output

Group members

Goals and objectives

The goal of this working group is to collect, validate and classify correspondence tables between existing classifications, and to convert these tables into an RDF serialization format for entry into the BONSAI database.

Working procedure

The correspondence tables currently available are stored as received in the folder data/raw. The raw data often has to be reformatted into a standardised format and stored in the folder data/intermediate, with its metadata encoded as a descriptor following the Frictionless Data Table Schema. From the clean tables and their metadata, the corresponding RDF file is created and stored in the folder data/final.
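As a rough illustration of the intermediate step, the sketch below builds a minimal Frictionless descriptor (a data resource with an embedded Table Schema) for a cleaned correspondence table. The column names, the file name `hypothetical_table.csv` and the choice of a composite primary key are assumptions for illustration, not the repository's actual layout.

```python
import csv
import io
import json

# Hypothetical cleaned correspondence table: one row per code-to-code mapping.
CLEAN_CSV = """source_code,target_code
0111,A01
0112,A01
"""


def build_descriptor(csv_text: str, resource_name: str, path: str) -> dict:
    """Build a minimal Frictionless data resource descriptor with a Table Schema.

    Every column is typed as a string, since classification codes can carry
    leading zeros (e.g. "0111") that a numeric type would drop.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    return {
        "name": resource_name,
        "path": path,
        "format": "csv",
        "schema": {
            "fields": [{"name": col, "type": "string"} for col in header],
            # Assumption: each pair of codes should appear only once.
            "primaryKey": header,
        },
    }


descriptor = build_descriptor(CLEAN_CSV, "example-correspondence", "hypothetical_table.csv")
print(json.dumps(descriptor, indent=2))
```

Typing the codes as strings is the main design choice here: it keeps the descriptor faithful to the raw codes, which the RDF step later turns into identifiers.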

Overview of vocabulary used

In the RDF framework, data is stored as statements of the form subject-predicate-object. The explicit predicate allows a more concise definition of the relation between classifications. Here we provide an overview of the predicates used in correspondence tables.

Note: in RDF, the subject, predicate and object are identified by unique identifiers (URIs), which makes the statements wordy for human readers. The examples here are given in the Turtle serialization format, and we use prefixes to make the statements more readable.
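To show what a prefix abbreviates, the sketch below expands prefixed names into full URIs and renders the matching Turtle `@prefix` header. The `rdfs` and `owl` namespaces are the standard W3C ones; the `brdffo` namespace URI is a placeholder assumption, since the actual BONSAI URI is not given here.

```python
# Standard W3C namespaces plus one placeholder for a BONSAI prefix.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    # Placeholder: NOT the real BONSAI namespace URI.
    "brdffo": "http://example.org/brdffo/",
}


def expand(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as 'rdfs:label' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


def to_turtle_prefixes(prefixes: dict[str, str] = PREFIXES) -> str:
    """Render the @prefix lines that let Turtle statements use short names."""
    return "\n".join(f"@prefix {p}: <{uri}> ." for p, uri in prefixes.items())


print(to_turtle_prefixes())
print(expand("rdfs:label"))  # http://www.w3.org/2000/01/rdf-schema#label
```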

prefixes:

rdfs:label

This predicate may be used to provide a human-readable version of a resource's name.

e.g. brdffo:Chemical-Structure.11148 rdfs:label "HFC-41" .

This means that chemical structure 11148 is labelled "HFC-41".

owl:sameAs

This predicate indicates that the subject and the object are the same thing.

e.g. brdffo:Chemical-Structure.11148 owl:sameAs <http://www.chemspider.com/Chemical-Structure.11148> .

This links the taxonomy of US EPA elementary flows to substances in the ChemSpider taxonomy, giving access to the wealth of information ChemSpider holds for the given substance.
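Because owl:sameAs is symmetric and transitive in OWL semantics, a set of sameAs links partitions identifiers into groups that all name the same resource. The sketch below computes those groups with a union-find structure, assuming the links are given as pairs of identifier strings. The second link, to `ex:HFC-41`, is a hypothetical identifier added only for illustration.

```python
def same_as_groups(links):
    """Group identifiers connected by owl:sameAs links using union-find.

    Since owl:sameAs is symmetric and transitive, each connected component
    of the link graph is one set of names for the same resource.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


links = [
    ("brdffo:Chemical-Structure.11148", "http://www.chemspider.com/Chemical-Structure.11148"),
    # Hypothetical extra identifier, for illustration only.
    ("http://www.chemspider.com/Chemical-Structure.11148", "ex:HFC-41"),
]
print(same_as_groups(links))
```

Grouping like this lets every identifier in a group be resolved to one canonical resource when data from several sources is merged.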

rdfs:subClassOf

This predicate states that all instances of one class are also instances of another, e.g. HFC-41 is a subclass of HFC.

This predicate can also be used to indicate that a class belongs to a specific classification, such as "ISIC 4".
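Under RDFS semantics rdfs:subClassOf is transitive, so a class inherits every superclass of its superclasses. The sketch below follows direct (subclass, superclass) pairs breadth-first to collect all superclasses of a class. The `ex:` names and the extra parent class `ex:FluorinatedGas` are hypothetical, chosen only to extend the HFC-41/HFC example.

```python
from collections import defaultdict, deque


def superclasses(pairs, cls):
    """Return every class that `cls` is directly or indirectly a subclass of.

    `pairs` holds (subclass, superclass) tuples read from rdfs:subClassOf
    statements; a breadth-first walk follows them transitively and stops
    on already-visited classes, so cycles do not loop forever.
    """
    parents = defaultdict(set)
    for sub, sup in pairs:
        parents[sub].add(sup)

    seen, queue = set(), deque([cls])
    while queue:
        for sup in parents[queue.popleft()]:
            if sup not in seen:
                seen.add(sup)
                queue.append(sup)
    return seen


pairs = [
    ("ex:HFC-41", "ex:HFC"),
    # Hypothetical parent class, for illustration only.
    ("ex:HFC", "ex:FluorinatedGas"),
]
print(sorted(superclasses(pairs, "ex:HFC-41")))  # ['ex:FluorinatedGas', 'ex:HFC']
```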

bont:superClassOf

We need to declare this predicate for the BONSAI ontology:

bont:superClassOf owl:inverseOf rdfs:subClassOf .

bont:superClassOf is the inverse of rdfs:subClassOf. It allows a correspondence table between two classifications to be imported/exported as a CSV file with three columns (classification 1, predicate, classification 2).
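A minimal sketch of that import path, assuming a headerless CSV whose three columns already hold prefixed names: each row becomes one Turtle statement. Because bont:superClassOf is declared as the inverse of rdfs:subClassOf, a row using it can be rewritten as an rdfs:subClassOf statement with subject and object swapped; the sketch applies that normalisation. The `ex:` codes are hypothetical placeholders, and a real file would also need the matching `@prefix` declarations.

```python
import csv
import io

# Predicates declared as the inverse of another one: rows using them are
# rewritten with subject and object swapped.
INVERSES = {"bont:superClassOf": "rdfs:subClassOf"}


def rows_to_turtle(csv_text: str) -> list[str]:
    """Convert (classification 1, predicate, classification 2) rows to Turtle lines."""
    lines = []
    for subj, pred, obj in csv.reader(io.StringIO(csv_text)):
        subj, pred, obj = subj.strip(), pred.strip(), obj.strip()
        if pred in INVERSES:
            subj, pred, obj = obj, INVERSES[pred], subj
        lines.append(f"{subj} {pred} {obj} .")
    return lines


# Hypothetical table: the codes are placeholders, not real classification entries.
TABLE = """ex:ClassA-01,bont:superClassOf,ex:ClassB-011
ex:ClassB-012,rdfs:subClassOf,ex:ClassA-01
"""

for line in rows_to_turtle(TABLE):
    print(line)
```

Both rows end up as rdfs:subClassOf statements pointing at ex:ClassA-01, which is the point of declaring the inverse: the CSV can be written from either side of the correspondence.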