Welcome to the ogs-data-exploration repository
This repository is dedicated to the cartography and exploration of greenhouse gas (GHG) emissions data at various geographical scales. Two main goals are defined for this repository:
Documentation tasks aim to identify relevant GHG data and to document the knowledge and concepts needed to understand its content.
Data exploration tasks aim to analyze the data sources identified in the documentation tasks in order to provide insights into GHG emissions data.
The repository is split into 3 main folders: data, notebooks and data-catalog.
We propose to split each folder first by topic (e.g. ghg-emissions, socio-economic, geo-referential...) and then by data source short name (e.g. ademe, wri (World Resources Institute), eea (European Environment Agency)...).
Every data source folder should contain a README file describing the data source and the data provider, along with the files containing the datasets.
project_name/
├── README.md                   # overview of the project
├── data/                       # raw datasets, organized by topic and data source
│   ├── ghg-emissions
│   │   ├── source 1            # folder for data source 1
│   │   │   ├── README.md       # description of data source 1 and its provider
│   │   │   └── dataset         # dataset files for data source 1
│   │   └── source 2            # folder for data source 2
│   │       ├── README.md       # description of data source 2 and its provider
│   │       └── dataset         # dataset files for data source 2
│   ├── socio-economic
│   └── geo-ref
├── notebooks/                  # notebooks for exploring raw data
│   └── ghg-emissions
│       ├── source 1            # folder for data source 1
│       │   ├── Notebook 1      # notebook for exploring data source 1
│       │   └── Notebook 2      # notebook for exploring data source 1
│       └── source 2            # folder for data source 2
│           ├── Notebook 1      # notebook for exploring data source 2
│           └── Notebook 2      # notebook for exploring data source 2
└── data-catalog/               # JSON descriptions of the data sources
    └── ghg-emissions
        ├── source 1            # folder for data source 1
        │   └── data-desc.json  # JSON file describing data source 1
        └── source 2            # folder for data source 2
            └── data-desc.json  # JSON file describing data source 2
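To illustrate the layout above, here is a minimal Python sketch that creates the three folders for a new data source. The helper name `scaffold_source` and the placeholder README text are assumptions made for this example; they are not part of the repository.

```python
from pathlib import Path


def scaffold_source(root, topic, source):
    """Create the data, notebooks and data-catalog folders for one data source.

    Hypothetical helper: it only mirrors the layout described in this README.
    """
    root = Path(root)
    created = []
    for top in ("data", "notebooks", "data-catalog"):
        folder = root / top / topic / source
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)

    # Every data source folder under data/ should contain a README file.
    readme = root / "data" / topic / source / "README.md"
    if not readme.exists():
        readme.write_text(
            f"# {source}\n\nDescribe the data source and its provider here.\n",
            encoding="utf-8",
        )
    return created
```

For example, `scaffold_source(".", "ghg-emissions", "ademe")` run from the repository root would prepare the folders for the ademe data source without touching files that already exist.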
Every contributor selects one or two data sources to work on. The list of datasets is provided in this wiki page. NB: Some data sources, such as the World Resources Institute, provide several GHG emissions datasets.
Once you have selected your data source, clone the repository to your local machine (more information about git commands is available at the end of this tutorial).
We decided to create one branch per data source in order to facilitate independent work at the beginning of this project, so you should have a specific branch for your data source.
Before working on your data, do not forget to switch to your branch (git checkout NameOfYourBranch).
Once you are on your branch, you can start by adding information to the README file of your data source.
Now you can start the exploratory analysis: go to notebooks/ghg-emissions/name of your data source and create your first notebook there.
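A first notebook cell often just checks what a raw file contains. The sketch below is one possible starting point using only the Python standard library; the function name `summarize_csv` is an assumption, and the file path in the usage note is purely illustrative.

```python
import csv
from collections import Counter


def summarize_csv(path, delimiter=","):
    """Return the row count, column names and empty-cell counts of a CSV file."""
    missing = Counter()
    rows = 0
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        columns = list(reader.fieldnames or [])
        for row in reader:
            rows += 1
            for column in columns:
                # Short rows yield None for absent cells; treat them as empty.
                if not (row.get(column) or "").strip():
                    missing[column] += 1
    return {"rows": rows, "columns": columns, "missing": dict(missing)}
```

In a notebook you could call it as `summarize_csv("../../../data/ghg-emissions/ademe/dataset.csv")`, where the path is hypothetical and depends on how your dataset files are named.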
After doing some analysis, you can commit your changes and push them to the remote repository. Do not forget to put the number of the issue corresponding to the data exploration task in your commit message (git commit -m "your message #x").
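If you want to check a commit message before committing, a small sketch like the following can confirm that it references an issue. The function name `issue_numbers` is an assumption made for this example; the `#<number>` convention is the one used in the commit message above.

```python
import re

# Matches references such as "#12" in a commit message.
ISSUE_REF = re.compile(r"#(\d+)\b")


def issue_numbers(message):
    """Return the issue numbers referenced as '#<number>' in a commit message."""
    return [int(number) for number in ISSUE_REF.findall(message)]
```

An empty list means the message does not mention any issue and should be amended before pushing.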
After pushing your changes, go to GitHub, select your branch and open a pull request to merge the changes made in your branch into main. When you open a pull request, you can assign it to yourself, define a reviewer, and attach it to a project and a milestone. To make the backlog easier to follow, do not forget to link the pull request to the issue you are working on.
You will receive a message of acceptance when your pull request is merged.
We want to build a data catalog that makes the comparison between different data sources easy. Hence, we need some information about the data sources you have just explored!
We ask you to fill in a JSON file containing the main dataset attributes. Once filled in, the JSON file should be stored in data-catalog/ghg-emissions/data source/data-desc.json
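As a sketch of where the file goes, the following Python snippet builds the storage path from the data source short name and writes a description to it. The helper names `catalog_path` and `save_description` are assumptions made for this example.

```python
import json
from pathlib import Path


def catalog_path(root, source, topic="ghg-emissions"):
    """Return the expected location of the description file for a data source."""
    return Path(root) / "data-catalog" / topic / source / "data-desc.json"


def save_description(root, source, description, topic="ghg-emissions"):
    """Write a description (a dict) as JSON to its place in the data catalog."""
    target = catalog_path(root, source, topic)
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps accented characters readable in the file.
        json.dump(description, handle, indent=2, ensure_ascii=False)
        handle.write("\n")
    return target
```

Writing the file through `json.dump` rather than by hand guarantees that the result is valid JSON.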
Here is an example of a data description:
{
  "Version": "2021-02-04",
  "DatasourceAttributes": [
    {
      "DataSourceStorageName": "ademe",
      "Topics": ["GHG emissions"],
      "DataProvider": {
        "DataProviderName": "Lou Dupont",
        "DataProviderLink": "https://www.data.gouv.fr/fr/datasets/bilans-demissions-de-ges-publies-sur-le-site-de-lademe-1/",
        "DataProviderDesc": "NA"
      },
      "DataSource": {
        "DataSourceName": "BEGES",
        "DataSourceLink": "https://www.bilans-ges.ademe.fr/fr/bilanenligne/bilans/index/siGras/0",
        "DataSourceOrganism": "ADEME",
        "DataSourceDesc": "NA",
        "DataFormat": "csv",
        "DataAccess": "download"
      },
      "Coverage": {
        "SpatialCoverage": {
          "MainCoverage": "France",
          "NumberOfSpatialEntities": 238,
          "ListOfSpatialEntities": []
        },
        "TemporalCoverage": {
          "MinDate": 2009,
          "MaxDate": 2020
        }
      },
      "Resolution": {
        "SpatialResolution": ["Communauté Urbaine", "Communauté d'agglomération", "Communauté de Commune",
                              "Communes", "Départements", "Métropole", "Régions"],
        "TemporalResolution": "year"
      },
      "Gases": {
        "Included": "All GHG gases",
        "FilterByGase": "False"
      },
      "Scopes": {
        "Included": ["Scope 1", "Scope 2", "Scope 3"],
        "FilterByScope": "True"
      },
      "Sectors": {
        "Included": "All sectors",
        "FilterBySector": "False"
      },
      "Protocol": {
        "ProtocolName": "Base Carbone",
        "ProtocolLink": "https://www.bilans-ges.ademe.fr/fr/accueil/contenu/index/page/decouverte/siGras/1"
      },
      "EstimationMethod": {
        "MethodType": "Bottom-up",
        "MethodDescription": "https://www.bilans-ges.ademe.fr/fr/accueil/contenu/index/page/decouverte/siGras/1"
      }
    }
  ]
}
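Before storing a description, it can help to check that it parses as JSON and contains the expected attributes. The sketch below is one way to do that and to flatten an entry into a row that is easy to compare across data sources. The list of required keys is an assumption taken from the example above, and the function names are hypothetical.

```python
import json

# Top-level keys of each entry, as they appear in the example description.
REQUIRED_KEYS = (
    "DataSourceStorageName", "Topics", "DataProvider", "DataSource",
    "Coverage", "Resolution", "Gases", "Scopes", "Sectors",
    "Protocol", "EstimationMethod",
)


def load_descriptions(text):
    """Parse a data description and check that each entry has the expected keys."""
    # json.loads raises json.JSONDecodeError on invalid JSON such as trailing commas.
    document = json.loads(text)
    entries = document["DatasourceAttributes"]
    for entry in entries:
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"missing keys: {missing}")
    return entries


def summarize(entry):
    """Flatten one entry into a row that can be compared across data sources."""
    temporal = entry["Coverage"]["TemporalCoverage"]
    return {
        "source": entry["DataSourceStorageName"],
        "coverage": entry["Coverage"]["SpatialCoverage"]["MainCoverage"],
        "years": (temporal["MinDate"], temporal["MaxDate"]),
        "scopes": entry["Scopes"]["Included"],
        "method": entry["EstimationMethod"]["MethodType"],
    }
```

Collecting the output of `summarize` for every data-desc.json file gives a simple comparison table of the catalog.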
Notebooks (Python or R) providing exploratory analysis of raw data
We use GitHub Projects to manage the backlog. Every task is defined as an issue. Every issue is assigned to one contributor and belongs to a milestone and a project.
We have 4 stages:
When we commit changes and open a pull request, we should specify the issue number in order to automate issue stage management.
git clone https://github.com/OpenGeoScales/ogs-data-exploration.git
cd ogs-data-exploration
git branch 'name of the branch' # If the branch does not already exist, create it
git checkout 'name of the branch' # Switch to that branch
For example, to switch to the branch 'ademe':
git checkout ademe
git add . # Stage all changes
git add data-catalog/ademe/ademe-data-desc.md notebooks/ademe/ademe-notebook.md # Or stage specific files only
git commit -m "my message #1"
git push origin ademe