ecmwf-projects / ai-vegetation-fuel

Predicting Fuel Load from earth observation data using Machine Learning
https://ml-fuel.readthedocs.io/en/latest/

ml-fuel: Predicting Fuel Load for Wildfire Modelling

Getting Started

The Python environment for the repository can be created with either conda or virtualenv by running the following from the root of the repo:

Using conda

conda create --name=ml-fuel python=3.8
conda activate ml-fuel

Using virtualenv

python3 -m venv env
source env/bin/activate

Install dependencies

pip install -U pip
pip install -r requirements.txt

This installs all the packages required to run the code in the repository, except for the notebooks in notebooks/ecmwf (see notebooks/ecmwf/README.md for the additional dependencies to install).

The content of this repository is split into two types of experiments, which differ in the prediction target:

  1. target is the fuel load = burned areas * above ground biomass
  2. target is dry matter = burned areas * above ground biomass * combustion coefficients / grid cell areas
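The two targets can be sketched in NumPy as follows. This is a minimal illustration, assuming the terms in the dry matter definition are multiplied together and that all inputs are arrays on the same grid. The variable names and toy values are hypothetical and not taken from the repository.

```python
import numpy as np

# Toy 2x2 grids; names and values are illustrative assumptions only.
burned_area = np.array([[0.0, 2.0], [4.0, 1.0]])
above_ground_biomass = np.array([[10.0, 5.0], [2.5, 8.0]])
combustion_coefficient = np.array([[0.5, 0.4], [0.8, 0.25]])
grid_cell_area = np.array([[100.0, 100.0], [50.0, 50.0]])

# Experiment 1 target: fuel load = burned areas * above ground biomass
fuel_load = burned_area * above_ground_biomass

# Experiment 2 target (assumed multiplicative form):
# dry matter = burned areas * above ground biomass
#              * combustion coefficients / grid cell areas
dry_matter = fuel_load * combustion_coefficient / grid_cell_area

print(fuel_load)
print(dry_matter)
```

Because both targets are element-wise products, cells with no burned area have a target of zero in both experiments.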

Experiment 1

Data Description

Seven years of global historical data, from 2010 to 2016, are used to develop the machine learning models. All data used in this project is proprietary and NOT meant for public release. The Xarray, NumPy and netCDF libraries are used to work with the multi-dimensional geospatial data.

The data split into training, testing and validation is currently:

To change the split, modify data_split() in src/utils/generate_io_arrays.py, and the month list in src/test.py used during inference.
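As a rough illustration of what a time-based split can look like, the sketch below groups records by year. It does not reproduce data_split() from src/utils/generate_io_arrays.py: the function name split_by_year, the record layout and the year ranges are all hypothetical placeholders, not the split used in the repository.

```python
# Hypothetical year-based split; these ranges are placeholders,
# NOT the split actually used in the repository.
SPLIT_YEARS = {
    "train": range(2010, 2015),
    "val": range(2015, 2016),
    "test": range(2016, 2017),
}


def split_by_year(records):
    """Group (year, month, value) records into train/val/test lists."""
    splits = {name: [] for name in SPLIT_YEARS}
    for record in records:
        year = record[0]
        for name, years in SPLIT_YEARS.items():
            if year in years:
                splits[name].append(record)
                break
    return splits


# One dummy record per month for 2010 to 2016.
records = [(year, month, 0.0) for year in range(2010, 2017) for month in range(1, 13)]
splits = split_by_year(records)
print({name: len(rows) for name, rows in splits.items()})
```

Splitting on whole years keeps each period contiguous in time, so no month of a test year appears in the training set.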

Pre-processing

Raw data should first be processed using the notebooks in notebooks/preprocess/*. The entry point for the pre-processing step of the ML pipeline is src/pre-processing.py.

Args description:
      * `--data_path`:  Path to the data files.

Training

The entry point for training is src/train.py.

Args description:
      * `--model_name`:  Name of the model to be trained ("CatBoost" or "LightGBM").
      * `--data_path`:  Data directory where all the input (train, val, test) .csv files are stored.
      * `--exp_name`:  Name of the training experiment used for logging.
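The documented training flags could be parsed along the lines of the sketch below. It assumes an argparse-based command line, which the source does not confirm. The function build_train_parser, the choice of required flags, the default value and the example paths are all hypothetical.

```python
import argparse


def build_train_parser():
    """Hypothetical parser mirroring the flags documented for src/train.py."""
    parser = argparse.ArgumentParser(description="Train a fuel load model (sketch).")
    parser.add_argument(
        "--model_name",
        choices=["CatBoost", "LightGBM"],
        required=True,
        help="Name of the model to be trained.",
    )
    parser.add_argument(
        "--data_path",
        required=True,
        help="Directory holding the train, val and test .csv files.",
    )
    parser.add_argument(
        "--exp_name",
        default="experiment",  # assumed default, for illustration only
        help="Name of the training experiment used for logging.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_train_parser().parse_args(
    ["--model_name", "CatBoost", "--data_path", "data/", "--exp_name", "demo"]
)
print(args.model_name, args.data_path, args.exp_name)
```

Restricting `--model_name` with `choices` makes a misspelled model name fail at parse time rather than midway through a run.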

Inference

The entry point for inference is src/test.py.

Args description:
      * `--model_name`:  Name of the model to be used for inference ("CatBoost" or "LightGBM").
      * `--model_path`:  Path to the pre-trained model.
      * `--data_path`:  Valid data directory where all the test .csv files are stored.
      * `--results_path`:  Directory where the inference result .csv files and .html visualizations will be stored.
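An inference run with the four flags above can be assembled as in the sketch below. The helper inference_command and every path passed to it are hypothetical placeholders. The sketch only builds and prints the command; running it would require the repository, a trained model and the data.

```python
import shlex
import sys


def inference_command(model_name, model_path, data_path, results_path):
    """Build the argument list for src/test.py from the documented flags."""
    return [
        sys.executable,
        "src/test.py",
        "--model_name", model_name,
        "--model_path", model_path,
        "--data_path", data_path,
        "--results_path", results_path,
    ]


# All paths below are placeholders, not files shipped with the repository.
cmd = inference_command("LightGBM", "models/lightgbm.model", "data/", "results/")

# Print the command without the interpreter path, quoted for a POSIX shell.
print("python " + shlex.join(cmd[1:]))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` from the root of the repo once the placeholder paths are replaced with real ones.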

Pre-trained models

Pre-trained models are available at:

Demo Notebooks

Notebooks for training and inference:

Fuel Load Prediction Visualizations:

Adding New Features:

Documentation

Documentation is available at: https://ml-fuel.readthedocs.io/en/latest/index.html.

Experiment 2

We employ an AutoML approach to predict dry matter using the H2O.ai AutoML framework. Please refer to notebooks/ecmwf/README.md for a description of this experiment, instructions to install additional dependencies and the notebooks with the steps to perform the experiment.

Info

This repository was developed by Anurag Saha Roy (@lazyoracle) and Roshni Biswas (@roshni-b) for the ESA-SMOS-2020 project. Contact email: info@wikilimo.co. The repository is now maintained by the Wildfire Danger Forecasting team at the European Centre for Medium-Range Weather Forecasts.