Apache License 2.0
Open-Canopy: a Country-Scale Dataset for Canopy Height Estimation at Very High Resolution

This is the official repository associated with the pre-print: "Open-Canopy: A Country-Scale Benchmark for Canopy Height Estimation at Very High Resolution".

This repository includes the code needed to reproduce all experiments in the paper.

Context & Data

Estimating canopy height and canopy height change at meter resolution from satellite imagery has numerous applications, such as monitoring forest health, logging activities, wood resources, and carbon stocks. However, many existing forestry datasets rely on commercial or closed data sources, restricting the reproducibility and evaluation of new approaches. To address this gap, we introduce Open-Canopy, an open-access, country-scale benchmark for very high resolution (1.5 m) canopy height estimation. Covering more than 87,000 km² across France, Open-Canopy combines SPOT 6-7 satellite imagery with high-resolution aerial LiDAR data. Additionally, we propose a benchmark for canopy height change detection between two images taken in different years, a particularly challenging task even for recent models. To establish a robust foundation for these benchmarks, we evaluate a comprehensive list of state-of-the-art computer vision models for canopy height estimation.

Figure: Examples of canopy height estimation.

Figure: Example of canopy height change estimation.

Dataset Structure

A full description of the dataset can be found in the supplementary material of the Open-Canopy article.

Our training, validation, and test sets cover most of the French territory. Test tiles are separated from train and validation tiles by a 1 km buffer (a).

For each tile, we provide VHR images at a 1.5 m resolution (b) and associated LiDAR-derived canopy height maps (c).

Figure: Dataset overview, with panels (a), (b) and (c).


System requirements

Python environment

We provide here instructions for installation with miniconda/mamba. Installation was tested on Mac and Linux.

If you are installing on a computer without a GPU, update the environment.yaml file (uncomment cpu_only and comment out pytorch-cuda=11.8).

# Clone the project
git clone https://github.com/Open-Canopy
cd Open-Canopy

# Create a conda environment for Open-Canopy and install dependencies
conda env create -f environment.yaml -n canopy
# If this fails, setting channel_priority to flexible can help
# conda config --set channel_priority flexible

# activate conda environment
conda activate canopy

Note: Open-Canopy makes use of rootutils so you do not have to install Open-Canopy with pip. Once you have the environment ready, see the Usage section to run scripts.

Download models pretrained on ImageNet

A script is provided to download all the models finetuned in the benchmark.

# Assuming you are at the root of Open-Canopy
# Make the script executable
chmod +x scripts/download_pretrained.sh

# Run the script
./scripts/download_pretrained.sh

Please refer to the official GitHub repository to download Tolan et al.'s pretrained model, and copy it to the following location: Open-Canopy/datasets/Models/tolan_SSLlarge.pth.

Note: Alternative model sizes that are not used in the benchmark are commented out in the script. A config file is also provided for each of them.

Download Open-Canopy dataset

We recommend using the Hugging Face Python API to download the Open-Canopy dataset.

The dataset must be placed in Open-Canopy/datasets (unless you change the paths in the configs). A script is provided to do this:

# Assuming you are at the root of Open-Canopy
python scripts/download_dataset.py
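
For readers who prefer to call the Hugging Face API directly, the following is a minimal sketch of such a download. It is not the content of scripts/download_dataset.py. The function name download_open_canopy is hypothetical, and the dataset identifier is left as a required argument because it is not stated here; take it from the dataset page.

```python
from pathlib import Path


def download_open_canopy(repo_id: str, target_dir: str = "datasets") -> Path:
    """Download a Hugging Face dataset repository into target_dir.

    repo_id is a required argument on purpose: the identifier of the
    Open-Canopy dataset is an assumption to be filled in by the reader.
    """
    # Imported lazily so that this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",   # the repository is a dataset, not a model
        local_dir=target_dir,  # files are placed under this directory
    )
    return Path(local_path)
```

Called from the root of the repository with the default target_dir, this places the files under datasets, which is where the configs expect them.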


Usage

This repository is built on PyTorch. Its structure was bootstrapped from this code template, which relies heavily on Hydra and PyTorch Lightning. Training parameters can be accessed and modified through the config files in the configs folder, or overridden on the command line. Models and dataloaders were implemented as in https://github.com/archaeoscape-ai/archaeoscape, thanks to Yohann Perron and Vladyslav Sydorov.

Data preprocessing

See the preprocessing README for instructions on processing data from scratch, e.g., if you want to extend Open-Canopy to new areas.

Retrieve data

As described in the supplementary material of the paper, SPOT 6-7 imagery, LiDAR height maps and classification rasters can be accessed through virtual files (one per year).

A grid of 1 km × 1 km tiles is provided in the file "geometries.geojson", with a column "split" indicating which split each tile belongs to ("train"/"val"/"test"/"buffer").
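
As an illustration of how the tile grid can be used, here is a small sketch that groups tiles by split using only the standard library. It assumes that the "split" column is stored as a property of each GeoJSON feature, which is how tabular columns are usually serialized; the helper names tiles_by_split and load_tiles are hypothetical.

```python
import json
from collections import defaultdict


def tiles_by_split(feature_collection: dict) -> dict:
    """Group GeoJSON features by the value of their "split" property."""
    groups = defaultdict(list)
    for feature in feature_collection["features"]:
        groups[feature["properties"]["split"]].append(feature)
    return dict(groups)


def load_tiles(path: str) -> dict:
    """Read a GeoJSON file from disk and group its features by split."""
    with open(path, encoding="utf-8") as f:
        return tiles_by_split(json.load(f))


# Tiny in-memory example standing in for geometries.geojson.
example = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"split": "train"}, "geometry": None},
        {"type": "Feature", "properties": {"split": "test"}, "geometry": None},
        {"type": "Feature", "properties": {"split": "train"}, "geometry": None},
    ],
}
counts = {split: len(tiles) for split, tiles in tiles_by_split(example).items()}
print(counts)  # {'train': 2, 'test': 1}
```

On the real file, load_tiles would be called with the path to geometries.geojson inside the downloaded dataset.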

See examples/visualize_data.py for an example of how to plot data for a given geometry.

Train a model

Supposing you are at the root of Open-Canopy.

Train the default model (ViT small) with the default configuration:

python src/train.py

Train a Unet with the default configuration:

python src/train.py model=smp_unet

After training, the model automatically proceeds to prediction and evaluation on the test tiles. The resulting metrics, along with other logs in the logs directory, are saved in an Excel file for easy reference and analysis.

The list of all commands used for the experiments in the paper can be found in scripts/canopy.sh. Some of them make use of the Hydra multirun functionality. Configs for training are located in the configs folder at the root of the repository.
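
To illustrate what a multirun invocation looks like, the sketch below builds the command line for a sweep over two model configs. Hydra's --multirun flag launches one run per comma-separated value. The helper multirun_command is hypothetical, and the model names are taken from the config files mentioned in this README (smp_unet.yaml and PVTv2_B.yaml); the actual sweeps used in the paper are those in scripts/canopy.sh.

```python
import shlex
import sys


def multirun_command(sweeps: dict, script: str = "src/train.py") -> list:
    """Build the argv for a Hydra multirun sweeping over the given values."""
    # Each key becomes one override of the form key=value1,value2,...
    overrides = [f"{key}={','.join(values)}" for key, values in sweeps.items()]
    return [sys.executable, script, "--multirun", *overrides]


cmd = multirun_command({"model": ["smp_unet", "PVTv2_B"]})
print(shlex.join(cmd[1:]))  # src/train.py --multirun model=smp_unet,PVTv2_B

# To launch the sweep from the root of the repository:
# import subprocess
# subprocess.run(cmd, check=True)
```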

Compute metrics

If you already have height maps, e.g., those provided in the canopy_height/predictions and canopy_height_change folders of the dataset, you can compute metrics on them directly.

To evaluate height estimation, complete the src/metrics/configs/compute_metrics.yaml config and run:

python src/metrics/compute_metrics.py
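
For intuition about this kind of evaluation, here is a generic sketch of pixel-wise error metrics between a predicted and a reference height map. It is not the implementation in src/metrics/compute_metrics.py, and the set of metrics (MAE, RMSE, bias) as well as the function name height_metrics are illustrative assumptions. Pixels where either map is not finite are ignored.

```python
import numpy as np


def height_metrics(pred, target) -> dict:
    """MAE, RMSE and bias (in meters) over pixels where both maps are finite."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    valid = np.isfinite(pred) & np.isfinite(target)
    if not valid.any():
        raise ValueError("no valid pixels to compare")
    err = pred[valid] - target[valid]
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "bias": float(np.mean(err)),
    }


# Toy 2 x 2 maps; the NaN pixel is excluded from the comparison.
pred = np.array([[10.0, 12.0], [0.0, np.nan]])
target = np.array([[11.0, 10.0], [0.0, 5.0]])
metrics = height_metrics(pred, target)
print(metrics)
```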

To evaluate height change estimation, complete the src/metrics/configs/compute_change_detection_metrics.yaml config and run:

python src/metrics/compute_change_detection_metrics.py
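
As a rough sketch of the idea behind change evaluation, the example below differences two height maps, flags pixels with a large height loss, and compares the resulting mask to a reference mask with intersection over union. The -5 m threshold, the choice of IoU, and the function names are assumptions made for illustration only; the benchmark's actual protocol is defined in the paper and in src/metrics/compute_change_detection_metrics.py.

```python
import numpy as np


def height_change(h_before, h_after, loss_threshold_m=-5.0):
    """Return the per-pixel height difference and a mask of height loss."""
    h_before = np.asarray(h_before, dtype=float)
    h_after = np.asarray(h_after, dtype=float)
    if h_before.shape != h_after.shape:
        raise ValueError("height maps must have the same shape")
    delta = h_after - h_before
    # A pixel counts as loss when height dropped by at least the threshold.
    loss_mask = delta <= loss_threshold_m
    return delta, loss_mask


def iou(mask_a, mask_b) -> float:
    """Intersection over union of two boolean masks (NaN if both are empty)."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter / union) if union else float("nan")


before = np.array([[20.0, 15.0], [3.0, 0.0]])
after = np.array([[2.0, 14.0], [3.0, 0.0]])
delta, predicted_loss = height_change(before, after)
reference_loss = np.array([[True, True], [False, False]])
print(iou(predicted_loss, reference_loss))  # 0.5
```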

Both scripts output Excel files containing the computed metrics.

Pretrained models

Unet and PVTv2 models trained on Open-Canopy are available in the pretrained_models folder of the dataset. The corresponding configs are located at configs/model/PVTv2_B.yaml and configs/model/smp_unet.yaml. Additional documentation coming soon.


Please include a citation to the following article if you use the Open-Canopy dataset:

      title={Open-Canopy: A Country-Scale Benchmark for Canopy Height Estimation at Very High Resolution},
      author={Fajwel Fogel and Yohann Perron and Nikola Besic and Laurent Saint-André and Agnès Pellissier-Tanon and Martin Schwartz and Thomas Boudras and Ibrahim Fayad and Alexandre d'Aspremont and Loic Landrieu and Philippe Ciais},
      publisher = {arXiv},


This paper is part of the AI4Forest project, which is funded by the French National Research Agency (ANR), the German Aerospace Center (DLR) and the German Federal Ministry of Education and Research (BMBF).

The experiments conducted in this study were performed using HPC/AI resources provided by GENCI-IDRIS (Grant 2023-AD010114718 and 2023-AD011014781) and Inria.

Dataset license

The "OPEN LICENCE 2.0/LICENCE OUVERTE" is a license created by the French government specifically to facilitate the dissemination of open data by public administrations. An English version of this license is available on the official GitHub page.

Fajwel Fogel (ENS), Yohann Perron (LIGM, ENPC, CNRS, UGE, EFEO), Nikola Besic (LIF, IGN, ENSG), Laurent Saint-André (INRAE, BEF), Agnès Pellissier-Tanon (LSCE/IPSL, CEA-CNRS-UVSQ), Martin Schwartz (LSCE/IPSL, CEA-CNRS-UVSQ), Thomas Boudras (LSCE/IPSL, CEA-CNRS-UVSQ), Ibrahim Fayad (LSCE/IPSL, CEA-CNRS-UVSQ, Kayrros), Alexandre d'Aspremont (CNRS, ENS, Kayrros), Loic Landrieu (LIGM, ENPC, CNRS, UGE), Philippe Ciais (LSCE/IPSL, CEA-CNRS-UVSQ).