A Python package and open-source project for modelling environmental data with neural processes
-----------

[![release](https://img.shields.io/badge/release-v0.4.2-green?logo=github)](https://github.com/alan-turing-institute/deepsensor/releases) [![Latest Docs](https://img.shields.io/badge/docs-latest-blue.svg)](https://alan-turing-institute.github.io/deepsensor/) ![Tests](https://github.com/alan-turing-institute/deepsensor/actions/workflows/tests.yml/badge.svg) [![Coverage Status](https://coveralls.io/repos/github/alan-turing-institute/deepsensor/badge.svg?branch=main)](https://coveralls.io/github/alan-turing-institute/deepsensor?branch=main) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![slack](https://img.shields.io/badge/slack-deepsensor-purple.svg?logo=slack)](https://ai4environment.slack.com/archives/C05NQ76L87R) [![All Contributors](https://img.shields.io/github/all-contributors/alan-turing-institute/deepsensor?color=ee8449&style=flat-square)](#contributors) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/alan-turing-institute/deepsensor/blob/main/LICENSE)

DeepSensor streamlines the application of neural processes (NPs) to environmental sciences by providing a simple interface for building, training, and evaluating NPs using `xarray` and `pandas` data.

Our developers and users form an open-source community whose vision is to accelerate the next generation of environmental ML research. The DeepSensor Python package facilitates this by drastically reducing the time and effort required to apply NPs to environmental prediction tasks. This allows DeepSensor users to focus on the science and rapidly iterate on ideas.

DeepSensor is an experimental package, and we welcome [contributions from the community](https://github.com/alan-turing-institute/deepsensor/blob/main/CONTRIBUTING.md).
We have an active Slack channel for code and research discussions; you can join by [signing up for the Turing Environment & Sustainability stakeholder community](https://forms.office.com/pages/responsepage.aspx?id=p_SVQ1XklU-Knx-672OE-ZmEJNLHTHVFkqQ97AaCfn9UMTZKT1IwTVhJRE82UjUzMVE2MThSOU5RMC4u). The form includes a question on signing up for the Slack team, where you can find DeepSensor's channel.

![DeepSensor example application figures](https://raw.githubusercontent.com/alan-turing-institute/deepsensor/main/figs/deepsensor_application_examples.png)

Why neural processes?
-----------

NPs are a highly flexible class of probabilistic models that offer unique opportunities to model satellite observations, climate model output, and in-situ measurements. Their key features are the ability to:

- ingest multiple data streams of pointwise or gridded modalities
- handle missing data and varying resolutions
- predict at arbitrary target locations
- quantify prediction uncertainty

These capabilities make NPs well suited to a range of spatio-temporal data fusion tasks such as downscaling, sensor placement, gap-filling, and forecasting.

Why DeepSensor?
-----------

This package aims to faithfully match the flexibility of NPs with a simple and intuitive interface. Under the hood, DeepSensor wraps around the powerful [neuralprocesses](https://github.com/wesselb/neuralprocesses) package for core modelling functionality, while allowing users to stay in the familiar [xarray](https://xarray.pydata.org) and [pandas](https://pandas.pydata.org) world from end to end. DeepSensor also provides convenient plotting tools and active learning functionality for finding optimal [sensor placements](https://doi.org/10.1017/eds.2023.22).
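To make the "pointwise or gridded" distinction concrete, here is a minimal sketch of the two data layouts involved. All variable names are hypothetical, and the `(time, lat, lon)` MultiIndex layout for off-grid data is shown as an illustration of the convention, not as a definitive spec; see the DeepSensor docs for the exact requirements.

```python
# Illustrative sketch only: the two environmental-data modalities mentioned
# above, expressed in plain xarray/pandas. Variable names are hypothetical.
import numpy as np
import pandas as pd
import xarray as xr

# Gridded modality (e.g. reanalysis or climate model output): an xarray
# DataArray on a regular (time, lat, lon) grid
gridded = xr.DataArray(
    np.random.rand(2, 4, 5),
    dims=["time", "lat", "lon"],
    coords={
        "time": pd.date_range("2013-01-01", periods=2),
        "lat": np.linspace(30.0, 60.0, 4),
        "lon": np.linspace(-10.0, 30.0, 5),
    },
    name="air",
)

# Pointwise modality (e.g. in-situ station measurements): a pandas DataFrame
# indexed by (time, lat, lon), so observations can sit anywhere off-grid
stations = pd.DataFrame(
    {"air": [280.1, 281.5, 279.8]},
    index=pd.MultiIndex.from_tuples(
        [
            (pd.Timestamp("2013-01-01"), 45.3, 2.7),
            (pd.Timestamp("2013-01-01"), 51.8, -3.1),
            (pd.Timestamp("2013-01-02"), 45.3, 2.7),
        ],
        names=["time", "lat", "lon"],
    ),
)
```

An NP can treat both of these as context or target data in the same task, which is what makes the data fusion applications above possible.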
Documentation
-----------

We have an extensive documentation page [here](https://alan-turing-institute.github.io/deepsensor/), containing steps for getting started, a user guide built from reproducible Jupyter notebooks, learning resources, research ideas, community information, an API reference, and more!

DeepSensor Gallery
-----------

For real-world DeepSensor research demonstrators, check out the [DeepSensor Gallery](https://github.com/tom-andersson/deepsensor_gallery). Consider submitting a notebook showcasing your research!

Deep learning library agnosticism
-----------

DeepSensor leverages the [backends](https://github.com/wesselb/lab) package to be compatible with either [PyTorch](https://pytorch.org/) or [TensorFlow](https://www.tensorflow.org/). Simply `import deepsensor.torch` or `import deepsensor.tensorflow` to choose between them!

Quick start
----------

Here we will demonstrate a simple example of training a convolutional conditional neural process (ConvCNP) to spatially interpolate random grid cells of NCEP reanalysis air temperature data over the US. First, pip install the package. In this case we will use the PyTorch backend (note: follow the [PyTorch installation instructions](https://pytorch.org/) if you want GPU support).

```bash
pip install deepsensor
pip install torch
```

We can go from imports to predictions with a trained model in less than 30 lines of code!
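Before diving in, it may help to have an intuition for the normalisation step you will see first in the example. Conceptually, preparing raw environmental data for an NP model involves standardising the variable values and scaling the spatial coordinates into a unit interval. The toy sketch below illustrates that idea only; it is not DeepSensor's actual `DataProcessor` implementation, and the function names are hypothetical.

```python
# Toy sketch (NOT DeepSensor's implementation) of NP-style data preparation:
# standardise values and min-max scale coordinates into [0, 1].
import numpy as np

def normalise(values, lat, lon):
    """Return standardised values, unit-interval coordinates, and the stats
    needed to undo the transformation later."""
    stats = {
        "mean": values.mean(), "std": values.std(),
        "lat": (lat.min(), lat.max()), "lon": (lon.min(), lon.max()),
    }
    x1 = (lat - stats["lat"][0]) / (stats["lat"][1] - stats["lat"][0])
    x2 = (lon - stats["lon"][0]) / (stats["lon"][1] - stats["lon"][0])
    return (values - stats["mean"]) / stats["std"], x1, x2, stats

def unnormalise(values, stats):
    """Map model-space values back to the data's original units."""
    return values * stats["std"] + stats["mean"]

temps = np.array([270.0, 280.0, 290.0])  # air temperature in K
lat = np.array([30.0, 45.0, 60.0])
lon = np.array([-10.0, 10.0, 30.0])

norm_temps, x1, x2, stats = normalise(temps, lat, lon)
assert np.allclose(unnormalise(norm_temps, stats), temps)  # round trip
assert x1.min() == 0.0 and x1.max() == 1.0                 # unit interval
```

In the full example, this bookkeeping is handled for you: normalisation happens when data passes through the `DataProcessor`, and predictions come back in the original units and coordinates.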
```python
import deepsensor.torch
from deepsensor.data import DataProcessor, TaskLoader
from deepsensor.model import ConvNP
from deepsensor.train import Trainer

import xarray as xr
import pandas as pd
import numpy as np
from tqdm import tqdm

# Load raw data
ds_raw = xr.tutorial.open_dataset("air_temperature")

# Normalise data
data_processor = DataProcessor(x1_name="lat", x2_name="lon")
ds = data_processor(ds_raw)

# Set up task loader
task_loader = TaskLoader(context=ds, target=ds)

# Set up ConvNP, which by default instantiates a ConvCNP with Gaussian marginals
model = ConvNP(data_processor, task_loader)

# Generate training tasks with up to 100 grid cells as context and all grid
# cells as targets
train_tasks = []
for date in pd.date_range("2013-01-01", "2014-11-30")[::7]:
    N_context = np.random.randint(0, 100)
    task = task_loader(date, context_sampling=N_context, target_sampling="all")
    train_tasks.append(task)

# Train model
trainer = Trainer(model, lr=5e-5)
for epoch in tqdm(range(10)):
    batch_losses = trainer(train_tasks)

# Predict on new task with 50 context points and a dense grid of target points
test_task = task_loader("2014-12-31", context_sampling=50)
pred = model.predict(test_task, X_t=ds_raw)
```

After training, the model can predict directly to `xarray` in your data's original units and coordinate system:

```python
>>> pred["air"]