cmosig / sentle

Sentinel-1 & Sentinel-2 data cubes at large scale (bigger-than-memory) on any machine with integrated cloud detection, snow masking, harmonization, merging, and temporal composites.
MIT License

sentle


Download huge-scale (larger-than-memory) Sentinel-1 & Sentinel-2 data cubes on any machine, with integrated cloud detection, snow masking, harmonization, merging, and temporal composites.


Important Notes

1) This package is in an early alpha stage. There will be bugs! If you encounter any error, warning, memory issue, etc., please open a GitHub issue with code that reproduces it.
2) This package is meant for large-scale processing. Because of the underlying processing scheme, an area smaller than 8 km in width and height will not be processed any faster than an 8 km by 8 km area.
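To check where an area falls relative to the 8 km note, you can compute its extent from the bounds you pass to process. The sketch below is a hypothetical helper, not part of sentle, and assumes the bounds are given in a projected CRS whose units are meters (such as the UTM zone EPSG:32633 used in the example below).

```python
# Hypothetical helper (not part of sentle). Assumes bounds are in a projected CRS
# with meter units, e.g. a UTM zone such as EPSG:32633.
MIN_EFFICIENT_SIDE_KM = 8.0  # threshold taken from the note above


def extent_km(bound_left: float, bound_bottom: float,
              bound_right: float, bound_top: float) -> tuple[float, float]:
    """Return (width_km, height_km) of a bounding box in meter-based coordinates."""
    if bound_right <= bound_left or bound_top <= bound_bottom:
        raise ValueError("right/top bounds must be greater than left/bottom bounds")
    return (bound_right - bound_left) / 1000.0, (bound_top - bound_bottom) / 1000.0


def is_below_efficient_size(width_km: float, height_km: float) -> bool:
    """True if the area is smaller than 8 km in both width and height."""
    return width_km < MIN_EFFICIENT_SIDE_KM and height_km < MIN_EFFICIENT_SIDE_KM


# Bounds from the process example in this README.
width_km, height_km = extent_km(176000, 5660000, 216000, 5700000)
print(width_km, height_km)                           # 40.0 40.0
print(is_below_efficient_size(width_km, height_km))  # False
```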

Installing

This package is tested with Python 3.12.*. It may or may not work with other versions.

pip install sentle

or

git clone git@github.com:cmosig/sentle.git
cd sentle
pip install -e .

Guide

Process

There is only one important function: process. It takes all parameters needed for downloading and processing. Once called, it immediately starts downloading the data you specified and processing it into a zarr store.

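from sentle import sentle
from rasterio.crs import CRS

sentle.process(
    zarr_store="mycube.zarr",                  # output zarr store path
    target_crs=CRS.from_string("EPSG:32633"),  # UTM zone 33N (units: meters)
    # bounding box in target_crs coordinates: 40 km x 40 km
    bound_left=176000,
    bound_bottom=5660000,
    bound_right=216000,
    bound_top=5700000,
    datetime="2022-06-17/2023-06-17",          # time range as "start/end"
    target_resolution=10,                      # pixel size in target_crs units (10 m)
    S2_mask_snow=True,                         # detect snow in Sentinel-2 scenes
    S2_cloud_classification=True,              # detect clouds in Sentinel-2 scenes
    S2_cloud_classification_device="cuda",     # run the cloud classifier on a GPU
    S1_assets=["vv", "vh"],                    # Sentinel-1 polarizations to download
    S2_apply_snow_mask=True,                   # replace snow pixels with NaN
    S2_apply_cloud_mask=True,                  # replace cloud pixels with NaN
    time_composite_freq="7d",                  # average into 7-day composites
    num_workers=10,                            # number of parallel workers
)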

This code downloads one year of both Sentinel-1 and Sentinel-2 data for a 40 km by 40 km area. Clouds and snow are detected and replaced with NaNs, and the data is averaged into 7-day composites.
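The interplay of masking and compositing can be illustrated for a single pixel: masked observations become NaN and are skipped when the values in each 7-day window are averaged. This is a minimal pure-Python sketch under stated assumptions; composite_mean is a hypothetical helper, windows are counted from an arbitrary start date, and the exact bin edges and averaging that sentle uses for time_composite_freq may differ.

```python
import math
from datetime import date


def composite_mean(observations, start, freq_days=7):
    """Average (date, value) observations into fixed windows of freq_days days.

    Hypothetical helper (not sentle's implementation). NaN values, e.g. pixels
    masked as cloud or snow, are ignored; a window whose values are all NaN
    yields NaN.
    """
    sums, counts = {}, {}
    for day, value in observations:
        window = (day - start).days // freq_days  # index of the window this date falls in
        sums.setdefault(window, 0.0)
        counts.setdefault(window, 0)
        if not math.isnan(value):
            sums[window] += value
            counts[window] += 1
    return {
        window: (sums[window] / counts[window] if counts[window] else math.nan)
        for window in sorted(sums)
    }


# One pixel of band B02; NaN marks observations removed by the cloud/snow mask.
obs = [
    (date(2022, 6, 17), 1000.0),
    (date(2022, 6, 20), math.nan),
    (date(2022, 6, 22), 1200.0),
    (date(2022, 6, 25), math.nan),
]
print(composite_mean(obs, start=date(2022, 6, 17)))  # {0: 1100.0, 1: nan}
```

The second window contains only masked observations, so it stays NaN; this is how gaps can remain in the composited cube.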

Everything is parallelized across 10 workers, and each worker immediately writes its results to the zarr store given by zarr_store. This is what makes it possible to download larger-than-memory cubes.
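One way to picture this scheme is to split the target pixel grid into spatial chunks and hand them out to workers, each of which writes its finished chunk straight to the store instead of holding the whole cube in memory. The sketch below only illustrates that idea under assumptions: grid_shape, spatial_chunks, and assign_round_robin are hypothetical helpers, the round-robin assignment and the chunk size of 1000 are arbitrary choices, and sentle's actual scheduling may work differently.

```python
import math
from itertools import product


def grid_shape(bound_left, bound_bottom, bound_right, bound_top, resolution):
    """Number of pixel columns and rows covering the bounding box."""
    width = math.ceil((bound_right - bound_left) / resolution)
    height = math.ceil((bound_top - bound_bottom) / resolution)
    return width, height


def spatial_chunks(width, height, chunk_size):
    """Yield (x_off, y_off, x_size, y_size) windows of at most chunk_size pixels."""
    for y_off, x_off in product(range(0, height, chunk_size), range(0, width, chunk_size)):
        yield x_off, y_off, min(chunk_size, width - x_off), min(chunk_size, height - y_off)


def assign_round_robin(chunks, num_workers):
    """Distribute chunks over workers; each worker would write its chunks to the store."""
    assignment = {worker: [] for worker in range(num_workers)}
    for i, chunk in enumerate(chunks):
        assignment[i % num_workers].append(chunk)
    return assignment


# Example bounds at 10 m resolution give a 4000 x 4000 pixel grid.
width, height = grid_shape(176000, 5660000, 216000, 5700000, 10)  # (4000, 4000)
chunks = list(spatial_chunks(width, height, chunk_size=1000))      # 16 chunks
plan = assign_round_robin(chunks, num_workers=10)
print({worker: len(c) for worker, c in plan.items()})
```

Because every worker only ever holds its current chunk, peak memory grows with the number of workers and the chunk size rather than with the size of the whole area.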


Visualize

Load the data with xarray.

import xarray as xr
# the cube is stored in the data variable "sentle"
da = xr.open_zarr("mycube.zarr").sentle
da  # display the DataArray (e.g. in a notebook)

Then visualize it with the awesome lexcube package. Here, band B02 from the example above is shown; the cloud gaps and the spotty coverage during winter are clearly visible.

import lexcube
# interactive 3D view of band B02; vmin/vmax fix the color scale
lexcube.Cube3DWidget(da.sel(band="B02"), vmin=0, vmax=4000)


Questions you may have

How do I scale this program?

Increase the number of workers with the num_workers parameter of process. With the default spatial chunk size of 4000 (set by processing_spatial_chunk_size), plan for about 2 GiB of memory per worker.
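The 2 GiB rule of thumb translates directly into a worker budget: divide the memory you can spare by 2 GiB. The sketch below is a hypothetical helper, not part of sentle; reserve_gib is an illustrative knob for memory you want to keep free for the OS and other processes, and real usage can deviate from the rule of thumb.

```python
GIB_PER_WORKER = 2.0  # rule of thumb above, for processing_spatial_chunk_size=4000


def max_workers(available_gib, gib_per_worker=GIB_PER_WORKER, reserve_gib=0.0):
    """Largest num_workers that fits into available memory under the rule of thumb.

    Hypothetical helper. Returns 0 if not even one worker fits.
    """
    usable = available_gib - reserve_gib
    return max(int(usable // gib_per_worker), 0)


print(max_workers(32))                 # 16
print(max_workers(32, reserve_gib=4))  # 14
print(10 * GIB_PER_WORKER)             # 20.0 GiB needed for num_workers=10
```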

Contributing

Please submit issues or pull requests if you feel like something is missing or needs to be fixed.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments