NASA-IMPACT / hls-project

Outline of the infrastructure and components used in the HLS project.

hls-project

This document provides an overview of the code and artifacts used by the Harmonized Landsat and Sentinel-2 (HLS) project. For more detailed information about the HLS product specification and distribution, consult the LP DAAC product landing page.

The initial development goal of the HLS project was to expand existing, experimental HLS scientific algorithm code into a full-scale global production pipeline running on scalable AWS infrastructure. Because of the nature of the project and the potential for a large number of components, an early decision was made to manage code in individual repositories rather than a monorepo. This provides clear traceability and a narrative of each component's development over time through its repository's commit history. The disadvantage of this approach is the large number of repositories with no clear map of how they are interrelated. This document provides that map, describing how the components relate to one another and how they interoperate.

Containers

The core of the HLS processing pipeline is algorithmic C code packaged as Docker containers. Because different scientific libraries utilized in these containers share common dependencies, we use a simple hierarchical image dependency graph.
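As a sketch of what such a hierarchy can look like (the image names, package list, and build commands below are illustrative assumptions, not the project's actual Dockerfiles), a downstream algorithm image extends a shared base image so that common dependencies are built once and inherited:

```dockerfile
# Hypothetical shared base image providing common scientific
# dependencies (names and packages are illustrative only).
FROM ubuntu:20.04 AS hls-base
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential libhdf4-dev \
    && rm -rf /var/lib/apt/lists/*

# A downstream algorithm container builds FROM the shared base,
# adding only its own C code on top of the inherited libraries.
FROM hls-base
COPY . /src
RUN make -C /src && make -C /src install
```

Each edge in the dependency graph corresponds to a `FROM` line like the second one above, which is what keeps the shared libraries consistent across the algorithm containers.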

*(Diagram: hierarchical container image dependency graph)*

Utilities

Generating HLS products requires a suite of additional metadata and secondary files for ingestion into external systems such as CMR, Cumulus, and GIBS. These Python CLI utilities are installed and used from within the containers.
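As a hedged illustration of this pattern (the command-line interface, field names, and defaults below are hypothetical, not the project's actual utilities), a metadata-generation CLI of this kind typically parses a granule identifier and writes a small JSON record for the downstream ingestion system:

```python
import argparse
import json


def build_metadata(granule_id: str, collection: str) -> dict:
    """Assemble a minimal granule metadata record.

    The fields here are illustrative; real CMR/Cumulus records
    follow their own schemas (e.g. UMM-G for CMR granules).
    """
    return {
        "GranuleUR": granule_id,
        "CollectionReference": {"ShortName": collection},
    }


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Write a minimal granule metadata file."
    )
    parser.add_argument("granule_id")
    parser.add_argument("--collection", default="HLSL30")
    parser.add_argument("--output", default="metadata.json")
    args = parser.parse_args(argv)

    record = build_metadata(args.granule_id, args.collection)
    with open(args.output, "w") as f:
        json.dump(record, f, indent=2)


if __name__ == "__main__":
    main()
```

Packaging utilities like this as console scripts lets the same container that runs the C processing code also emit the metadata files needed by CMR, Cumulus, and GIBS.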

Static lookup files

The HLS pipeline relies on several static lookup files generated by the scientific team. To support full process reproducibility, the code used to generate these files is openly maintained.

Infrastructure and Orchestration

These repositories define the infrastructure as code and the AWS components that manage the flow of data through the HLS processing pipelines.
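For context only, pipelines of this shape on AWS are often expressed as a state machine that chains containerized processing with metadata publication; the Amazon States Language fragment below is a hypothetical sketch of that idea (state names, resource ARNs, and the choice of Step Functions itself are assumptions, not taken from the HLS repositories):

```json
{
  "Comment": "Hypothetical sketch of an HLS-style processing flow",
  "StartAt": "ProcessGranule",
  "States": {
    "ProcessGranule": {
      "Type": "Task",
      "Resource": "arn:aws:states:::batch:submitJob.sync",
      "Next": "PublishMetadata"
    },
    "PublishMetadata": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:publish-metadata",
      "End": true
    }
  }
}
```

Defining this flow as code alongside the infrastructure keeps the orchestration reviewable and reproducible in the same way as the containers themselves.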