NASA-IMPACT / hls-project

Outline of the infrastructure and components used in the HLS project.

hls-project

This document provides an overview of the code and artifacts used by the Harmonized Landsat and Sentinel-2 (HLS) project. For more detailed information about the HLS product specification and distribution, consult the LP DAAC product landing page.

The initial development goal of the HLS project was to expand existing, experimental HLS scientific algorithm code into a full-scale global production pipeline running on scalable AWS infrastructure. Because of the nature of the project and the potential for a large number of components, an early decision was made to manage code in individual repositories rather than a monorepo. This provides clear traceability and a narrative of each component's development over time through its repository's commit history. The disadvantage of this approach is the large number of repositories with no clear map of how they are interrelated. This document provides that map, describing how the components relate to one another and how they interoperate.

Containers

The core of the HLS processing pipeline is algorithmic C code packaged as Docker containers. Because different scientific libraries utilized in these containers share common dependencies, we use a simple hierarchical image dependency graph.
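As a sketch of what such a hierarchy can look like (the image names, package list, and build commands below are illustrative assumptions, not the project's actual Dockerfiles), a downstream algorithm image extends a shared base image so that common dependencies are built once and inherited:

```dockerfile
# Hypothetical shared base image providing common scientific
# dependencies (names and packages are illustrative only).
FROM ubuntu:20.04 AS hls-base
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential libhdf4-dev \
    && rm -rf /var/lib/apt/lists/*

# A downstream algorithm container builds FROM the shared base,
# adding only its own C code on top of the inherited libraries.
FROM hls-base
COPY . /src
RUN make -C /src && make -C /src install
```

Each edge in the dependency graph corresponds to a `FROM` line like the second one above, which is what keeps the shared libraries consistent across the algorithm containers.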

*(Diagram: hierarchical container image dependency graph)*

Utilities

Generating HLS products requires a suite of additional metadata and secondary files for ingestion into external systems such as CMR, Cumulus, and GIBS. These Python CLI utilities are installed and used from within the containers.
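As a hedged illustration of this pattern (the command-line interface, field names, and defaults below are hypothetical, not the project's actual utilities), a metadata-generation CLI of this kind typically parses a granule identifier and writes a small JSON record for the downstream ingestion system:

```python
import argparse
import json


def build_metadata(granule_id: str, collection: str) -> dict:
    """Assemble a minimal granule metadata record.

    The fields here are illustrative; real CMR/Cumulus records
    follow their own schemas (e.g. UMM-G for CMR granules).
    """
    return {
        "GranuleUR": granule_id,
        "CollectionReference": {"ShortName": collection},
    }


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Write a minimal granule metadata file."
    )
    parser.add_argument("granule_id")
    parser.add_argument("--collection", default="HLSL30")
    parser.add_argument("--output", default="metadata.json")
    args = parser.parse_args(argv)

    record = build_metadata(args.granule_id, args.collection)
    with open(args.output, "w") as f:
        json.dump(record, f, indent=2)


if __name__ == "__main__":
    main()
```

Packaging utilities like this as console scripts lets the same container that runs the C processing code also emit the metadata files needed by CMR, Cumulus, and GIBS.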

Static lookup files

The HLS pipeline relies on several static lookup files generated by the scientific team. To support full process reproducibility, the code used to generate these files is openly maintained.

Infrastructure and Orchestration

These repositories define the infrastructure as code and the AWS components that manage the flow of data through the HLS processing pipelines.
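For context only, pipelines of this shape on AWS are often expressed as a state machine that chains containerized processing with metadata publication; the Amazon States Language fragment below is a hypothetical sketch of that idea (state names, resource ARNs, and the choice of Step Functions itself are assumptions, not taken from the HLS repositories):

```json
{
  "Comment": "Hypothetical sketch of an HLS-style processing flow",
  "StartAt": "ProcessGranule",
  "States": {
    "ProcessGranule": {
      "Type": "Task",
      "Resource": "arn:aws:states:::batch:submitJob.sync",
      "Next": "PublishMetadata"
    },
    "PublishMetadata": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:publish-metadata",
      "End": true
    }
  }
}
```

Defining this flow as code alongside the infrastructure keeps the orchestration reviewable and reproducible in the same way as the containers themselves.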