Closed hannahker closed 2 months ago
@t-downing @zackarno thanks both for the reviews! I've addressed most comments so should be ready for another review.
@t-downing also totally agree that having an organized structure across pipelines will be important. However, this isn't something that I want to spend too much time designing while we're still quite early in the process of setting up these pipelines -- there's still a lot we don't know! My thinking in this PR was to keep the setup and folder structure straightforward and lightweight so that we can iterate as we get a better sense of requirements over time. Once this PR and @isatotun's work on IMERG are complete, I think we'll be in a much better place to plan the best way to keep things organized.
This PR adds code to download and process archival SEAS5 data from ECMWF's MARS service. The proposed directory structure and method of calling the pipeline is by no means set in stone -- I'm assuming this is something we'll iterate on further as the code base grows.
Usage:
The pipeline can be run locally from the command line by calling:
- `<scope>`: Either `global` or `test`. `global` will download data for the full planet and `test` will use a bounding box around Afghanistan. `test` should be used during development to download smaller subsets of data from MARS.
- `<start_year>`: The year to begin downloading annual data for
- `<end_year>`: The year to download annual data until (not inclusive)

This code is also configured as a Job on Databricks, called "Update SEAS5 Archive". This can be triggered manually and has been used for bulk tasks (i.e. more than a couple of years) due to significantly improved performance.
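For reviewers who want a concrete picture of the three arguments, here is a minimal sketch of how they could be parsed and expanded into the list of years to download. The names `build_parser` and `years_to_download` are hypothetical and not taken from this PR; the only behaviour carried over from the description is that `scope` is `global` or `test` and that `end_year` is not inclusive.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: the actual entry point in this PR may differ.
    parser = argparse.ArgumentParser(
        description="Download and process archival SEAS5 data from MARS"
    )
    parser.add_argument(
        "scope",
        choices=["global", "test"],
        help="'global' for the full planet, 'test' for a small bounding box",
    )
    parser.add_argument("start_year", type=int, help="first year to download")
    parser.add_argument(
        "end_year", type=int, help="year to download until (not inclusive)"
    )
    return parser


def years_to_download(start_year: int, end_year: int) -> List[int]:
    # end_year is exclusive, which matches Python's range semantics.
    if end_year <= start_year:
        raise ValueError("end_year must be greater than start_year")
    return list(range(start_year, end_year))


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["test", "2000", "2003"])
print(args.scope, years_to_download(args.start_year, args.end_year))
```

Because `end_year` is exclusive, passing `2000 2003` covers 2000, 2001 and 2002, so consecutive runs can reuse the previous `end_year` as the next `start_year` without overlap.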
Processing details
Raw files:
Global, monthly precipitation forecasts are downloaded in yearly `.grib` files. Each raw `.grib` contains all ensemble members (26 or 51, depending on the year) and lead times (0-6 months ahead). See this JIRA ticket for more detailed docs on how the MARS API call is parameterized. All raw `.grib` data is stored in the `dev` Azure storage container under `global/mars/raw/`. Files are named `seas5_mars_tprate_{year}.grib`.
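As a small illustration of the raw naming convention, the sketch below builds the blob path for a given year from the prefix and file pattern stated above. The helper name `raw_blob_name` is hypothetical; the code in this PR may assemble paths differently.

```python
RAW_PREFIX = "global/mars/raw/"  # prefix inside the dev container, per the description


def raw_blob_name(year: int) -> str:
    # One raw GRIB file per year, named seas5_mars_tprate_{year}.grib
    return f"{RAW_PREFIX}seas5_mars_tprate_{year}.grib"


# List the raw blobs for 2000-2002
for year in range(2000, 2003):
    print(raw_blob_name(year))
```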
Processed files:
The `.grib` file from each year is processed to output 84 cloud-optimized GeoTIFFs (`.tif`) in `EPSG:4326`.

All processed files are saved to the `prod` Azure storage container under `raster/seas5/` (TODO!). Files are named `seas5_mars_tprate_em_i{pub_date}_lt{leadtime}.tif`.

NOTE: Outputs will be saved to `prod` following this PR review!
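To make the output naming easier to review, here is a rough sketch that enumerates the processed file names for one year. It rests on two assumptions that the description does not state outright: that the 84 files per year are 12 monthly publication dates times 7 lead times (0-6), and that `{pub_date}` is an ISO date for the first of the month. The helper `processed_blob_names` is hypothetical.

```python
from datetime import date
from typing import List

PROCESSED_PREFIX = "raster/seas5/"  # prefix inside the prod container, per the description
LEAD_TIMES = range(0, 7)  # 0-6 months ahead


def processed_blob_names(year: int) -> List[str]:
    names = []
    for month in range(1, 13):
        # Assumed date format: YYYY-MM-DD for the first day of the publication month
        pub_date = date(year, month, 1).isoformat()
        for leadtime in LEAD_TIMES:
            names.append(
                f"{PROCESSED_PREFIX}seas5_mars_tprate_em_i{pub_date}_lt{leadtime}.tif"
            )
    return names


names = processed_blob_names(2000)
print(len(names))  # 84 under the 12 x 7 assumption
print(names[0])
```

If the real `{pub_date}` format or the breakdown of the 84 files differs, only the two marked assumptions need to change.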