Deltares / dfm_tools

A Python package for pre- and postprocessing D-Flow FM model input and output files
https://deltares.github.io/dfm_tools/
GNU General Public License v3.0

Improve CMEMS download performance #1033

Open veenstrajelmer opened 1 month ago

veenstrajelmer commented 1 month ago

Downloading long time series from CMEMS with dfm_tools is slow, even though the actual download happens at a daily frequency. This is probably because, by default, the entire requested dataset is opened first, and the daily subsets are then retrieved from it: https://github.com/Deltares/dfm_tools/blob/f7e5234271789c754f7ba4bb3eaf3d7ab995a5bd/dfm_tools/download.py#L216-L249

This example shows that when the requested period is cut into monthly chunks, the download is much faster than when the entire period is retrieved at once:

import dfm_tools as dfmt
import pandas as pd

# spatial extents
lon_min, lon_max, lat_min, lat_max = 12.5, 16.5, 34.5, 37

# time extents
date_min = '2017-12-01'
date_max = '2022-07-31'

# option 1: list of (start, stop) tuples with monthly frequency
# TODO: this approach improves performance significantly
# note: the zip below only pairs up correctly if date_min is the first
# and date_max the last day of a month
date_range_start = pd.date_range(start=date_min, end=date_max, freq='MS')
date_range_end = pd.date_range(start=date_min, end=date_max, freq='ME')
monthly_periods = list(zip(date_range_start, date_range_end))

# option 2: a single (start, stop) tuple to download all at once (but still per day)
# TODO: this is the default behaviour and is slow
# this line overrides option 1, comment it out to use the monthly chunks
monthly_periods = [(date_min, date_max)]

for period in monthly_periods: 
    dfmt.download_CMEMS(varkey='uo',
                        longitude_min=lon_min, longitude_max=lon_max, latitude_min=lat_min, latitude_max=lat_max,
                        date_min=period[0], date_max=period[1],
                        dir_output=".", overwrite=True, dataset_id='med-cmcc-cur-rean-d')
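
The monthly chunking above pairs month starts with month ends, which only lines up when the requested period begins on the first and ends on the last day of a month. As a minimal sketch of how the same chunking could be made to work for arbitrary dates, the helper below uses only the standard library and returns ISO date strings, the same format used for date_min and date_max above. The name split_into_monthly_periods is hypothetical and not part of dfm_tools.

```python
import calendar
from datetime import date, timedelta


def split_into_monthly_periods(date_min, date_max):
    """Hypothetical helper (not part of dfm_tools): split an inclusive
    date range into (start, end) tuples that never cross a month boundary.

    date_min and date_max are ISO strings such as '2017-12-01'.
    """
    start = date.fromisoformat(date_min)
    end = date.fromisoformat(date_max)
    if end < start:
        raise ValueError("date_max must not be before date_min")

    periods = []
    chunk_start = start
    while chunk_start <= end:
        # last calendar day of the month that chunk_start falls in
        last_day = calendar.monthrange(chunk_start.year, chunk_start.month)[1]
        month_end = chunk_start.replace(day=last_day)
        # clip the final chunk to the requested end date
        chunk_end = min(end, month_end)
        periods.append((chunk_start.isoformat(), chunk_end.isoformat()))
        # the next chunk starts on the day after this one ends
        chunk_start = chunk_end + timedelta(days=1)
    return periods


if __name__ == "__main__":
    for period in split_into_monthly_periods("2017-12-15", "2018-02-10"):
        print(period)
```

The returned tuples could replace monthly_periods in the loop above, with period[0] and period[1] passed as date_min and date_max. Whether monthly chunks are the best chunk size for performance is not established here, it is only the size used in the example.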

Todo: