Deltares / dfm_tools

A Python package for pre- and postprocessing D-FlowFM model input and output files
https://deltares.github.io/dfm_tools/
GNU General Public License v3.0
65 stars 11 forks source link

problem with slicing of time in open_dataset_extra #797

Open SCLaan opened 7 months ago

SCLaan commented 7 months ago

The time slicing in the function _open_datasetextra in _interpolategrid2bnd.py causes an error if the timestamp of the requested tstart is not at midnight (00:00).

Think of an xarray.DataArray, where:

In  : data_xr.time
Out : 
<xarray.DataArray 'time' (time: 4)> Size: 28B
array(['2021-12-31T00:00:00.000000000', '2022-01-01T00:00:00.000000000',
       '2022-01-02T00:00:00.000000000', '2022-01-03T00:00:00.000000000'],
       dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 28B 2021-12-31 2022-01-01 ... 2022-01-03
Attributes:
    valid_min:  2021-12-31T00:00:00.000000000
    valid_max:  2022-01-03T00:00:00.000000000

and: tstart, tstop = Timestamp('2021-12-31 12:00:00'), Timestamp('2022-01-02 12:00:00')

This line: https://github.com/Deltares/dfm_tools/blob/245909ce0dbf5ebcba14cba46e91a0b9036e8797/dfm_tools/interpolate_grid2bnd.py#L335 returns an xarray.DataArray without the first and last times:

In  : data_xr.time
Out : 
<xarray.DataArray 'time' (time: 2)> Size: 14B
array(['2022-01-01T00:00:00.000000000','2022-01-02T00:00:00.000000000'],
       dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 14B 2022-01-01 2022-01-02 ... 2022-01-02
Attributes:
    valid_min:  2021-12-31T00:00:00.000000000
    valid_max:  2022-01-03T00:00:00.000000000

This causes an error in the following line, where a check is done: https://github.com/Deltares/dfm_tools/blob/245909ce0dbf5ebcba14cba46e91a0b9036e8797/dfm_tools/interpolate_grid2bnd.py#L338

I suggest to change line 335 to: data_xr_vars = data_xr[quantity_list].sel(time=slice(tstart.floor('1d'),tstop.ceil('1d')))

SCLaan commented 7 months ago

MWE:

import os
from pathlib import Path

import dfm_tools as dfmt

# Settings
timevect    = {'t_start' : '2021-12-31 12:00:00', # include last time step in year before simulation starts
               't_end'   : '2022-01-02 12:00:00'} # include first time step in year after simulation finishes
coords      = [-76,-59,35,46]
refdate_str = 'MINUTES since 2015-01-01 00:00:00'
datadir     = r'.\data_test' # directory for downloaded files
overwrite   = True # overwrite existing files?

# Download data
os.makedirs(datadir, exist_ok=True)
dfmt.download_CMEMS('zos',
                    longitude_min=coords[0],longitude_max=coords[1],
                    latitude_min=coords[2], latitude_max=coords[3],
                    date_min=timevect['t_start'], date_max=timevect['t_end'],
                    dir_output=datadir, overwrite=overwrite)

# Convert to .bc
conversion_dict  = dfmt.get_conversion_dict()
data_xr_vars = dfmt.open_dataset_extra(dir_pattern=Path(datadir,'{ncvarname}_*.nc'),
                                       quantity='waterlevelbnd', reverse_depth=True,
                                       tstart=timevect['t_start'], tstop=timevect['t_end'],
                                       conversion_dict=conversion_dict,
                                       refdate_str=refdate_str)

Another good example would be to download monthly output and convert this to a period that doesn't start or end on the first of the month. For example when downloading and converting monthly-mean NO3 (freq='M') from dataset_id='global-analysis-forecast-bio-001-028-monthly':

import os
from pathlib import Path

import dfm_tools as dfmt

# Settings
timevect    = {'t_start' : '2021-12-31 12:00:00', # include last time step in year before simulation starts
               't_end'   : '2022-01-02 12:00:00'} # include first time step in year after simulation finishes
coords      = [-76,-59,35,46]
refdate_str = 'MINUTES since 2015-01-01 00:00:00'
datadir     = r'.\data_test' # directory for downloaded files
overwrite   = True # overwrite existing files?

# Download data
os.makedirs(datadir, exist_ok=True)
dfmt.download_CMEMS('no3',
                    longitude_min=coords[0],longitude_max=coords[1],
                    latitude_min=coords[2], latitude_max=coords[3],
                    date_min=timevect['t_start'], date_max=timevect['t_end'],
                    freq='M', dataset_id='global-analysis-forecast-bio-001-028-monthly',
                    dir_output=datadir, overwrite=overwrite)

# Convert to .bc
conversion_dict  = dfmt.get_conversion_dict()
data_xr_vars = dfmt.open_dataset_extra(dir_pattern=Path(datadir,'{ncvarname}_*.nc'),
                                       quantity='tracerbndNO3', reverse_depth=True,
                                       tstart=timevect['t_start'], tstop=timevect['t_end'],
                                       conversion_dict=conversion_dict,
                                       refdate_str=refdate_str)
veenstrajelmer commented 6 months ago

This also happens in tests/examples/preprocess_interpolate_nc_to_bc.py now (https://github.com/Deltares/dfm_tools/issues/802), which was not the case before: "OutOfRangeError: requested tstop 2012-04-01 12:00:00 outside of available range 2012-01-16 12:00:00 to 2012-03-16 12:00:00."

import os
import dfm_tools as dfmt
import xarray as xr
file_pli = r'p:\archivedprojects\11208054-004-dcsm-fm\models\model_input\bnd_cond\pli\DCSM-FM_OB_all_20181108.pli'
tstart = '2012-01-16 12:00'
tstop = '2012-04-16 12:00'
dir_sourcefiles_waq = r'p:\archivedprojects\11206304-futuremares-rawdata-preps\python_scripts\ocean_boundaryCMEMS\data_monthly'
dir_pattern = os.path.join(dir_sourcefiles_waq,'cmems_mod_glo_bgc_my_0.25_P1M-m_{ncvarname}_*.nc') 
ds = xr.open_mfdataset(dir_pattern.format(ncvarname="no3"))
print(ds.time)
data_xr_vars = dfmt.open_dataset_extra(dir_pattern=dir_pattern, 
                                       quantity='tracerbndNO3',
                                       tstart=tstart, tstop=tstop)`

But this is actually since the source data is monthly and mixed noon/midnight times.