@SarahAlidoost you should be able to run the following recipe now, which will download all input data for STEMMUS_SCOPE for half a year. To make this recipe work for longer time periods we'll have to add a feature that pads with NaNs when the requested start/end time is not covered by a dataset (a rough sketch of that idea follows the recipe).
# config (folder, login info etc goes to a ~/.zampy/config file)
name: "STEMMUS_SCOPE_input"

download:
  time: ["2020-01-01", "2020-06-30"]
  bbox: [60, 10, 50, 0]  # NESW

  datasets:
    era5_land:
      variables:
        - air_temperature
        - dewpoint_temperature
        - soil_temperature
        - soil_moisture
    era5:
      variables:
        - total_precipitation
        - surface_thermal_radiation_downwards
        - surface_solar_radiation_downwards
        - surface_pressure
        - eastward_component_of_wind
        - northward_component_of_wind
    eth_canopy_height:
      variables:
        - height_of_vegetation
    fapar_lai:
      variables:
        - leaf_area_index
    land_cover:
      variables:
        - land_cover
    prism_dem_90:
      variables:
        - elevation
    cams:
      variables:
        - co2_concentration

convert:
  convention: ALMA
  frequency: 1H  # outputs at 1 hour frequency. Pandas-like freq-keyword.
  resolution: 0.25  # output resolution in degrees.
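A rough sketch of the NaN-padding idea mentioned above, assuming an xarray dataset with a time coordinate (the helper name and its arguments are hypothetical, not existing zampy code):

```python
import pandas as pd
import xarray as xr


def pad_time_range(ds: xr.Dataset, start: str, end: str, freq: str = "1H") -> xr.Dataset:
    """Pad a dataset with NaNs so it spans the full requested time range.

    Hypothetical helper: reindexing onto the complete time axis makes
    xarray fill timestamps the dataset does not cover with NaN.
    """
    full_range = pd.date_range(start=start, end=end, freq=freq)
    return ds.reindex(time=full_range)
```

Called with the recipe's start/end time, something like this would let datasets with shorter coverage still align on the same time axis.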
Not sure why the Windows tests are failing. I am able to reproduce it on my machine, but it seems like the netCDF4 library or xarray is not releasing the file lock on the netCDF files. This makes the temp dir cleanup fail because it cannot unlink the files.
I did not change that part of the code either, so it probably has something to do with a new version somewhere. Or with Dask, because I did make the CI use Dask distributed (to avoid memory issues, as the default scheduler handles memory poorly).
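One pattern that usually avoids this on Windows (a sketch only, not necessarily how the failing test is written) is to open the netCDF file in a context manager and load it into memory, so the handle is released before the temporary directory is cleaned up:

```python
import tempfile
from pathlib import Path

import xarray as xr

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical netCDF file standing in for the downloaded test data.
    nc_file = Path(tmp_dir) / "example.nc"
    xr.Dataset({"t2m": ("time", [280.0, 281.0])}).to_netcdf(nc_file)

    # Opening in a context manager and loading into memory closes the
    # file handle on exit; an open handle keeps the file locked on
    # Windows and makes the TemporaryDirectory cleanup fail to unlink it.
    with xr.open_dataset(nc_file) as ds:
        data = ds.load()
```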
Looking at the log of the action, it seems that there are different errors on Windows:
E PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\zampy\\zampy\\tests\\test_data\\fapar-lai\\tmp\\tmpvxggwz_9\\c3s_LAI_20190110000000_GLOBE_PROBAV_V3.0.1.nc'
E NotADirectoryError: [WinError 267] The directory name is invalid: 'D:\\a\\zampy\\zampy\\tests\\test_data\\fapar-lai\\tmp\\tmpvxggwz_9\\c3s_LAI_20190110000000_GLOBE_PROBAV_V3.0.1.nc'
FAILED tests/test_datasets/test_fapar_lai.py::TestFaparLAI::test_ingest - NotADirectoryError: [WinError 267] The directory name is invalid: 'D:\\a\\zampy\\zampy\\tests\\test_data\\fapar-lai\\tmp\\tmpvxggwz_9\\c3s_LAI_20190110000000_GLOBE_PROBAV_V3.0.1.nc'
I think the Dask workers cause these errors. Can you please refactor dask.distributed.Client() in test_fapar_lai.py::TestFaparLAI::test_ingest to use the client's submit and result methods, and check if that fixes the errors? Roughly as sketched below.
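A minimal sketch of that pattern, with a stand-in function instead of the real ingest step (the function and file name are hypothetical):

```python
from dask.distributed import Client


def ingest_file(path: str) -> str:
    """Hypothetical stand-in for the ingest work done in the test."""
    return f"ingested {path}"


if __name__ == "__main__":
    # Submit the work to a worker and block on the result, instead of
    # relying on the Client being picked up as the default scheduler.
    # The file handles then live in the worker process rather than in
    # the test process that later removes the temporary directory.
    with Client() as client:
        future = client.submit(ingest_file, "c3s_LAI_example.nc")
        print(future.result())
```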
The FAPAR dataset test is fixed; now a different test fails with a segfault due to rasterio...
The cause must be some dependency that has changed. My old environment still passes all tests.
Failed conditions: 55.8% coverage on new code (required ≥ 80%).
To add these I had to merge them and add a "depth" dimension, since the original files are split by layer (roughly as in the sketch below).
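A minimal, self-contained sketch of that kind of merge, with synthetic per-layer datasets and illustrative depths (in practice each layer would come from xr.open_dataset on its own file):

```python
import numpy as np
import xarray as xr

# Illustrative layer depths (metres); the real values depend on the dataset.
depths = [0.035, 0.175, 0.64, 1.945]

# Stand-ins for the per-layer files: one dataset per layer, each holding
# the same variable without a depth dimension.
per_layer = [
    xr.Dataset({"soil_temperature": (("time",), np.full(3, 280.0 + i))})
    for i in range(len(depths))
]

# Give each layer a "depth" coordinate and concatenate along it, so the
# variable becomes a single array with dims (depth, time).
merged = xr.concat(
    [ds.expand_dims(depth=[d]) for ds, d in zip(per_layer, depths)],
    dim="depth",
)
```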
Closes #47
Note that the CDS is slow, so running the recipe might take a while. Running it overnight is probably best.
The example recipe is shown above.