ShervanGharari / EASYMORE

EASYMORE; EArth SYstem MOdeling REmapper
GNU General Public License v3.0

Check if input netcdf files actually have dimensions that would warrant remapping #67

Closed: wknoben closed this issue 8 months ago

wknoben commented 1 year ago

Use case:

User (for whatever reason) tries to remap a netcdf file that only has spatial dimensions of size 1 (i.e., a single latitude and longitude value; 1 grid cell).

Current behaviour:

EASYMORE crashes on calling nc_remapper():

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

File C:\Globus endpoint\CAMELS_spat\camels-spat-env\lib\site-packages\easymore\easymore.py:152, in easymore.nc_remapper(self)
    150     print(self.temp_dir+self.case_name+'_target_shapefile.shp')
    151 # create source shapefile
--> 152 source_shp_gpd = self.create_source_shp()
    153 if self.save_temp_shp:
    154     source_shp_gpd.to_file(self.temp_dir+self.case_name+'_source_shapefile.shp')

File C:\Globus endpoint\CAMELS_spat\camels-spat-env\lib\site-packages\easymore\easymore.py:286, in easymore.create_source_shp(self)
    284 import geopandas as gpd
    285 # find the case, 1: regular, 2: rotated, 3: irregular Voronoi diagram creation if not provided
--> 286 self.NetCDF_SHP_lat_lon()
    287 # create the source shapefile for case 1 and 2 if shapefile is not provided
    288 if (self.case == 1 or self.case == 2):

File C:\Globus endpoint\CAMELS_spat\camels-spat-env\lib\site-packages\easymore\easymore.py:716, in easymore.NetCDF_SHP_lat_lon(self)
    714 lat_temp_diff = np.diff(lat_temp)
    715 lat_temp_diff_2 = np.diff(lat_temp_diff)
--> 716 max_lat_temp_diff_2 = max(abs(lat_temp_diff_2))
    717 print('max difference of lat values in source nc files are : ', max_lat_temp_diff_2)
    718 lon_temp = np.array(ncid.variables[self.var_lon][:])

ValueError: max() arg is an empty sequence
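
For context, the crash comes from taking the double np.diff of a latitude array that holds a single value; a minimal reproduction of the failing lines (with an illustrative latitude value) is:

import numpy as np

lat_temp = np.array([51.25])              # single latitude value (illustrative)
lat_temp_diff = np.diff(lat_temp)         # empty array: no pairs to difference
lat_temp_diff_2 = np.diff(lat_temp_diff)  # still empty
max(abs(lat_temp_diff_2))                 # ValueError: max() arg is an empty sequence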

Desired behavior:

Options:

  1. EASYMORE performs a quick check on self.source_nc to see if at least one of self.var_lat and self.var_lon has more than one value. If not, it exits gracefully and informs the user that remapping is not needed for this file (a rough sketch of such a check is shown below).
  2. Perform the remapping anyway (essentially skip all remapping steps and simply create a target netCDF file with data values identical to the source netCDF).
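
A rough sketch of the check in option 1, assuming xarray is available; remapping_needed is a hypothetical helper, not part of the current EASYMORE API:

import xarray as xr

def remapping_needed(source_nc, var_lat, var_lon):
    '''Return True if the source netCDF has more than one value along lat or lon.'''
    with xr.open_dataset(source_nc) as ds:
        return (ds[var_lat].size > 1) or (ds[var_lon].size > 1)

# Hypothetical use at the start of nc_remapper():
# if not remapping_needed(esmr.source_nc, esmr.var_lat, esmr.var_lon):
#     print('Source file contains a single grid cell; remapping is not needed.')
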
ShervanGharari commented 1 year ago

Thanks for pointing this out. Yes, I am aware of this: we need at least a 3-by-3-by-time variable to be able to move forward with the current code. Bart faced the same issue for very small cases, and the solution was to subset the forcing with margins. I understand that it might not be desirable to inflate the source nc file to 9 times its size.

I can easily include option 1 for a graceful exit. Option 2 can be accommodated by taking in the native resolution of the gridded data, which will help create the single grid cell without surrounding cells. The code can give a graceful exit while asking for esmr.source_nc_resolution; if that is provided, a single grid cell can be created. I think creating a single grid cell and following the current workflow in the code is a better approach than directly manipulating the remapping file. There might be cases where the target shapefile and the single grid cell have no or only partial overlap, which needs to be checked first by intersection. I should give the one-grid case more thought, but I am confident both 1 and 2 will be addressed in version 2.0.0 by the end of this month!

Meanwhile, the options to move forward are:

  1. buffer the subsetting of the gridded data, or
  2. manually create the one-grid shapefile and feed it to easymore via source_shp, similar to an irregular case (this may face some issues, as the dimensions of lat and lon may be different from case 3, in which lat and lon have the same dimension name, such as n).
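
For the single grid cell idea in option 2, a rough sketch of how the cell polygon could be built from its centre and a user-supplied resolution (a hypothetical helper using shapely and geopandas, not part of the current EASYMORE code):

import geopandas as gpd
from shapely.geometry import box

def single_cell_shp(lat_center, lon_center, resolution):
    '''Build a one-polygon GeoDataFrame covering a single grid cell.'''
    half = resolution / 2.0
    cell = box(lon_center - half, lat_center - half,
               lon_center + half, lat_center + half)
    # 'ID' field name and EPSG:4326 are illustrative choices
    return gpd.GeoDataFrame({'ID': [1]}, geometry=[cell], crs='EPSG:4326')

# The resulting GeoDataFrame could then be fed to easymore as source_shp,
# similar to the irregular (case 3) workflow mentioned above.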

ShervanGharari commented 1 year ago

@wknoben, if you pass me the target shapefile and a snippet of the netCDF file with one grid cell, I can set up an example and test the changes to the source code. Thanks in advance.

wknoben commented 1 year ago

Here's the beginning of a function that pads an existing netCDF file with empty values. This means a 1x1xTime domain can be expanded to a 3x3xTime domain, after which easymore should work.

import numpy as np
import xarray as xr

def add_empty_grid_cells_around_single_cell_netcdf(file,
                                                   grid_spacing=0.25, grid_cells=3,
                                                   lat_dim='latitude', lon_dim='longitude', tim_dim='time'):

    '''Takes an existing netCDF4 file with latitude and longitude dimensions of size 1, and adds np.nan cells around it'''

    ds = xr.open_dataset(file)

    # Get the existing latitude and longitude values, and define the extended coordinates
    # (grid_cells is assumed odd, so the original cell sits at the centre of the expanded grid)
    old_lat = ds[lat_dim].values[0]
    old_lon = ds[lon_dim].values[0]
    half_span = grid_spacing * (grid_cells // 2)
    new_lat = np.linspace(old_lat - half_span, old_lat + half_span, grid_cells)
    new_lon = np.linspace(old_lon - half_span, old_lon + half_span, grid_cells)

    # Create a grid_cells-by-grid_cells-by-time set of NaNs
    new_data = np.empty( (len(ds[tim_dim]), grid_cells, grid_cells) ) # Note that dim order must match what's used below
    new_data[:] = np.nan

    # Create a new xarray dataset with the expanded grid and data
    # (assumes every remaining variable has (time, lat, lon) dimensions)
    new_ds = xr.Dataset( coords={lon_dim: new_lon, lat_dim: new_lat, tim_dim: ds[tim_dim]})
    center = grid_cells // 2 # index of the original cell in the expanded grid
    for variable in ds.variables:
        if variable not in [lat_dim, lon_dim, tim_dim]:
            new_var = xr.DataArray(new_data, dims=(tim_dim, lat_dim, lon_dim), name=variable) # Note that order of dimensions here must match NaNs above
            new_ds[variable] = new_var.copy() # This copy() is ESSENTIAL - without it, whenever you update the variable values below, you update these values WHEREVER this new_var is inserted in the dataset, i.e., everywhere
            new_ds[variable].attrs = ds[variable].attrs
            new_ds[variable].loc[ {tim_dim: ds[tim_dim], lon_dim: ds[lon_dim], lat_dim: ds[lat_dim]} ] = ds[variable][:] # Copy existing values
            #assert all(new_ds[variable].isel({lat_dim: center, lon_dim: center}) == ds[variable]), f'{variable} values not correctly copied' # slow
            assert (new_ds[variable].isel({lat_dim: center, lon_dim: center}) - ds[variable]).sum() == 0, f'{variable} values not correctly copied' # faster
    return new_ds

Test case:

ERA5_1966-11_pressure_variables.zip

import numpy as np
import xarray as xr

small_src_file = 'ERA5_1966-11_pressure_variables.nc'
ds = xr.open_dataset(small_src_file)
ds_ext = add_empty_grid_cells_around_single_cell_netcdf(small_src_file, grid_spacing=0.25)
(ds_ext.isel(latitude=1,longitude=1) == ds).all() # selects the center lat & lon cells of new dataset, then compares against old
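
The padded dataset could then be written back to disk and pointed to as the source netCDF for easymore (the output file name below is illustrative):

ds_ext.to_netcdf('ERA5_1966-11_pressure_variables_3x3.nc') # illustrative file name
# esmr.source_nc = 'ERA5_1966-11_pressure_variables_3x3.nc'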
ShervanGharari commented 8 months ago

@wknoben Version 2.0.0, currently on the main branch, should have this issue fixed. It should work with netCDF files that have a single grid cell, as well as with 1-row or 1-column grids. The version is not yet on PyPI and needs a local installation; a virtual environment example is here. Once you confirm, I will close this issue. Note that loading version 2 is a bit different (see the sketch below); the rest of the code remains the same.
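
A minimal sketch of the version 2 import, assuming (as stated above) that the rest of the workflow is unchanged:

from easymore import Easymore # for version 2 and above

# hypothetical usage, assuming the attribute-style configuration from version 1 carries over:
esmr = Easymore()
# ... set case_name, source_nc, var_lat, var_lon, target shapefile, etc. as before ...
esmr.nc_remapper()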