ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0

Crop region before regridding when using extract_shape #301

Open Peter9192 opened 4 years ago

Peter9192 commented 4 years ago

Is your feature request related to a problem? Please describe.

I want to extract data for a specific region (a hydrological catchment) using extract_shape. Additionally, I need to regrid (interpolate) the data. In this case, I encounter a problem, because if I extract the shape first, I'm missing information for regridding near the boundaries. However, if I regrid first, I'm doing unnecessary work (because 90% of the interpolated data will be discarded later), which often leads to memory problems.

I think the easiest way to solve this is to extend the extract_shape preprocessor with a margin argument that allows the user to specify that some additional space needs to be added to the lat/lon coordinates before the cube is cropped. This implies that the order in which extract_region and regrid are called must be enforced by the user.
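To illustrate the margin idea, here is a minimal sketch in plain Python. It is not ESMValCore code: the function `crop_with_margin` and its arguments are hypothetical, it works on plain 1-D lists of coordinate values instead of an Iris cube, and it ignores longitude wrap-around.

```python
def crop_with_margin(lats, lons, bounds, margin=0.0):
    """Return indices of grid points inside ``bounds`` widened by ``margin``.

    Hypothetical helper for illustration only.
    ``bounds`` is (lon_min, lat_min, lon_max, lat_max) in degrees.
    Assumes a regular grid with 1-D coordinates and no wrap-around
    at the date line or the prime meridian.
    """
    lon_min, lat_min, lon_max, lat_max = bounds
    lat_idx = [i for i, lat in enumerate(lats)
               if lat_min - margin <= lat <= lat_max + margin]
    lon_idx = [j for j, lon in enumerate(lons)
               if lon_min - margin <= lon <= lon_max + margin]
    return lat_idx, lon_idx


# A coarse 2-degree grid and the bounding box of a small catchment.
lats = [40.0, 42.0, 44.0, 46.0, 48.0, 50.0, 52.0]
lons = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
catchment_bounds = (3.0, 45.0, 7.0, 49.0)

tight = crop_with_margin(lats, lons, catchment_bounds)
padded = crop_with_margin(lats, lons, catchment_bounds, margin=2.0)
print(tight)   # ([3, 4], [2, 3])
print(padded)  # ([2, 3, 4, 5], [1, 2, 3, 4])
```

With the tight crop, the grid points at 44 and 50 degrees latitude are dropped, so a target point between 45 and 46 degrees has no source data on one side. A margin of one source grid cell keeps those neighbours available for interpolation.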

Would you be able to help out?

I might have some time to implement the proposed solution.

Peter9192 commented 4 years ago

As pointed out by @JaroCamphuijsen, the proposed solution is not sufficient, because the data outside the shape will still be masked. An alternative solution would be to make it possible to pass a shapefile to extract_region.

JaroCamphuijsen commented 4 years ago

To keep the use of extract_region as intuitive as it is now, I actually propose to make a new function, extract_bbox, that accepts a shapefile and extracts the bounding box around the shapes in it. It would calculate the bounding box in a similar way to _crop_cube, optionally add padding to it, and use extract_region to return the new cube.
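A rough sketch of the bounding-box step, under stated assumptions: `padded_bbox` is a hypothetical name, the shape is given as a plain list of (lon, lat) vertices instead of being read from a shapefile, and the dictionary keys follow the keyword arguments of extract_region as I understand them (worth checking against the installed ESMValCore version). Longitude wrap-around is not handled.

```python
def padded_bbox(vertices, padding=0.0):
    """Return the bounding box of ``vertices`` widened by ``padding`` degrees.

    Hypothetical helper for illustration only.
    ``vertices`` is an iterable of (lon, lat) pairs describing the shape.
    Latitudes are clamped to [-90, 90]; longitudes are left unwrapped.
    """
    lons = [lon for lon, _ in vertices]
    lats = [lat for _, lat in vertices]
    return {
        "start_longitude": min(lons) - padding,
        "end_longitude": max(lons) + padding,
        "start_latitude": max(min(lats) - padding, -90.0),
        "end_latitude": min(max(lats) + padding, 90.0),
    }


# Three vertices of a made-up catchment outline.
outline = [(5.0, 50.25), (6.5, 51.0), (5.75, 51.5)]
bbox = padded_bbox(outline, padding=0.5)
print(bbox)
# {'start_longitude': 4.5, 'end_longitude': 7.0,
#  'start_latitude': 49.75, 'end_latitude': 52.0}
```

The resulting dictionary could then be passed on as `extract_region(cube, **bbox)`, which would return an unmasked rectangular cube that is still suitable for regridding.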

Note that implementing this functionality in the extract_shape function would only take one if-block with a return statement right after cropping. However, we should be careful not to overload extract_shape, since it would then do something other than what its name suggests.

Another option I see is making the _crop_cube function (which is now called from within extract_shape when selecting the crop option) a public function and either:

Padding can easily be added to any of these options.