ecmwf / earthkit-data

A format-agnostic Python interface for geospatial data
Apache License 2.0
56 stars 13 forks source link

Optimization of Earthkit Data Extraction for Specific Region #487

Open vinayk7 opened 2 days ago

vinayk7 commented 2 days ago

Problem Description:

Hello, I'm using Earthkit to extract weather data from a GRIB file provided by ECMWF. Specifically, I'm working with the u-vector wind data at a pressure level of 1000 hPa. Below is the code I've implemented to extract the data from a specific region (latitude and longitude bounds). As you can see I am simply extracting all the data and slicing it then searching on it, since I am new to weather analysis and python in general I want to know that is it the most optimum way or there exists some function in datakit that will extract it much faster ?


import earthkit.data
import numpy as np

data = earthkit.data.from_source("file", "20241006060000-0h-oper-fc.grib2")

u = data.sel(param="u", level=1000)[0]         # select wind data u-vector
u_data = u.data()

# The following info obtained about variable u_data while debugging
# u_data has 3 dimensions:
# at 0: ndarray with shape (721, 1440) | min value is -90.0 and max value is 90.0 | size is 1038240
# at 1: ndarray with shape (721, 1440) | min value is -180.0 and max value is 179.75 | size is 1038240
# at 2: ndarray with shape (721, 1440) | min value is -30.25118637084961 and max value is 26.03006362915039 | size is 1038240

shp = u_data.shape
print(shp)  # (3, 721, 1440)

lat_data = u_data[0, :, 0]   # Latitude values (take first column, all rows)
lon_data = u_data[1, 0, :]   # Longitude values (take first row, all columns)
temp_data = u_data[2]        # Wind (u-vector) data

lat_bounds = (30.0, 40.0)    # Example: from 30°N to 40°N
lon_bounds = (-100.0, -90.0) # Example: from 100°W to 90°W

# Data about the latitude flow in GRIB file is arranged from +90 (North pole) to -90 (South pole)
# 90, 80, .. 40, 30. So higher latitude degree occurs at lower index, we adjust for the fact below
lat_min_idx = np.abs(lat_data - lat_bounds[1]).argmin()
lat_max_idx = np.abs(lat_data - lat_bounds[0]).argmin()

# Ensure latitude min and max indices are within bounds
lat_min_idx = max(0, lat_min_idx)
size_lat_array = shp[1]
lat_max_idx = min(size_lat_array - 1, lat_max_idx)

# Data about the longitude flow in GRIB file is arranged from -180 (West) to +180 (East)
# no adjustment is needed
lon_min_idx = np.abs(lon_data - lon_bounds[0]).argmin()
lon_max_idx = np.abs(lon_data - lon_bounds[1]).argmin()

# Ensure longitude indices are within bounds
lon_min_idx = max(0, lon_min_idx)
size_lon_array = shp[2]
lon_max_idx = min(size_lon_array - 1, lon_max_idx)

# Extract the wind data (u-vector) for the specified region
region_temp_data = temp_data[lat_min_idx:lat_max_idx + 1, lon_min_idx:lon_max_idx + 1]
sandorkertesz commented 2 hours ago

@vinayk7, unfortunately, earthkit does not offer a solution for subarea selection/cropping right now. However, all these features + grid handling will be available in the near future. Limited interpolation is already available in earthkit-regrid.