Meteodan opened 5 years ago
This is something that will be revisited once the front end of wrf-python is rewritten to better support xarray and datasets instead of files. Right now it would be a fair amount of re-engineering to pull those fields from a different file, since getvar dives down into some basic netCDF reader code that assumes everything is in one file (i.e., time is better spent on the xarray front end work, where I think xarray can already aggregate this together).
However, you can sort of hack this behavior in yourself right now, since wrf-python supports iterable classes as input to getvar. The nature of netcdf4-python still requires that an aggregated file be written to disk for each of your WRF files, but you can set things up to delete those files when the input object is destroyed. Obviously, there is a performance hit.
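For reference, getvar already accepts a plain sequence of open Dataset objects, so the minimal version of the idea (no on-disk reduction; the paths below are placeholders) is just a sketch like:

import glob
from netCDF4 import Dataset
from wrf import getvar

# A plain list of open netCDF4 Dataset objects is a valid input to getvar
wrf_files = [Dataset(f) for f in sorted(glob.glob("/path/to/wrfout_d01_*"))]
slp = getvar(wrf_files, "slp")

The FileReduce class below implements the same iterable contract, but hands getvar reduced copies of the files instead.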
The example below cuts out a domain subset in order to reduce file sizes. You would want to modify the behavior for your application, but hopefully it illustrates the concept.
import tempfile
import glob
import shutil
import os
import numpy as np
from netCDF4 import Dataset
from wrf import getvar, ll_to_xy, CoordPair, GeoBounds, to_np
_VARS_TO_KEEP = ("Times", "XLAT", "XLONG", "XLAT_U", "XLAT_V", "XLONG_U",
"XLONG_V", "U", "V", "W", "PH", "PHB", "T", "P", "PB", "Q2",
"T2", "PSFC", "U10", "V10", "XTIME", "QVAPOR", "QCLOUD",
"QGRAUP", "QRAIN", "QSNOW", "QICE", "MAPFAC_M", "MAPFAC_U",
"MAPFAC_V", "F", "HGT", "RAINC", "RAINSH", "RAINNC", "I_RAINC", "I_RAINNC",
"PBLH")
class FileReduce(object):
    def __init__(self, filenames, geobounds, tempdir=None, vars_to_keep=None,
                 max_pres=None, compress=False, delete=True, reuse=False):
        """An iterable object for cutting out geographic domains.

        Args:
            filenames (sequence): A sequence of file paths to the WRF files.
            geobounds (GeoBounds): A GeoBounds object defining the region of interest.
            tempdir (str): The location to store the temporary cropped data files.
                If None, tempfile.mkdtemp is used.
            vars_to_keep (sequence): A sequence of variable names to keep from the
                original file. None for all variables.
            max_pres (float): The maximum pressure height level to keep. None for
                all levels.
            compress (bool): Set to True to enable zlib compression of variables
                in the output.
            delete (bool): Set to True to delete the temporary directory when
                FileReduce is garbage collected.
            reuse (bool): Set to True to reuse files that were previously converted.
                *tempdir* must be set to a specific directory that contains the
                converted files, and *delete* must be False.

        """
        self._filenames = filenames
        self._i = 0
        self._geobounds = geobounds
        self._delete = delete
        self._vars_to_keep = vars_to_keep
        self._max_pres = max_pres
        self._compress = compress
        self._cache = set()
        self._own_data = True
        self._reuse = reuse

        if tempdir is not None:
            if not os.path.exists(tempdir):
                os.makedirs(tempdir)
            self._tempdir = tempdir
            if self._reuse:
                self._cache = set(os.path.join(self._tempdir, name)
                                  for name in os.listdir(self._tempdir))
        else:
            self._tempdir = tempfile.mkdtemp()

        print("temporary directory is: {}".format(self._tempdir))
        self._prev = None
        self._set_extents()
    def _set_extents(self):
        fname = list(self._filenames)[0]
        with Dataset(fname) as ncfile:
            lons = [self._geobounds.bottom_left.lon, self._geobounds.top_right.lon]
            lats = [self._geobounds.bottom_left.lat, self._geobounds.top_right.lat]
            orig_west_east = len(ncfile.dimensions["west_east"])
            orig_south_north = len(ncfile.dimensions["south_north"])
            orig_bottom_top = len(ncfile.dimensions["bottom_top"])

            # Note: this does not handle a moving nest.
            # Extra points are included around the boundaries to ensure the
            # domain is fully covered.
            x_y = ll_to_xy(ncfile, lats, lons, meta=False)

            self._start_x = 0 if x_y[0, 0] == 0 else x_y[0, 0] - 1
            self._end_x = (orig_west_east - 1 if x_y[0, 1] >= orig_west_east - 1
                           else x_y[0, 1] + 1)
            self._start_y = 0 if x_y[1, 0] == 0 else x_y[1, 0] - 1
            self._end_y = (orig_south_north - 1 if x_y[1, 1] >= orig_south_north - 1
                           else x_y[1, 1] + 1)

            self._west_east = self._end_x - self._start_x + 1
            self._west_east_stag = self._west_east + 1
            self._south_north = self._end_y - self._start_y + 1
            self._south_north_stag = self._south_north + 1

            # Crop the vertical to the specified pressure
            if self._max_pres is not None:
                pres = getvar(ncfile, "pressure")

                # Find the lowest terrain height
                ter = to_np(getvar(ncfile, "ter"))
                min_ter = float(np.amin(ter)) + 1
                ter_less = ter <= min_ter
                ter_less = np.broadcast_to(ter_less, pres.shape)

                # Over the lowest terrain, find the lowest vertical index that
                # reaches the desired pressure level. The lowest terrain will
                # require the largest vertical spread to reach that level.
                x = np.transpose(((pres.values <= self._max_pres) & ter_less).nonzero())
                self._end_bot_top = np.amin(x, axis=0)[0]
                if self._end_bot_top >= orig_bottom_top:
                    self._end_bot_top = orig_bottom_top - 1
            else:
                self._end_bot_top = orig_bottom_top - 1

            self._bottom_top = self._end_bot_top + 1
            self._bottom_top_stag = self._bottom_top + 1

        print("bottom_top", self._bottom_top)
    def __iter__(self):
        return self

    def __copy__(self):
        cp = type(self).__new__(self.__class__)
        cp.__dict__.update(self.__dict__)
        cp._own_data = False
        cp._delete = False
        return cp

    def __del__(self):
        if self._delete:
            shutil.rmtree(self._tempdir)
    def reduce(self, fname):
        outfilename = os.path.join(self._tempdir, os.path.basename(fname))

        # wrf-python can iterate over a sequence several times during a single
        # 'getvar', so a cache is used to avoid reducing the same file twice
        if outfilename in self._cache:
            return Dataset(outfilename)

        # New dimension sizes
        dim_d = {"west_east": self._west_east,
                 "west_east_stag": self._west_east_stag,
                 "south_north": self._south_north,
                 "south_north_stag": self._south_north_stag,
                 "bottom_top": self._bottom_top,
                 "bottom_top_stag": self._bottom_top_stag
                 }

        # Data slices for the cropped dimensions
        slice_d = {"west_east": slice(self._start_x, self._end_x + 1),
                   "west_east_stag": slice(self._start_x, self._end_x + 2),
                   "south_north": slice(self._start_y, self._end_y + 1),
                   "south_north_stag": slice(self._start_y, self._end_y + 2),
                   "bottom_top": slice(None, self._end_bot_top + 1),
                   "bottom_top_stag": slice(None, self._end_bot_top + 2)
                   }

        with Dataset(fname) as infile, Dataset(outfilename, mode="w") as outfile:
            # Copy the global attributes
            outfile.setncatts(infile.__dict__)

            # Copy the dimensions, limiting south_north and west_east to the
            # desired domain
            for name, dimension in infile.dimensions.items():
                dimsize = dim_d.get(name, len(dimension))
                outfile.createDimension(name, dimsize)

            # Copy the variables, slicing out the desired domain
            for name, variable in infile.variables.items():
                if self._vars_to_keep is not None and name not in self._vars_to_keep:
                    continue

                print(name)
                new_slices = tuple(slice_d.get(dimname, slice(None))
                                   for dimname in variable.dimensions)

                outvar = outfile.createVariable(name, variable.datatype,
                                                variable.dimensions,
                                                zlib=self._compress)
                outvar[:] = variable[new_slices]
                outvar.setncatts(variable.__dict__)

        result = Dataset(outfilename)
        self._cache.add(outfilename)
        return result
    def next(self):
        if self._i >= len(self._filenames):
            if self._prev is not None:
                self._prev.close()
            raise StopIteration
        else:
            fname = self._filenames[self._i]
            reduced_file = self.reduce(fname)
            if self._prev is not None:
                self._prev.close()
            self._prev = reduced_file

            self._i += 1
            return reduced_file

    # Python 3
    def __next__(self):
        return self.next()
# How to use with getvar:
# Set the lower left and upper right points to your desired domain
# Idaho bounding box: ["41.9880561828613", "49.000846862793", "-117.243034362793", "-111.043563842773"]
ll = CoordPair(lat=41.8, lon=-117.26)
ur = CoordPair(lat=49.1, lon=-110.5)
bounds = GeoBounds(ll, ur)

reduced_files = FileReduce(glob.glob("/path/to/original/wrfout_*"),
                           bounds, vars_to_keep=_VARS_TO_KEEP, max_pres=400,
                           tempdir="/path/to/tempfile/reduced",
                           delete=False, reuse=True)

pres = getvar(reduced_files, "pressure")
print(pres)
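If you want every time step aggregated across the reduced files (a short sketch assuming the default 'cat' combining behavior), the same object can be passed with ALL_TIMES:

from wrf import ALL_TIMES

# Concatenates the pressure field from all reduced files along the time dimension
pres_all = getvar(reduced_files, "pressure", timeidx=ALL_TIMES)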
@bladwig1
Very sorry for not replying earlier (I have way too many balls in the air right now), but thank you for the quick reply all the same. I will look into using this approach in my workflow!
I'm trying to use wrf-python with output from WRF-DART, and have been running into a few problems. One was that the WRF output files from various stages of DART (using the manhattan release) don't have many of the static fields that are needed for several diagnostic calculations in wrf-python.
I worked around this in my workflow by running a separate ncks script that appends the static fields from the original wrfinput file to each of the WRF-DART forecast and analysis output files. However, this wastes disk space, because these fields are identical across all files and arguably only need to be written to one master file (which is more or less what DART does). I thought a good way to support both scenarios (i.e., whether or not the static fields are present in the "main" file) would be a keyword argument to getvar that accepts a separate file containing the static fields. If that keyword argument were not provided, the original behavior of reading them from the main file would be recovered.
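To make that concrete, here is a purely hypothetical sketch of what I have in mind; static_file is a made-up keyword, not an existing wrf-python argument, and the paths are placeholders:

from netCDF4 import Dataset
from wrf import getvar

# Current workaround: append the static fields to every DART file up front,
# e.g. with NCO: ncks -A -v XLAT,XLONG,HGT wrfinput_d01 dart_analysis.nc
# This duplicates identical fields across all files.

dart_file = Dataset("/path/to/dart_analysis.nc")
statics = Dataset("/path/to/wrfinput_d01")

# Hypothetical keyword: when a required static field (XLAT, XLONG, HGT, ...)
# is missing from the main file, getvar would fall back to *static_file*
slp = getvar(dart_file, "slp", static_file=statics)

With static_file omitted, getvar would behave exactly as it does today.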