NCAR / wrf-python

A collection of diagnostic and interpolation routines for use with output from the Weather Research and Forecasting (WRF-ARW) Model.
https://wrf-python.readthedocs.io
Apache License 2.0

Allow for reading "static" fields from a separate file #87

Open Meteodan opened 5 years ago

Meteodan commented 5 years ago

I'm trying to use wrf-python with output from WRF-DART and have been running into a few problems. One is that the WRF output files from the various stages of DART (using the Manhattan release) lack many of the static fields that are needed for several of the diagnostic calculations in wrf-python.

I worked around this in my workflow by running a separate ncks script to append the static fields from the original wrfinput file to each of the WRF-DART forecast and analysis output files. However, this wastes disk space because these fields are identical across all files, so they arguably only need to be written to one master file (which is more or less what DART does). I thought a good way to support both scenarios (i.e. whether or not the static fields are present in the "main" file) would be a keyword argument to getvar that allows a separate file containing the static fields to be read in. If this keyword argument were not provided, the original behavior of reading them from the main file would be preserved.

bladwig1 commented 5 years ago

This is something that will be revisited once the front end of wrf-python is rewritten to better support xarray and datasets instead of files. Right now, pulling those fields from a different file would require a fair amount of re-engineering, since getvar just dives down into some basic netCDF reader code that assumes everything is in one file (i.e. time is better spent on the xarray front end, where I think xarray can already aggregate this together).
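To illustrate the kind of aggregation xarray can already do, here is a rough sketch of merging static fields from a separately opened dataset into one that lacks them. The `merge_static` helper and the field names are illustrative assumptions, not wrf-python API, and the result is an xarray Dataset rather than the netCDF4 object getvar currently expects:

```python
# Hypothetical sketch: merge static fields from one dataset
# (e.g. opened from wrfinput) into another (e.g. DART analysis
# output) that is missing them.
import xarray as xr

def merge_static(main_ds, static_ds, static_names):
    """Return main_ds with the named static fields merged in, taking
    them from static_ds when main_ds does not already have them."""
    missing = [name for name in static_names
               if name in static_ds.variables
               and name not in main_ds.variables]
    return xr.merge([main_ds, static_ds[missing]])
```

In practice the main dataset might come from `xr.open_mfdataset` over the DART output files and the static dataset from `xr.open_dataset` on the wrfinput file.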

However, you can sort of hack this behavior in yourself right now, since wrf-python supports iterable classes as input to getvar. The nature of netcdf4-python still requires that an aggregated file be written to disk for each of your WRF files, but you can set things up to delete those files when the input object gets destroyed. Obviously, there's a performance hit that takes place.

Below is an example of one such class that cuts out a subset of the domain in order to reduce file sizes. You would want to modify the behavior for your application, but hopefully this illustrates the concept.

import tempfile
import glob
import shutil
import os
import numpy as np
from netCDF4 import Dataset
from wrf import getvar, ll_to_xy, CoordPair, GeoBounds, to_np

_VARS_TO_KEEP = ("Times", "XLAT", "XLONG", "XLAT_U", "XLAT_V", "XLONG_U", 
                "XLONG_V", "U", "V", "W", "PH", "PHB", "T", "P", "PB", "Q2", 
                "T2", "PSFC", "U10", "V10", "XTIME", "QVAPOR", "QCLOUD", 
                "QGRAUP", "QRAIN", "QSNOW", "QICE", "MAPFAC_M", "MAPFAC_U",
                "MAPFAC_V", "F", "HGT", "RAINC", "RAINSH", "RAINNC", "I_RAINC", "I_RAINNC",
                "PBLH")

class FileReduce(object):
    def __init__(self, filenames, geobounds, tempdir=None, vars_to_keep=None, 
                 max_pres=None, compress=False, delete=True, reuse=False):
        """An iterable object for cutting out geographic domains.

        Args:

            filenames (sequence): A sequence of file paths to the WRF files

            geobounds (GeoBounds): A GeoBounds object defining the region of interest

            tempdir (str): The location to store the temporary cropped data files. If None, tempfile.mkdtemp is used.

            vars_to_keep (sequence): A sequence of variable names to keep from the original file. None for all vars.

            max_pres (float): The maximum pressure height level to keep. None for all levels.

            compress (bool): Set to True to enable zlib compression of variables in the output.

            delete (bool): Set to True to delete the temporary directory when FileReduce is garbage collected.

            reuse (bool): Set to True when you want to reuse the files that were previously converted. *tempdir* 
                must be set to a specific directory that contains the converted files and *delete* must be False.

        """
        self._filenames = filenames
        self._i = 0
        self._geobounds = geobounds
        self._delete = delete
        self._vars_to_keep = vars_to_keep
        self._max_pres = max_pres
        self._compress = compress
        self._cache = set()
        self._own_data = True
        self._reuse = reuse

        if tempdir is not None:
            if not os.path.exists(tempdir):
                os.makedirs(tempdir)
            self._tempdir = tempdir
            if self._reuse:
                self._cache = set((os.path.join(self._tempdir, name) 
                                   for name in os.listdir(self._tempdir)))
        else:
            self._tempdir = tempfile.mkdtemp()

        print("temporary directory is: {}".format(self._tempdir))
        self._prev = None
        self._set_extents()

    def _set_extents(self):
        fname = list(self._filenames)[0]
        with Dataset(fname) as ncfile:
            lons = [self._geobounds.bottom_left.lon, self._geobounds.top_right.lon]
            lats = [self._geobounds.bottom_left.lat, self._geobounds.top_right.lat]
            orig_west_east = len(ncfile.dimensions["west_east"])
            orig_south_north = len(ncfile.dimensions["south_north"])
            orig_bottom_top = len(ncfile.dimensions["bottom_top"])

            # Note: Not handling the moving nest here
            # Extra points included around the boundaries to ensure domain is fully included
            x_y = ll_to_xy(ncfile, lats, lons, meta=False)
            self._start_x = 0 if x_y[0,0] == 0 else x_y[0,0] - 1
            self._end_x = orig_west_east - 1 if x_y[0,1] >= orig_west_east - 1 else x_y[0,1] + 1
            self._start_y = 0 if x_y[1,0] == 0 else x_y[1,0] - 1
            self._end_y = orig_south_north - 1 if x_y[1,1] >= orig_south_north - 1 else x_y[1,1] + 1

            self._west_east = self._end_x - self._start_x + 1
            self._west_east_stag = self._west_east + 1
            self._south_north = self._end_y - self._start_y + 1
            self._south_north_stag = self._south_north + 1

            # Crop the vertical to the specified pressure
            if self._max_pres is not None:
                pres = getvar(ncfile, "pressure")
                # Find the lowest terrain height
                ter = to_np(getvar(ncfile, "ter"))
                min_ter = float(np.amin(ter)) + 1
                ter_less = ter <= min_ter
                ter_less = np.broadcast_to(ter_less, pres.shape)
                # For the lowest terrain height, find the lowest vertical index to meet 
                # the desired pressure level. The lowest terrain height will result in the 
                # largest vertical spread to find the pressure level.
                x = np.transpose(((pres.values <= self._max_pres) & ter_less).nonzero())
                self._end_bot_top = np.amin(x, axis=0)[0] 
                if (self._end_bot_top >= orig_bottom_top):
                    self._end_bot_top = orig_bottom_top - 1
            else:
                self._end_bot_top = orig_bottom_top - 1

            self._bottom_top = self._end_bot_top + 1
            self._bottom_top_stag = self._bottom_top + 1

            print("bottom_top", self._bottom_top)

    def __iter__(self):
        return self

    def __copy__(self):
        cp = type(self).__new__(self.__class__)
        cp.__dict__.update(self.__dict__)
        cp._own_data = False
        cp._delete = False

        return cp

    def __del__(self):
        if self._delete:
            shutil.rmtree(self._tempdir)

    def reduce(self, fname):
        outfilename = os.path.join(self._tempdir, os.path.basename(fname))

        # wrf-python can iterate over a sequence several times during a 'getvar', so a cache is used to 
        # avoid reducing the same file more than once
        if outfilename in self._cache:
            return Dataset(outfilename)

        # New dimension sizes
        dim_d = {"west_east" : self._west_east,
                 "west_east_stag" : self._west_east_stag,
                 "south_north" : self._south_north,
                 "south_north_stag" : self._south_north_stag,
                 "bottom_top" : self._bottom_top,
                 "bottom_top_stag" : self._bottom_top_stag
                }

        # Data slice sizes for the 2D dimensions
        slice_d = {"west_east" : slice(self._start_x, self._end_x + 1),
                   "west_east_stag" : slice(self._start_x, self._end_x + 2),
                   "south_north" : slice(self._start_y, self._end_y + 1),
                   "south_north_stag" : slice(self._start_y, self._end_y + 2),
                   "bottom_top" : slice(None, self._end_bot_top + 1),
                   "bottom_top_stag" : slice(None, self._end_bot_top + 2)
                  }

        with Dataset(fname) as infile, Dataset(outfilename, mode="w") as outfile:

            # Copy the global attributes
            outfile.setncatts(infile.__dict__)

            # Copy Dimensions, limiting south_north and west_east to desired domain
            for name, dimension in infile.dimensions.items():
                dimsize = dim_d.get(name, len(dimension))
                outfile.createDimension(name, dimsize)

            # Copy Variables  
            for name, variable in infile.variables.items():
                if self._vars_to_keep is not None:
                    if name not in self._vars_to_keep:
                        continue

                print(name)
                new_slices = tuple((slice_d.get(dimname, slice(None)) for dimname in variable.dimensions))

                outvar = outfile.createVariable(name, variable.datatype, variable.dimensions, zlib=self._compress)

                outvar[:] = variable[new_slices]

                outvar.setncatts(variable.__dict__)

        result = Dataset(outfilename)

        self._cache.add(outfilename)

        return result

    def next(self):
        if self._i >= len(self._filenames):
            if self._prev is not None:
                self._prev.close()
            raise StopIteration
        else:
            fname = self._filenames[self._i]
            reduced_file = self.reduce(fname)
            if self._prev is not None:
                self._prev.close()
            self._prev = reduced_file

            self._i += 1

            return reduced_file

    # Python 3
    def __next__(self):
        return self.next()

# How to use with getvar
# Set lower left and upper right to your desired domain
# Idaho bounding box: ["41.9880561828613","49.000846862793","-117.243034362793","-111.043563842773"]
ll = CoordPair(lat=41.8, lon=-117.26)
ur = CoordPair(lat=49.1, lon=-110.5)
bounds = GeoBounds(ll, ur)
reduced_files = FileReduce(glob.glob("/path/to/original/wrfout_*"),
                           bounds, vars_to_keep=_VARS_TO_KEEP, max_pres=400,
                           tempdir="/path/to/tempfile/reduced", 
                           delete=False, reuse=True)

pres = getvar(reduced_files, "pressure")

print(pres)

Meteodan commented 5 years ago

@bladwig1

Very sorry for not replying earlier (I have way too many balls in the air right now), but thank you for the quick reply all the same. I will look into using this approach in my workflow!