NCAR / wrf-python

A collection of diagnostic and interpolation routines for use with output from the Weather Research and Forecasting (WRF-ARW) Model.
https://wrf-python.readthedocs.io
Apache License 2.0
408 stars 154 forks

getvar() from many wrf output files? #94

Open Timothy-W-Hilton opened 5 years ago

Timothy-W-Hilton commented 5 years ago

First and foremost, thanks for providing this fantastic tool.

I'm using wrf.getvar() to open a time series for several variables (HFX, LH, some of the diagnostic variables) that are stored in WRF-written netCDF files. Each file contains a single temporal value (for a 30-minute period). It's a 4-month WRF run, so there are many of these files (> 5000).

For shorter WRF runs with fewer files I've passed getvar() a list of netCDF4.Dataset objects.

Now I'm hitting the limit on the number of open files (OSError: [Errno 24] Too many open files).
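
For reference, the per-process limit behind this error can be inspected from Python on Unix-like systems. The following is a small sketch using the standard-library resource module. The helper names and the headroom value are illustrative assumptions and are not part of wrf-python.

```python
import resource  # standard library, available on Unix-like systems only


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_under_soft_limit(n_files, headroom=32):
    """Return True if n_files can be open at once with `headroom` descriptors spare.

    `headroom` is an arbitrary allowance for stdin/stdout, libraries, etc.
    """
    soft, _hard = open_file_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    return n_files + headroom <= soft


soft, hard = open_file_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
print("5000 files fit under the soft limit:", fits_under_soft_limit(5000))
```

If the soft limit is below the hard limit it can usually be raised (for example with ulimit -n in the shell), but that only postpones the problem for longer runs.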

Is there a "best practice" for reading a single variable from lots and lots of netCDF files? It seems that xarray (1) isn't yet supported for getvar() and (2) may not work well anyway because xarray.open_mfdataset seems to want to read every variable from each WRF file and is thus very slow.

My WRF files are netCDF4 (not netCDF4-classic), which seems to rule out netCDF4.MFDataset().

Is my best bet to use something like ncrcat to make a temporary netCDF file containing only the variable I want? This could work but would, I guess, require some digging to supply all the WRF output variables needed for some of the wrf.getvar() diagnostic variables.

cross85 commented 4 years ago

This is the code I use. I haven't used it with >5000 files, but I think it should work.

import glob
from netCDF4 import Dataset
from wrf import getvar, ALL_TIMES

# List the files; glob returns paths in arbitrary order, so sort them
list_of_paths = sorted(glob.glob(r'../wrf/wrfout_d0*'))

# Open every file in the list
wrflist = [Dataset(path) for path in list_of_paths]

# Join the variable from all files
HFX = getvar(wrflist, "HFX", timeidx=ALL_TIMES, method="join")
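
The snippet above still keeps every Dataset open at the same time, so with more than 5000 files it would likely hit the same Errno 24 limit. Here is a minimal sketch of one way around that, assuming each file holds a single time step as described in the question: open one file, extract the variable as a plain array, close the file, and move on. The helper names read_one, read_series and load_series are hypothetical. getvar() is called with meta=False so that it returns a NumPy array rather than an xarray object.

```python
def read_one(path, varname):
    """Open one WRF file, extract `varname` for its first time step, close the file."""
    # Imported lazily so read_series() can be tried out without these packages.
    from netCDF4 import Dataset
    from wrf import getvar

    with Dataset(path) as nc:
        # meta=False asks getvar for a plain NumPy array instead of a DataArray
        return getvar(nc, varname, timeidx=0, meta=False)


def read_series(paths, varname, reader=read_one):
    """Return one array per file, in sorted path order, one open file at a time."""
    return [reader(path, varname) for path in sorted(paths)]


def load_series(pattern, varname):
    """Stack the per-file arrays along a new leading (time) axis.

    Assumes an unmasked field such as HFX and identical grids in every file.
    """
    import glob
    import numpy as np

    return np.stack(read_series(glob.glob(pattern), varname))
```

For example, load_series('../wrf/wrfout_d0*', 'HFX') would return a single array with time as the first axis. This drops the coordinate metadata that getvar() normally attaches, which is the price of not holding the files open.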

rajkumar8581 commented 2 years ago

Thank you for sharing this code. Could you add the code for writing the data (HFX) as a time series to a new netCDF file?
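
One possible approach, sketched under the assumption that getvar() returned an xarray.DataArray (its default when xarray is installed): the 'projection' attribute that wrf-python attaches is a Python object that cannot be stored in a netCDF file, so it is converted to a string before calling DataArray.to_netcdf(). The helper names netcdf_safe_attrs and write_variable and the output filename are hypothetical, and other attributes or coordinates may need similar treatment.

```python
def netcdf_safe_attrs(attrs):
    """Copy attrs, replacing the object stored under 'projection' with its string form."""
    safe = dict(attrs)
    if "projection" in safe:
        safe["projection"] = str(safe["projection"])
    return safe


def write_variable(data_array, out_path):
    """Write an xarray.DataArray returned by getvar() to a new netCDF file."""
    to_write = data_array.copy()  # leave the caller's object untouched
    to_write.attrs = netcdf_safe_attrs(data_array.attrs)
    to_write.to_netcdf(out_path)
```

With the HFX variable from the code above, this would be called as write_variable(HFX, 'hfx_timeseries.nc').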