Open Timothy-W-Hilton opened 5 years ago
This is the code I use. I haven't used it with more than 5000 files, but I think it should work.
import glob
from netCDF4 import Dataset
from wrf import getvar, ALL_TIMES

# List the files and sort them; glob returns paths in arbitrary order
list_of_paths = sorted(glob.glob(r'../wrf/wrfout_d0*'))

# Open every file, including the last one (each Dataset stays open until closed)
wrflist = [Dataset(path) for path in list_of_paths]

# Join the variable from all files
HFX = getvar(wrflist, "HFX", timeidx=ALL_TIMES, method="join")
Thank you for sharing this code. Could you add the code for writing the data (HFX) as a time series to a new netCDF file?
First and foremost, thanks for providing this fantastic tool.
I'm using wrf.getvar() to read time series of several variables (HFX, LH, some of the diagnostic variables) that are stored in WRF-written netCDF files. Each file contains a single time step (covering a 30-minute period). It's a 4-month WRF run, so there are many of these files (> 5000).
For shorter WRF runs with fewer files I've passed getvar() a list of netCDF4.Dataset objects.
Now I'm hitting the limit on the number of open files (OSError: [Errno 24] Too many open files).
Is there a "best practice" for reading a single variable from lots and lots of netCDF files? It seems that xarray (1) isn't yet supported by getvar() and (2) may not work well anyway, because xarray.open_mfdataset seems to want to read every variable from each WRF file and is thus very slow.
My WRF files are netCDF4 (not netCDF4-classic), which seems to rule out netCDF4.MFDataset().
Is my best bet to use something like ncrcat to make a temporary netCDF file containing only the variable I want? This could work but would, I guess, require some digging to supply all the WRF output variables needed for some of the wrf.getvar() diagnostic variables.
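One way around the "Too many open files" error, sketched here as an assumption rather than as an official wrf-python recipe, is to keep only one file open at a time: open a file, copy the variable into memory, close the file, and concatenate at the end. The helper names `read_one_at_a_time` and `read_wrf_variable` are hypothetical.

```python
from contextlib import closing


def read_one_at_a_time(paths, open_file, read_var):
    """Open each path, read from it, and close it before opening the next.

    `open_file` must return an object with a close() method and
    `read_var` must return data that stays valid after the file is closed.
    """
    pieces = []
    for path in sorted(paths):
        with closing(open_file(path)) as handle:
            pieces.append(read_var(handle))
    return pieces


def read_wrf_variable(paths, varname="HFX"):
    """Stack one stored variable from many WRF files along the time axis."""
    # Imported here so the generic helper above needs no third-party packages.
    import numpy as np
    from netCDF4 import Dataset

    pieces = read_one_at_a_time(
        paths,
        Dataset,
        # Indexing with [:] copies the data into memory before the file closes.
        lambda nc: nc.variables[varname][:],
    )
    # Assumes the leading axis of each piece is Time.
    return np.ma.concatenate(pieces, axis=0)
```

This only covers variables stored directly in the files, such as HFX or LH. For diagnostic variables, `read_var` could instead call `wrf.getvar` on each single open file, at the cost of one diagnostic computation per file.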