barronh / pseudonetcdf

PseudoNetCDF like NetCDF except for many scientific format backends
GNU Lesser General Public License v3.0

arlpackedbit isMine xarray backend #31

Closed bbakernoaa closed 6 years ago

bbakernoaa commented 6 years ago

@barronh Would it be possible to include the arlpackedbit into the xarray interface? I think this would be a great feature.

barronh commented 6 years ago

@bbakernoaa I've added the capability and tested it with the xarray interface in py3 and py27. Works for me. Please confirm and I'll close.

bbakernoaa commented 6 years ago

@barronh Thank you so much for adding this. Sorry, it took so long for me to test this and get back to you.

Would you also be able to add the cdump and gemconc files to _arl for the xarray backend?

barronh commented 6 years ago

Can you point me to good documentation of the formats?

bbakernoaa commented 6 years ago

@barronh I'm attaching the description for each file type: concentration, particle dump file, and trajectory. These are taken from the files

Concentration: /hysplit4/html/S363.htm PARDUMP: /hysplit4/html/S442.htm Trajectory: /hysplit4/html/S263.htm

in the hysplit code.

Concentration / Display / File Format

Concentration packing has been implemented with HYSPLIT version 4.5. The updated format is downward compatible in that all display programs can read files produced from versions prior to 4.5, but older versions of the display programs cannot read the new packed output format. Note that HYSPLIT V4.5 can be configured to produce the older-style unpacked concentration files. Concentration file packing does not write the same information in fewer bytes, but rather writes the same information using twice as many bytes. The packed files are generally smaller because only concentration values at the non-zero grid points are written to the output file by the model. However, this requires the grid point location to be written with the concentration, hence the additional bytes. If most of the grid is expected to have non-zero concentrations, then the old-style format will save space. The output format of the unformatted binary (big-endian) concentration file written by the dispersion model (hycs_std) and read by all concentration display programs is as follows:

Record #1

CHAR*4 Meteorological MODEL Identification
INT*4 Meteorological file starting time (YEAR, MONTH, DAY, HOUR, FORECAST-HOUR)
INT*4 NUMBER of starting locations
INT*4 Concentration packing flag (0=no 1=yes) 

Record #2 Loop to record: Number of starting locations

INT*4 Release starting time (YEAR, MONTH, DAY, HOUR)
REAL*4 Starting location and height (LATITUDE, LONGITUDE, METERS)
INT*4 Release starting time (MINUTES) 

Record #3

INT*4 Number of (LATITUDE-POINTS, LONGITUDE-POINTS)
REAL*4 Grid spacing (DELTA-LATITUDE, DELTA-LONGITUDE)
REAL*4 Grid lower left corner (LATITUDE, LONGITUDE) 

Record #4

INT*4 NUMBER of vertical levels in concentration grid
INT*4 HEIGHT of each level (meters above ground) 

Record #5

INT*4 NUMBER of different pollutants in grid
CHAR*4 Identification STRING for each pollutant 

Record #6 Loop to record: Number of output times

INT*4 Sample start (YEAR MONTH DAY HOUR MINUTE FORECAST) 

Record #7 Loop to record: Number of output times

INT*4 Sample stop (YEAR MONTH DAY HOUR MINUTE FORECAST) 

Record #8 Loop to record: Number levels, Number of pollutant types

CHAR*4 Pollutant type identification STRING
INT*4 Output LEVEL (meters) of this record

No Packing (all elements)
REAL*4 Concentration output ARRAY

Packing (only non-zero elements)

INT*4 Loop non-zero elements
INT*2 First (I) index value
INT*2 Second (J) index value
REAL*4 Concentration at (I,J)
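The record layout above can be sketched with Python's `struct` module. The following is a minimal reader for Record #1 only, assuming a Fortran sequential unformatted file with big-endian 4-byte record-length markers around each record; the `read_conc_header` name is illustrative and not part of PseudoNetCDF:

```python
import struct

def read_conc_header(path):
    """Read Record #1 of a HYSPLIT binary concentration file.

    Assumes a big-endian Fortran sequential file in which every
    record is framed by 4-byte record-length markers.
    """
    with open(path, 'rb') as f:
        struct.unpack('>i', f.read(4))               # leading record marker
        model = f.read(4).decode('ascii')            # CHAR*4 model id
        yr, mo, dy, hr, fh = struct.unpack('>5i', f.read(20))
        (nloc,) = struct.unpack('>i', f.read(4))     # number of start locations
        (packed,) = struct.unpack('>i', f.read(4))   # 0=unpacked, 1=packed
        struct.unpack('>i', f.read(4))               # trailing record marker
    return model, (yr, mo, dy, hr, fh), nloc, packed
```

The remaining records follow the same pattern: read the leading marker, unpack the fields listed above with a big-endian format string, then read the trailing marker.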

================================================================================

Advanced / Special Topics / Particle Dump File Format

The concentration configuration menu provides an option to write a model initialization file, which by default is always named "PARDUMP" (for particle dump). This file can be written at regular intervals during the simulation, providing a convenient way to restart a simulation in case of unexpected failure. To restart the model using the PARDUMP file, it is only necessary for the file to be present in the root working directory. If the internal time stamp of the file matches the start time of the simulation, the model will initialize the particle count from the file before emitting new particles according to the emission scenario defined in the control file. The format of the PARDUMP file is given below:

Record #1

INT*4 Number of particles
INT*4 Number of pollutants
INT*4 Time of particle dump (YEAR, MONTH, DAY, HOUR, MINUTES)

Record #2 Loop to record: Number of particles

REAL*4 Particle pollutant mass (times the number of pollutants)
REAL*4 Particle LATITUDE, LONGITUDE, HEIGHT, SIGMA-U, SIGMA-V, SIGMA-X
INT*4 Particle AGE, DISTRIBUTION, POLLUTANT, METEO-GRID, SORT-INDEX
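The PARDUMP header (Record #1) can be read the same way as the concentration header. This is a minimal sketch assuming a big-endian Fortran sequential file with 4-byte record-length markers; the `read_pardump_header` name is illustrative, not part of PseudoNetCDF:

```python
import struct

def read_pardump_header(path):
    """Read Record #1 of a HYSPLIT PARDUMP file.

    Assumes big-endian byte order and 4-byte Fortran
    record-length markers framing the record.
    """
    with open(path, 'rb') as f:
        struct.unpack('>i', f.read(4))               # leading record marker
        nparticles, npollutants = struct.unpack('>2i', f.read(8))
        yr, mo, dy, hr, mn = struct.unpack('>5i', f.read(20))
        struct.unpack('>i', f.read(4))               # trailing record marker
    return nparticles, npollutants, (yr, mo, dy, hr, mn)
```

The per-particle data in Record #2 would then be unpacked in a loop over `nparticles`, with the mass field repeated `npollutants` times per particle as described above.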

The "Particle" tab of the "Special File Display" menu brings up a Windows based viewer that shows the particle positions over a map background. The display can be zoomed and otherwise adjusted using the left and right mouse buttons in conjunction with the shift and cntl keys. Help is provided on the screen with the left and right side comments corresponding to the respective mouse button. The particle viewer can also be used to overlay satellite images on the particle positions. More information on this is provided "FTP Satellite Data" help menu. The particle position file may be converted to a binary concentration file through the command line utility program par2conc.

=================================================================================

Trajectory / Display / Endpoint File Format

The trajectory model generates its own text output file of ASCII end-point positions. The trajectory display program processes the end-point file. The format of the file is given below:

Record #1

I6 - Number of meteorological grids used in calculation

Records Loop #2 through the number of grids

A8 - Meteorological Model identification
5I6 - Data file starting Year, Month, Day, Hour, Forecast Hour

Record #3

I6 - number of different trajectories in file
1X,A8 - direction of trajectory calculation (FORWARD, BACKWARD)
1X,A8 - vertical motion calculation method (OMEGA, THETA, ...)

Record Loop #4 through the number of different trajectories in file

4I6 - starting year, month, day, hour
2F9.3 - starting latitude, longitude
F8.1 - starting level above ground (meters)

Record #5

I6 - number (n) of diagnostic output variables
n(1X,A8) - label identification of each variable (PRESSURE, THETA, ...)

Record Loop #6 through the number of hours in the simulation

I6 - trajectory number
I6 - meteorological grid number or antecedent trajectory number
5I6 - year month day hour minute of the point
I6 - forecast hour at point
F8.1 - age of the trajectory in hours
2F9.3 - position latitude and longitude
1X,F8.1 - position height in meters above ground
n(1X,F8.1) - n diagnostic output variables (1st to be output is always pressure)
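Because the endpoint file is ASCII, the records above can be parsed with simple line handling. The following is a minimal sketch that assumes whitespace splitting is safe for these fixed-width fields; the `read_tdump` name and the returned dict keys are illustrative, not part of PseudoNetCDF:

```python
def read_tdump(path):
    """Parse a HYSPLIT trajectory endpoint file into a list of
    endpoint dicts, following the record layout described above."""
    with open(path) as f:
        ngrids = int(f.readline().split()[0])   # Record #1
        for _ in range(ngrids):
            f.readline()                        # Record loop #2: grid headers
        ntraj = int(f.readline().split()[0])    # Record #3
        for _ in range(ntraj):
            f.readline()                        # Record loop #4: start records
        ndiag = int(f.readline().split()[0])    # Record #5: diagnostic labels
        points = []
        for line in f:                          # Record loop #6: endpoints
            if not line.strip():
                continue
            v = line.split()
            points.append({
                'traj': int(v[0]),
                'year': int(v[2]), 'month': int(v[3]), 'day': int(v[4]),
                'hour': int(v[5]), 'minute': int(v[6]),
                'forecast': int(v[7]),
                'age': float(v[8]),
                'lat': float(v[9]), 'lon': float(v[10]),
                'height': float(v[11]),
                'diagnostics': [float(x) for x in v[12:12 + ndiag]],
            })
    return points
```

A production reader would honor the exact Fortran field widths (I6, 2F9.3, F8.1, ...) rather than splitting on whitespace, since adjacent fields can run together when values fill their columns.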
barronh commented 6 years ago

@bbakernoaa - Can you post example files for me to use as test cases, and open a new issue?

thanks,

bbakernoaa commented 6 years ago

Yes, will do.


bbakernoaa commented 6 years ago

@barronh here are a few sample files. These are taken from the HYSPLIT test case

hysplit_files.tar.gz

barronh commented 6 years ago

@bbakernoaa - arlconcdump and arlpardump work for me. Please test on your end and confirm.

bbakernoaa commented 6 years ago

@barronh I was able to verify that the arlconcdump worked. The arlpardump seems to be having a problem with decoding the time; here is the error:

xr.open_mfdataset('/home/bbaker/Downloads/PARDUMP_043',engine='pnc')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-44a87404e16f> in <module>()
----> 1 xr.open_mfdataset('/home/bbaker/Downloads/PARDUMP_043',engine='pnc')

/home/bbaker/anaconda2/lib/python2.7/site-packages/xarray/backends/api.pyc in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, **kwargs)
    540         lock = _default_lock(paths[0], engine)
    541     datasets = [open_dataset(p, engine=engine, chunks=chunks or {}, lock=lock,
--> 542                              **kwargs) for p in paths]
    543     file_objs = [ds._file_obj for ds in datasets]
    544 

/home/bbaker/anaconda2/lib/python2.7/site-packages/xarray/backends/api.pyc in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables)
    306             lock = _default_lock(filename_or_obj, engine)
    307         with close_on_error(store):
--> 308             return maybe_decode_store(store, lock)
    309     else:
    310         if engine is not None and engine != 'scipy':

/home/bbaker/anaconda2/lib/python2.7/site-packages/xarray/backends/api.pyc in maybe_decode_store(store, lock)
    223             store, mask_and_scale=mask_and_scale, decode_times=decode_times,
    224             concat_characters=concat_characters, decode_coords=decode_coords,
--> 225             drop_variables=drop_variables)
    226 
    227         _protect_dataset_variables_inplace(ds, cache)

/home/bbaker/anaconda2/lib/python2.7/site-packages/xarray/conventions.pyc in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables)
   1027     vars, attrs, coord_names = decode_cf_variables(
   1028         vars, attrs, concat_characters, mask_and_scale, decode_times,
-> 1029         decode_coords, drop_variables=drop_variables)
   1030     ds = Dataset(vars, attrs=attrs)
   1031     ds = ds.set_coords(coord_names.union(extra_coords).intersection(vars))

/home/bbaker/anaconda2/lib/python2.7/site-packages/xarray/conventions.pyc in decode_cf_variables(variables, attributes, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables)
    960         new_vars[k] = decode_cf_variable(
    961             v, concat_characters=concat, mask_and_scale=mask_and_scale,
--> 962             decode_times=decode_times)
    963         if decode_coords:
    964             var_attrs = new_vars[k].attrs

/home/bbaker/anaconda2/lib/python2.7/site-packages/xarray/conventions.pyc in decode_cf_variable(var, concat_characters, mask_and_scale, decode_times, decode_endianness)
    897             units = pop_to(attributes, encoding, 'units')
    898             calendar = pop_to(attributes, encoding, 'calendar')
--> 899             data = DecodedCFDatetimeArray(data, units, calendar)
    900         elif attributes['units'] in TIME_UNITS:
    901             # timedelta

/home/bbaker/anaconda2/lib/python2.7/site-packages/xarray/conventions.pyc in __init__(self, array, units, calendar)
    418             if not PY3:
    419                 msg += ' Full traceback:\n' + traceback.format_exc()
--> 420             raise ValueError(msg)
    421         else:
    422             self._dtype = getattr(result, 'dtype', np.dtype('object'))

ValueError: unable to decode time units 'minutes since release' with the default calendar. Try opening your dataset with decode_times=False. Full traceback:
Traceback (most recent call last):
  File "/home/bbaker/anaconda2/lib/python2.7/site-packages/xarray/conventions.py", line 411, in __init__
    result = decode_cf_datetime(example_value, units, calendar)
  File "/home/bbaker/anaconda2/lib/python2.7/site-packages/xarray/conventions.py", line 179, in decode_cf_datetime
    calendar)
  File "/home/bbaker/anaconda2/lib/python2.7/site-packages/xarray/conventions.py", line 110, in _decode_datetime_with_netcdf4
    dates = np.asarray(nc4.num2date(num_dates, units, calendar))
  File "netCDF4/_netCDF4.pyx", line 5680, in netCDF4._netCDF4.num2date
  File "netCDF4/_netCDF4.pyx", line 5473, in netCDF4._netCDF4._dateparse
  File "netcdftime/_netcdftime.pyx", line 939, in netcdftime._netcdftime._parse_date
  File "netcdftime/_netcdftime.pyx", line 951, in netcdftime._netcdftime._parse_date
ValueError: Unable to parse date string 'release'
barronh commented 6 years ago

Let's close this issue and only update #38

Though related, this issue #31 (arlpackedbit isMine) is done.