barronh / pseudonetcdf

PseudoNetCDF like NetCDF except for many scientific format backends
GNU Lesser General Public License v3.0

arlconcdump error #45

Closed: bbakernoaa closed this issue 6 years ago

bbakernoaa commented 6 years ago

@barronh

I found an error while trying to read an arlconcdump file I created. The concentration file has five aerosol bins and three levels. Here is the error:

import PseudoNetCDF as pnc
pnc.pncopen('fengsha/fengsha_nams_20140510',format='arlconcdump')
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-553-3757c761d647> in <module>()
----> 1 pnc.pncopen('fengsha/fengsha_nams_20140510',format='arlconcdump')

/data/aqf2/barryb/anaconda2/lib/python2.7/site-packages/PseudoNetCDF/_getreader.pyc in pncopen(*args, **kwds)
     75         reader = getreaderdict()[format]
     76 
---> 77     outfile = reader(*args, **kwds)
     78     if addcf:
     79         try:

/data/aqf2/barryb/anaconda2/lib/python2.7/site-packages/PseudoNetCDF/noaafiles/_cdump.pyc in __init__(self, path, metaonly)
     14         self._infile.seek(0,0)
     15         self._rec12345()
---> 16         if not metaonly: self._datarec()
     17 
     18     def _rec12345(self):

/data/aqf2/barryb/anaconda2/lib/python2.7/site-packages/PseudoNetCDF/noaafiles/_cdump.pyc in _datarec(self)
    177                     I = tmp[0]['data']['I']
    178                     C = tmp[0]['data']['C']
--> 179                     c[J, I] = C
    180                 else:
    181                     c = np.fromfile(infile, dtype = nopackfmt, count = 1)[0]

IndexError: index 401 is out of bounds for axis 1 with size 401

Here is section 5 of the CONTROL file:

BIN1
0.0
72.0
00 00 00 00 00
BIN2
0.0
72.0
00 00 00 00 00
BIN3
0.0
72.0
00 00 00 00 00
BIN4
0.0
72.0
00 00 00 00 00
BIN5
0.0
72.0
00 00 00 00 00
1
40.0 -115.0
0.1 0.1
30.0 40.0

I'm attaching the file (fengsha_nams_20140510) and a version converted to netCDF4 for comparison.

fengsha_nams_20140510.gz

fengsha_nams_20140510.nc.gz

barronh commented 6 years ago

In the third fortran record of that file, there are six values: NLATS, NLONS, DLAT, DLON, LLLAT, LLLON.

The NLONS value is 400. The reader grabs data in blocks of I,J,C where I and J are indexes and C is the concentration. In this case, one of the I values is 401.

This likely highlights a mistake in the reader that would have caused a one-cell offset: the I and J values are 1-based, but are being treated as 0-based in the reader.

If you add a - 1 to the Jidx and Iidx lines in noaafiles/_cdump.py, it should fix the first problem.

circa lines 192-193

                    Jidx = tmp[0]['data']['J'] - 1
                    Iidx = tmp[0]['data']['I'] - 1
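To see why the `- 1` matters, here is a minimal standalone sketch (made-up grid sizes and a single hypothetical packed triple, not the reader's actual parsing code). HYSPLIT writes 1-based Fortran indexes, so an `I` equal to NLONS is legal in the file but one past the end of a 0-based numpy array:

```python
import numpy as np

# Hypothetical grid, much smaller than the real 301 x 401 one.
nlats, nlons = 3, 4
c = np.zeros((nlats, nlons), dtype='f')

# One packed (I, J, C) triple; I == nlons is a valid 1-based column index.
I, J, C = 4, 3, 2.5

# c[J, I] = C           # 0-based use of 1-based indexes -> IndexError when I == nlons
c[J - 1, I - 1] = C     # shift to 0-based, as in the patched Jidx/Iidx lines
```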

That fix reveals another problem. The original test file had only one pollutant, but this one has five, so the npols and nlays order must be accounted for when reshaping the array.

circa lines 201-202

        pols = np.array(pols).reshape(ntimes, npols, nlays).swapaxes(1, 2)
        lays = np.array(lays).reshape(ntimes, npols, nlays).swapaxes(1, 2)
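A toy illustration of what that reshape/swap does (tiny made-up counts, not the real file's): the names are read in (time, pollutant, layer) order, and the swap moves them to (time, layer, pollutant) so that each layer row holds the full pollutant list:

```python
import numpy as np

ntimes, npols, nlays = 1, 2, 3
# Read order is time-major, then pollutant, then layer:
flat = ['BIN1'] * nlays + ['BIN2'] * nlays

pols = np.array(flat).reshape(ntimes, npols, nlays).swapaxes(1, 2)
# pols.shape is now (ntimes, nlays, npols); pols[0, 0] lists every pollutant,
# which is what the later assertions and the pols[0, 0] loop expect.
```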

Can you test and confirm?

bbakernoaa commented 6 years ago

@barronh I cloned the flake8 branch and made the changes you suggested.

I get an AssertionError on line 207:

In [1]: import PseudoNetCDF as pnc

In [2]: pnc.pncopen('/data/aqf/barryb/fengsha/fengsha_nams_20170330',format='arlconcdump')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-2-8c70f5157608> in <module>()
----> 1 pnc.pncopen('/data/aqf/barryb/fengsha/fengsha_nams_20170330',format='arlconcdump')

/data/aqf/barryb/pseudonetcdf/src/PseudoNetCDF/_getreader.pyc in pncopen(*args, **kwds)
     93         reader = getreaderdict()[format]
     94 
---> 95     outfile = reader(*args, **kwds)
     96     if addcf:
     97         try:

/data/aqf/barryb/pseudonetcdf/src/PseudoNetCDF/noaafiles/_cdump.py in __init__(self, path, metaonly)
     17         self._rec12345()
     18         if not metaonly:
---> 19             self._datarec()
     20 
     21     def _rec12345(self):

/data/aqf/barryb/pseudonetcdf/src/PseudoNetCDF/noaafiles/_cdump.py in _datarec(self)
    205         self.createDimension('latitude', nlats)
    206         self.createDimension('longitude', nlons)
--> 207         assert((lays[:, :, [0]] == lays[:, :, :]).all())
    208         assert((pols[:, [0], :] == pols[:, :, :]).all())
    209         for pi, pol in enumerate(pols[0, 0]):

barronh commented 6 years ago

It works fine on my machine. The error you're getting suggests the second set of edits wasn't saved correctly (perhaps added alongside the original lines instead of replacing them?). I've sent you an updated copy via email, in which I have also corrected the order of the particles in the datablock:

datablock = np.array(outs).reshape(-1, npols, nlays, nlats, nlons)
datablock = datablock.swapaxes(1,2)
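The axis order can be checked with a shape-only sketch (dummy sizes and dummy data, not the reader itself): the raw records arrive as (time, pol, lay, lat, lon) blocks, and the swap produces the (time, lay, pol, lat, lon) layout the file object slices from:

```python
import numpy as np

npols, nlays, nlats, nlons = 2, 3, 4, 5
# Stand-in for the flat list of decoded records (2 output times here).
outs = np.arange(2 * npols * nlays * nlats * nlons, dtype='f')

datablock = np.array(outs).reshape(-1, npols, nlays, nlats, nlons)
datablock = datablock.swapaxes(1, 2)
# datablock.shape is (2, nlays, npols, nlats, nlons)
```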

bbakernoaa commented 6 years ago

@barronh that fixed the issue. I did notice that the DELTA_LAT and DELTA_LON values in the global attributes look a little strange, though:

pnc.pncopen('/data/aqf/barryb/fengsha/fengsha_nams_20170330',format='arlconcdump')
Out[2]: 
PseudoNetCDF.noaafiles._cdump.arlconcdump unknown {
dimensions:
        starts = 230 ;
        layer = 3 ;
        time = 48 ;
        latitude = 301 ;
        longitude = 401 ;

variables:
        integer START_YEAR(starts);
                START_YEAR:units = "year" ;
        integer START_MONTH(starts);
                START_MONTH:units = "month" ;
        integer START_DAY(starts);
                START_DAY:units = "day" ;
        integer START_HOUR(starts);
                START_HOUR:units = "hour" ;
        float START_LAT(starts);
                START_LAT:units = "degrees_north" ;
        float START_LON(starts);
                START_LON:units = "degrees_east" ;
        float START_ALT(starts);
                START_ALT:units = "meters" ;
        integer layer(layer);
                layer:units = "meters agl" ;
        float BIN1(time, layer, latitude, longitude);
                BIN1:units = "arbitrary" ;
                BIN1:description = u"BIN1" ;
        float BIN2(time, layer, latitude, longitude);
                BIN2:units = "arbitrary" ;
                BIN2:description = u"BIN2" ;
        float BIN3(time, layer, latitude, longitude);
                BIN3:units = "arbitrary" ;
                BIN3:description = u"BIN3" ;
        float BIN4(time, layer, latitude, longitude);
                BIN4:units = "arbitrary" ;
                BIN4:description = u"BIN4" ;
        float BIN5(time, layer, latitude, longitude);
                BIN5:units = "arbitrary" ;
                BIN5:description = u"BIN5" ;

// global properties:
                :NSTARTLOCS = 230 ;
                :METMODEL = u"NAMS" ;
                :MET_YEAR = 17 ;
                :MET_MONTH = 3 ;
                :MET_DAY = 30 ;
                :MET_HOUR = 0 ;
                :MET_FORECAST_HOUR = 0 ;
                :STARTING_LOCATIONS = 230 ;
                :PACKED = 1 ;
                :NLATS = 301 ;
                :NLONS = 401 ;
                :DELTA_LAT = 1036831949 ;
                :DELTA_LON = 1036831949 ;
                :LLCRNR_LAT = 25.0 ;
                :LLCRNR_LON = -135.0 ;
                :NLAYS = 3 ;
                :NPOLS = 5 ;
                :POLLUTANTS = u"BIN1,BIN2,BIN3,BIN4,BIN5" ;

barronh commented 6 years ago

Change the dtype in line 82 from '>i,>2i,>2i,>2f,>i' to '>i,>2i,>2f,>2f,>i', so DLAT and DLON are decoded as floats rather than integers. I'll update the develop version and then make a bug fix.
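The 1036831949 value is the tell: it is exactly the big-endian int32 reinterpretation of the float32 bit pattern for 0.1 (the grid spacing from section 5 of the CONTROL file). A standalone sketch of the mix-up, not PseudoNetCDF's own code:

```python
import numpy as np

# DLAT, DLON as HYSPLIT writes them: big-endian float32 values of 0.1.
raw = np.array([0.1, 0.1], dtype='>f4').tobytes()

as_int = np.frombuffer(raw, dtype='>i4')  # old '>2i' field: raw bits as integers
as_flt = np.frombuffer(raw, dtype='>f4')  # fixed '>2f' field: the intended floats

print(as_int[0])  # 1036831949, the garbage DELTA_LAT seen above
print(as_flt[0])  # 0.1
```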

bbakernoaa commented 6 years ago

@barronh This fixed it all. Wonderful!