ARM-DOE / pyart

The Python-ARM Radar Toolkit. A data model driven interactive toolkit for working with weather radar data.
https://arm-doe.github.io/pyart/

Let NEXRADLevel2File's filename be a gzipped (.gz) file #341

Closed codypiersall closed 9 years ago

codypiersall commented 9 years ago

All of the Level 2 archive files that I've seen from NOAA's data archive site (http://www.ncdc.noaa.gov/nexradinv/) have been .gz files. Would it be worth checking the filename's extension and, if it is a .gz file, opening it with gzip.open instead of plain open? As it is, you can work around this by passing an already-open gzip file object in place of the filename. I can submit a pull request if needed; it would only add a couple of lines to NEXRADLevel2File.__init__:

if filename.endswith('.gz'):  # os.path.splitext(filename)[-1] == '.gz' would be an equivalent check
    fh = gzip.open(filename, 'rb')  # 'rb' is the default mode, so it could be omitted
else:
    fh = open(filename, 'rb')  # uncompressed file: open it as before

and it would need to add the relevant imports, too.

As an aside, everywhere I've seen on NOAA's website mentions that the radar data is stored as a bzipped file, but all of the ones I've seen have been gzipped. Granted, I've only worked with a single data set from KTLX. Are the others different?

scollis commented 9 years ago

The realtime data is actually a bunch of concatenated bzip2-ed messages, so we disassemble, unbzip, and reassemble them.

Data you get from the NOAA archive system is unbzipped, concatenated, and then gzipped.

codypiersall commented 9 years ago

Ah, I finally get it! Thanks. It still seems like this could be worth adding, what do you think? Like I said, though, you can already do

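import gzip
from pyart.io.nexrad_level2 import NEXRADLevel2File
f = gzip.open('KTLX20110524_000032_V3.gz')  # pass the open file object instead of a filename
nexrad = NEXRADLevel2File(f)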

so it's not like it's critical, just one of those "nice to have" things.

codypiersall commented 9 years ago

Oops, clicked the wrong button; didn't mean to close it.

scollis commented 9 years ago

Yep, keep the issue open so others can respond.

jjhelmus commented 9 years ago

Transparent reading of gzipped (and bzip2) files including NEXRAD Level 2 files is supported by pyart.io.read but not the underlying read_* functions. So the following just works:

radar = pyart.io.read('KTLX20110524_000032_V3.gz')

The auto_read module detects gzipped files by peeking at the beginning of the file. This is much more robust and reliable than checking whether the filename ends with .gz, which can be error prone and does not work when a file-like object is passed to NEXRADLevel2File. I think this method could be abstracted out and used in other functions and classes. Let me think about this a bit and see if there is a blocker; for some reason I thought there was the last time I thought about this.
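
The idea of peeking at the start of the file can be sketched roughly as follows. This is not the code in Py-ART's auto_read module. The name open_maybe_compressed is hypothetical, and the sketch assumes Python 3 and, for file-like input, a seekable object. It relies on the standard signatures: gzip streams begin with the bytes 0x1f 0x8b and bzip2 streams begin with "BZh".

```python
import bz2
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip stream
BZIP2_MAGIC = b"BZh"       # first three bytes of every bzip2 stream


def open_maybe_compressed(source):
    """Hypothetical helper: return a readable binary file object.

    `source` may be a path or an already-open, seekable binary file
    object. Compression is detected from the leading bytes, never
    from the file name.
    """
    fh = source if hasattr(source, "read") else open(source, "rb")
    magic = fh.read(3)
    fh.seek(0)  # rewind so the reader sees the stream from the start
    if magic.startswith(GZIP_MAGIC):
        return gzip.GzipFile(fileobj=fh, mode="rb")
    if magic.startswith(BZIP2_MAGIC):
        return bz2.BZ2File(fh, mode="rb")
    return fh  # not compressed: hand back the plain file object
```

Because the check looks at content, a gzipped file without a .gz suffix and an in-memory file object are both handled the same way as a correctly named file on disk.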

jjhelmus commented 9 years ago

I seem to recall receiving bz2 files from NCDC but they may have changed the compression they are using.

In addition, Level 2 files can have bzip2 compressed records even if the file itself is not compressed. For example the files served by the UCAR THREDDS Data Server have compressed records.
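
As a rough illustration of the "disassemble, unbzip and reassemble" step mentioned earlier in the thread, the sketch below decompresses several bzip2 streams stored back to back. It is a simplification, not Py-ART's implementation: the function name is hypothetical, and it assumes the streams directly follow one another, whereas real Level 2 files may carry extra framing between records that would have to be skipped first.

```python
import bz2


def decompress_concatenated_bz2(data):
    """Hypothetical helper: decompress back-to-back bzip2 streams.

    Each stream gets its own decompressor; whatever follows the end
    of a stream is exposed as `unused_data` and fed to the next one.
    """
    chunks = []
    while data:
        decomp = bz2.BZ2Decompressor()
        chunks.append(decomp.decompress(data))
        if not decomp.eof:
            raise ValueError("truncated bzip2 stream")
        data = decomp.unused_data  # bytes after the finished stream
    return b"".join(chunks)
```

The explicit loop is the useful part here: it marks the place where per-record framing could be stripped from the leftover bytes before the next stream is decompressed.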

jjhelmus commented 9 years ago

It is possible to create a general-purpose function that opens gzip, bzip2, and uncompressed files, but this would couple the modules that use it to the module that holds it, which should be pyart.io.common. Currently the nexrad_level2 and nexrad_level3 modules do not depend on anything else in Py-ART and can be used independently, a situation that I think should be maintained. Importing this enhanced open function inside a try/except block should meet both requirements. I'll implement this as time permits.
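
The try/except import idea could look something like the sketch below. It is only an illustration of the pattern: StandaloneReader is a hypothetical stand-in for a class such as NEXRADLevel2File, and the fallback deliberately does no decompression. The prepare_for_read name comes from later in this thread, where it is described as exposed in the pyart.io namespace.

```python
try:
    # Use Py-ART's helper when the package is importable.
    from pyart.io import prepare_for_read as _prepare
except ImportError:
    def _prepare(source):
        """Fallback for standalone use: no transparent decompression."""
        if hasattr(source, "read"):
            return source          # already a file-like object
        return open(source, "rb")  # plain, uncompressed file


class StandaloneReader:
    """Hypothetical reader that works with or without Py-ART installed."""

    def __init__(self, source):
        self._fh = _prepare(source)
        self.header = self._fh.read(4)  # first bytes of the file

    def close(self):
        self._fh.close()
```

With this arrangement the module keeps working on its own and gains transparent decompression only when the rest of the package is available.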

jjhelmus commented 9 years ago

Sorry for the delay in looking at this. I'll be working on a solution today and hopefully will have something merged by the end of the day that allows transparent decompression of gzip and bzip2 files in the read_filetype functions. The underlying NEXRADLevel2File and related classes will still need explicit decompression, but Py-ART will provide a function that performs this in the public API under the pyart.io namespace. Details will be outlined in the upcoming PR.

jjhelmus commented 9 years ago

Py-ART will now decompress gzip and bzip2 files passed to the read_nexrad_archive function. Adding this functionality directly to the NEXRADLevel2File and related classes would introduce undesired coupling in this module or require extensive repetition of code, both of which are bad. Py-ART does expose the prepare_for_read function in the pyart.io namespace, which can be used to add this functionality to NEXRADLevel2File or other classes. For example:

import pyart
from pyart.io.nexrad_level2 import NEXRADLevel2File
nfile = NEXRADLevel2File(pyart.io.prepare_for_read('KATX20130717_195021_V06.gz'))

Additionally, the prepare_for_read function can easily be extracted from Py-ART and included directly in the nexrad_level2.py module if so desired.

I believe this to be the best solution to this issue, although it is not exactly what was requested.

codypiersall commented 9 years ago

This is great! Thank you.

Cody
