Downloading a full year of HRRR srf variables

NicWayand commented 7 years ago

Hi @blaylockbk,

First, thanks for providing this archive and download scripts.

I am trying to download one year of HRRR srf variables. I realize this is a huge amount of data, but I will be sub-setting the domain down to a ~10x10 grid cell region, and sub-setting to ~10 srf variables.

I can successfully use HRRR_S3.py to grab srf variables for one day (+24 h forecasts from 00). But when I try to grab multiple initialization times (i.e. 00, 06, 12, 18) and only the 8-hour forecast times, the script hangs without giving any error. It just sits at a random download percentage, and htop shows nothing is happening.

I have also tried your HRRR_S3_fastDwnld.py script, but get an error: URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed

I was hoping to modify the scripts to either loop over days (HRRR_S3.py) or variables (HRRR_S3_fastDwnld.py), but maybe I am taking the wrong approach.

Appreciate your suggestions!

blaylockbk commented 7 years ago

Hi Nic,

I'm glad you are finding the HRRR archive useful.

I'm not sure why the HRRR_S3.py script hangs, and I think the URLError has to do with the version of urllib you are using. I rewrote the download script in a way that I hope will simplify things for you and others:
https://github.com/blaylockbk/pyBKB_v2/blob/master/demos/hrrr_variable_from_pando.py
All I did was reorganize the loops.

I'd suggest reading all the comments so you can tailor the function parameters for your needs. Let me know if you still get the URLError, and I'll try to find how I got around that earlier.

From what it sounds like you are trying to download, you might try running the main download function like this...

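# download_HRRR_variable_from_pando is defined in the script linked above
from datetime import date, timedelta

# Download multiple variables from date range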
sDATE = date(2016, 1, 1)   # Start date
eDATE = date(2017, 1, 1)   # End date (exclusive)
days = (eDATE-sDATE).days
DATES = [sDATE + timedelta(days=d) for d in range(days)]

# Variable strings must be part of a line in the .idx file
variables = ['TMP:2 m', 'DPT:2 m', 'APCP:surface']

for variable in variables:
    for DATE in DATES:
        # This is the main download function. We are
        # simply looping over all the days and variables
        # we want to get.
        download_HRRR_variable_from_pando(DATE, variable,
                                          hours=[0, 6, 12, 18],
                                          fxx=[8],
                                          model='hrrr',
                                          field='sfc',
                                          outdir='./')
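
For context on the comment above that variable strings must be part of a line in the .idx file: as I understand it, the .idx file is a wgrib2-style inventory in which each line gives a message number, the starting byte of that GRIB2 message, and its variable and level. Matching a string such as 'TMP:2 m' against those lines gives the byte range to request. The sketch below only illustrates that idea and is not the code in hrrr_variable_from_pando.py; `find_byte_ranges` is a hypothetical helper, and the sample lines and byte offsets are made up.

```python
def find_byte_ranges(idx_lines, variable):
    """Return (start, end) byte ranges for inventory lines containing `variable`.

    `end` is None when the match is the last message, meaning
    "read to the end of the file".
    """
    lines = [line for line in idx_lines if line.strip()]
    # The second colon-separated field is the starting byte of each message.
    starts = [int(line.split(":")[1]) for line in lines]
    ranges = []
    for i, line in enumerate(lines):
        if variable in line:
            # A message ends one byte before the next message starts.
            end = starts[i + 1] - 1 if i + 1 < len(starts) else None
            ranges.append((starts[i], end))
    return ranges


# Made-up inventory lines in the wgrib2 style; the offsets are illustrative only.
sample_idx = [
    "1:0:d=2017031018:REFC:entire atmosphere:anl:",
    "2:500000:d=2017031018:TMP:2 m above ground:anl:",
    "3:1700000:d=2017031018:DPT:2 m above ground:anl:",
]

print(find_byte_ranges(sample_idx, "TMP:2 m"))  # [(500000, 1699999)]
```

A range like this could then be passed in an HTTP `Range: bytes=start-end` header, so only the requested variable is transferred rather than the whole file.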

NicWayand commented 7 years ago

Thank you for the updated function! But when I run the example you provided I get multiple: ERROR!!! Does the .idx file exist: https://api.mesowest.utah.edu/archive/HRRR/oper/sfc/20170310/hrrr.t18z.wrfsfcf00.grib2.idx

Which is strange, because the file does exist at that url.

Maybe it's an issue with my urllib2? I am using Python 2.7.12.

EDIT: Yes, the underlying issue is still a URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed. I will try to figure out why.

NicWayand commented 7 years ago

OK, for future reference, this hack worked:

            # Requires `import ssl` and `import urllib2` at the top of the script
            # (Python 2.7.9+ for ssl.create_default_context and the `context` argument).
            # 1) Open the Metadata URL and read the lines
            try:
                # Hack to skip certificate verification; see
                # https://stackoverflow.com/questions/19268548/python-ignore-certicate-validation-urllib2/28048260#28048260
                ctx = ssl.create_default_context()
                ctx.check_hostname = False
                ctx.verify_mode = ssl.CERT_NONE
                idxpage = urllib2.urlopen(idxfile, context=ctx)
                lines = idxpage.readlines()
            except Exception:
                print "\n   ERROR!!! Does the .idx file exist: %s \n" % idxfile
                continue
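
For anyone on Python 3, where urllib2 no longer exists, a rough equivalent of the same workaround might look like the sketch below. `unverified_context` and `read_idx_lines` are hypothetical helper names, not part of the download script, and the timeout value is an arbitrary choice. Note that skipping verification means the server's identity is not checked, so this is only a stopgap.

```python
import ssl
import urllib.request


def unverified_context():
    """Build an SSL context that skips certificate verification."""
    ctx = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def read_idx_lines(idx_url, timeout=30):
    """Fetch an .idx file and return its lines as text."""
    with urllib.request.urlopen(idx_url, context=unverified_context(),
                                timeout=timeout) as page:
        return page.read().decode("utf-8").splitlines()
```

Calling `read_idx_lines(idxfile)` would then stand in for the `urllib2.urlopen(...)` and `readlines()` pair above, with the timeout raising an error if the server stops responding.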

NicWayand commented 7 years ago

Closing the issue, as it was a urllib2 problem on my end.

blaylockbk commented 7 years ago

Ah, yes. That is the same solution I've used before. Good to hear you got it working. Let me know if you need any other help.
