cuspaceflight / tawhiri

CUSF Landing Prediction Software
http://predict.habhub.org/
GNU General Public License v3.0

Downloader does not download: "Unexpected axes on record (latitudes)" #85

Closed · rjw57 closed this 10 years ago

rjw57 commented 10 years ago

When attempting to run the downloader I get the following errors:

$ tawhiri-download download -d ../data/tawhiri-data 2014102400
[2014-10-24 11:54:20,195] INFO tawhiri.downloader MainThread: downloader: opening files for dataset 2014-10-24 00:00:00
[2014-10-24 11:54:20,196] INFO tawhiri.dataset MainThread: Opening dataset 2014-10-24 00:00:00 ../data/tawhiri-data/download.VoD34z/2014102400 (truncate and write)
[2014-10-24 11:54:20,199] INFO tawhiri.downloader MainThread: download of 2014-10-24 00:00:00 starting
[2014-10-24 11:54:20,209] INFO tawhiri.downloader MainThread: Need to download 130 files
[2014-10-24 11:54:41,258] WARNING tawhiri.downloader.worker.1 MainThread: bad file (gfs.t00z.pgrb2f00, attempt 1), file sleep 120
Traceback (most recent call last):
  File "/home/zelda/rjw57/projects/misc/cusf-landing-predictor/tawhiri/tawhiri/download.py", line 659, in _run_queue_item
    self._unpack_file(temp_file, queue_item)
  File "/home/zelda/rjw57/projects/misc/cusf-landing-predictor/tawhiri/tawhiri/download.py", line 794, in _unpack_file
    callback=lambda a, b, c: sleep(0))
  File "/home/zelda/rjw57/projects/misc/cusf-landing-predictor/tawhiri/tawhiri/download.py", line 152, in unpack_grib
    assert_hour, file_checklist, callback)
  File "/home/zelda/rjw57/projects/misc/cusf-landing-predictor/tawhiri/tawhiri/download.py", line 200, in _check_grib_file
    _check_axes(record)
  File "/home/zelda/rjw57/projects/misc/cusf-landing-predictor/tawhiri/tawhiri/download.py", line 281, in _check_axes
    raise ValueError("unexpected axes on record (latitudes)")
BadFile: unexpected axes on record (latitudes)
^C[2014-10-24 11:54:48,935] INFO tawhiri.downloader MainThread: deleting failed download files
[2014-10-24 11:54:48,935] INFO tawhiri.dataset MainThread: Closing dataset 2014-10-24 00:00:00 ../data/tawhiri-data/download.VoD34z/2014102400
[2014-10-24 11:54:48,976] WARNING tawhiri.downloader MainThread: deleting ../data/tawhiri-data/download.VoD34z/2014102400
[2014-10-24 11:54:48,976] WARNING tawhiri.downloader MainThread: deleting ../data/tawhiri-data/download.VoD34z/2014102400.gribmirror
[2014-10-24 11:54:48,977] WARNING tawhiri.downloader MainThread: exit via KeyboardInterrupt

Are these errors expected? Poking in the source, it would appear that the check works around, or accounts for, strange behaviour in PyGRIB. Manually patching the code to dump the latitude/longitude axes reveals that the axes in the GRIB file are the reverse of those tested for.
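For reference, the kind of orientation check that raises this error can be sketched with plain numpy. This is an illustrative reconstruction, not tawhiri's actual `_check_axes` code, which inspects the record PyGRIB returns:

```python
import numpy as np

def check_latitude_axis(latitudes):
    """Classify a GRIB record's latitude axis as ascending or descending.

    Illustrative stand-in for the downloader's axis check: if the grid
    comes back in the reverse of the expected order (as a grib_api
    version change can cause), the strict version of this check raises
    exactly the "unexpected axes on record (latitudes)" error above.
    """
    expected = np.linspace(-90, 90, len(latitudes))
    if np.allclose(latitudes, expected):
        return "ascending"
    if np.allclose(latitudes, expected[::-1]):
        return "descending"
    raise ValueError("unexpected axes on record (latitudes)")
```

If the downloader only accepts one orientation, a library upgrade that flips it produces the traceback above even though the data itself is fine.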

rjw57 commented 10 years ago

Poking @danielrichman whom git blame identifies as the last person to touch that bit of the code.

danielrichman commented 10 years ago

Will be at a keyboard later. Perhaps https://github.com/cuspaceflight/tawhiri/issues/15 is the problem?

rjw57 commented 10 years ago

Ah, yeah, that's a possibility. Unfortunately only 1.9.9 is on PyPI. :disappointed:

I'll see if I can do a requirements.txt trick to get the right version.
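One shape such a requirements.txt trick might take: pip can install from a source tarball URL when the version you need isn't on PyPI. The archive URL below is hypothetical; point it at whatever tag or archive actually exists.

```
# requirements.txt sketch: pin the known-working versions. Only pygrib
# 1.9.9 is on PyPI, so pull 1.9.6 from a source tarball instead
# (URL illustrative only).
https://github.com/jswhit/pygrib/archive/v1.9.6.tar.gz
pyproj==1.9.3
```

Note that this only pins the Python binding; the underlying libgrib-api version is a separate, system-level concern.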

danielrichman commented 10 years ago

It's also worth noting that PyGRIB is awful and this was written in n <= 3 days because the OPeNDAP solution broke :P

rjw57 commented 10 years ago

I agree with the assessment of PyGRIB although I'd like to get the current version installed and working before hacking it to bits.

I got PyGRIB==1.9.6 installed via the changes in b550ec113a39a4e4b36e3610ea826563e6b99834. Same result :(. If it's not expected behaviour and is non-trivial to fix, I might just go right to hacking on my own downloader.

danielrichman commented 10 years ago

Known working:

ii  libgrib-api-1.9.9                1.9.9-3                             GRIB decoding/encoding software library
ii  libgrib-api-dev                  1.9.9-3                             GRIB decoding/encoding software library (development)
$ /srv/tawhiri2/bin/pip freeze
pygrib==1.9.6
pyproj==1.9.3

rjw57 commented 10 years ago

Ah... I have libgrib-api 1.10.4. That might be it...

rjw57 commented 10 years ago

In any case, my original question has been answered and so I'm closing this for the moment.

danielrichman commented 10 years ago

If this does not fix things, I can dig further. The last time similar changes happened they were very easy to fix; they had just changed some ordering or other rubbish inside the PyGRIB code. Starting from scratch seems overkill, not least because the GRIB operations are actually a fairly small part of the downloader.

The other concern is that there are a lot of subtle quirks of the downloading; error cases to handle etc. that have been added to the downloader over time, or might be exposed in a new implementation. This thing is super stable; it would have an uptime of >year had I not accidentally restarted it once and/or switched to FTP (which was just motivated by speed).

For example, the last time a problem was discovered in the downloader was August 2013 (f56547241032a1c9f9457b79140a511ea939f4d5), when we found that occasionally you get a 50x from the server which you can retry a few seconds later without issue. The last change of more than just a few lines was in July 2013.
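That retry-on-50x behaviour can be sketched as a small helper. This is illustrative only, not the downloader's actual code; the names here are made up:

```python
import time

class TransientServerError(Exception):
    """Stands in for an HTTP 50x response (name is illustrative)."""

def fetch_with_retry(fetch, attempts=3, delay=2.0):
    """Call fetch(), retrying a few times on transient 50x errors.

    The servers occasionally return a 50x that succeeds on a retry a
    few seconds later, so only give up after `attempts` tries.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except TransientServerError:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

The real downloader also distinguishes permanently bad files from transient failures, as the "bad file ... file sleep 120" warning in the log above shows.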

rjw57 commented 10 years ago

:+1: for keeping stable code. I'll try and get the original working before hacking anything.

rjw57 commented 10 years ago

This thing is super stable; it would have an uptime of >year had I not accidentally restarted it once and/or switched to FTP (which was just motivated by speed).

Was HTTP particularly slow? On my machine, the HTTP downloads are pretty speedy.

danielrichman commented 10 years ago

It varies. There was a period where NOAA appeared either to be overloaded or to be throttling HTTP downloads. We were getting ~1 MB/s, in contrast to FTP, which (again, at the time) was producing a consistent 10 MB/s.

This doesn't actually appear to be the case any more (though currently the downloads are fast enough that we can keep up with the rate at which the files are pushed to the FTP server).

danielrichman commented 10 years ago

It is unclear to me whether we influence these speeds, and/or swapping to FTP changed things. I am not aware of anyone else running a tawhiri downloader, but then I don't know whether I would have reason to be. The NOAA could be reacting to just us, or they could be reacting to N people running the downloader hitting the same files at exactly the same time, where "reacting" is intentional throttling to regulate load, or involuntary "throttling" due to congestion.

Or it could be completely unrelated and I'm imagining it.

rjw57 commented 10 years ago

Perhaps if one had one's time over again, one could use csync or some appropriate magic wget configuration to first form a local (partial) mirror and then work on it.
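For what it's worth, the sort of wget configuration being alluded to might look like this .wgetrc fragment. The settings are standard wget options, but the accept pattern and values are illustrative guesses, and (as noted below) this would not reproduce the downloader's release-timing logic:

```
# .wgetrc sketch for mirroring a GFS run directory before unpacking.
# Recurse within the run's directory only:
recursive = on
no_parent = on
# Only fetch the GRIB files of interest (pattern illustrative):
accept = *.pgrb2*
# Retry transient failures with a backoff, and resume partial files:
tries = 5
waitretry = 10
timeout = 30
continue = on
```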

rjw57 commented 10 years ago

Forcing a downgrade of libgrib-api to 1.9.9 has fixed the problem. Mucky... :cry:

Should be OK for Travis, though, since 1.9.9 is the version in Ubuntu Precise, which is what their workers use.

danielrichman commented 10 years ago

I disagree: most of the downloader is handling the way the files are released and reliably downloading them without getting stuck, sensible timeouts, etc.

I don't think wget would be able to do that, and if you were to write a script that could, you'd mostly end up re-inventing the downloader.

See also: the "gribmirror" files produced by the download.

rjw57 commented 10 years ago

I disagree: most of the downloader is handling the way the files are released and reliably downloading them without getting stuck, sensible timeouts, etc.

There's a lot of magic in wget/cURL in this regard too. One can go a bit mad with timeouts and backoffs, etc. but I was only speculating. You'd end up re-writing the downloader as cURL configuration.

Nonetheless, I'm not proposing changing the downloader. :smile:

danielrichman commented 10 years ago

Some random ramblings on the downloader

rjw57 commented 10 years ago

:+1: to much of this. All the ruaumoko work was me getting up to speed with everything, and I certainly don't intend to spend my life trying to mock all of the GFS. The strategy I was thinking of re: testing would be to mock Dataset in the test suite to return some plausible-ish data and test from that level up.

I'd rather spend my time writing features than code.
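That mocking strategy might look something like this. A sketch only: the attribute names, array shape, and patch target are assumptions, not tawhiri's actual interface:

```python
import numpy as np
from unittest import mock

def make_fake_dataset(shape=(2, 3, 4, 4, 3)):
    """Build a stand-in for tawhiri's Dataset with plausible-ish data.

    The shape and attribute names here are illustrative; a real test
    double would mirror whatever interface the solver actually consumes.
    """
    ds = mock.MagicMock(name="Dataset")
    ds.array = np.zeros(shape)  # stand-in for the mmapped wind grid
    return ds

# In the test suite one would then patch the real class and test
# "from that level up", e.g. (patch target assumed):
#   with mock.patch("tawhiri.dataset.Dataset",
#                   return_value=make_fake_dataset()):
#       ...exercise the solver / API...
```

This keeps the tests independent of an 8 GB GFS download while still exercising everything above the dataset layer.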

rjw57 commented 10 years ago

I probably would choose a language other than Python. Motivations: Python's green threading support is poor (I skimmed asyncio, which looks nice, but is Py3 only and could have warts, idk), and doing lots of BLITting or GRIB parsing (if we write one) is slow.

Yeah, a standalone downloader might make a nice little side-project for someone. Go's not a bad choice. It's pretty good on the concurrency side and has a fairly well thought out standard networking library.

rjw57 commented 10 years ago

... this assumes the on-disk format is essentially completely frozen.

danielrichman commented 10 years ago

I don't see on-disk format radically changing. Re-ordering the axes, say, is infeasible: it's just not possible to blit that much data.

We actually tried Go, not for the downloader, but for other bits. It's one of the branches in here: https://github.com/danielrichman/tawhiri-codegolf. (Incidentally, it occurred to me that you may not have found https://github.com/cuspaceflight/tawhiri-tools and https://github.com/danielrichman/tawhiri-notes.)

Go suffers from the unfortunate limitation that arrays can't be bigger than 2 GB (or 4 GB, I forget). We could work around this, yes, but it leads to really dumb stuff like this https://github.com/danielrichman/tawhiri-codegolf/blob/go-poc/tawhiri/dataset/mmap.go where I'm redefining the built-in mmap because Go's won't let you map anything big (because you wouldn't be able to cast it to an array without bad things happening, I guess).