ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
Apache License 2.0

Implement handling for "alternativeRowScanning" #296

Closed dopplershift closed 2 years ago

dopplershift commented 2 years ago

I'm not sure if this is the most elegant approach (honestly, the index handling here was a bit tricky to follow), but it works. For the example from #295:

from urllib.request import urlopen

import matplotlib.pyplot as plt
import xarray as xr

url = 'https://noaa-ndfd-pds.s3.amazonaws.com/opnl/AR.conus/VP.001-003/ds.temp.bin'
fname = 'forecast.bin'
with open(fname, 'wb') as cache_file:
    cache_file.write(urlopen(url).read())

ds = xr.open_dataset(fname, engine='cfgrib')
plt.imshow(ds.t2m[0])

this PR now gives the correct result:

(image: imshow plot of ds.t2m with the corrected row ordering)
FussyDuck commented 2 years ago

CLA assistant check
All committers have signed the CLA.

alexamici commented 2 years ago

@dopplershift thanks for tracking down the issue and finding a solution!

I agree that this doesn't look particularly elegant, but I don't see any easier way to support alternativeRowScanning.

Changing the access to alternativeRowScanning to a .get("alternativeRowScanning", False) should fix the tests and then it is good to go in, IMO.
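
For reference, the defensive-access pattern being suggested looks like this sketch (a plain dict stands in here for the cfgrib message mapping; the key name is the real GRIB key, the surrounding code is illustrative only):

```python
# Sketch: treat a missing "alternativeRowScanning" key as False
# instead of raising KeyError. `message` is a plain dict standing
# in for the cfgrib message object.
message = {"gridType": "lambert"}  # no "alternativeRowScanning" key

# message["alternativeRowScanning"] would raise KeyError here;
# .get() with a default makes the absent key mean "not alternative".
alternative = message.get("alternativeRowScanning", False)
print(alternative)  # False when the key is absent
```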

alexamici commented 2 years ago

If I understood correctly @shahramn noted in #295 that alternativeRowScanning is handled by ecCodes in recent versions, so probably it is better to not try and fix it in cfgrib.

@shahramn do you confirm that the change is not needed?

iainrussell commented 2 years ago

I'm not so sure that this can be fixed so easily in ecCodes. ecCodes has a geoiterator that gives back lat,lon,values, and it looks correct to me. The issue here is rather that this scanning mode cannot really be represented in netCDF/xarray. If we write out the indexes for a very small example, the values array looks something like: 1 2 3 4 8 7 6 5, i.e. the array 'goes back on itself'. I suspect the ecCodes geoiterator may be giving back the coordinates as they appear in the values array, i.e. 1,2,3,4,5,6,7,8. So it could be that this can be changed to give back the coords in the order 1,2,3,4,8,7,6,5 in order to make the result 'rectangular', but I have no idea how messy that might be.
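
The ordering described above can be sketched in plain Python (a hypothetical 2x4 grid; not cfgrib code): reversing every second row turns the back-and-forth sequence into the usual rectangular layout.

```python
# Flat values as delivered with alternativeRowScanning on a 2x4 grid:
# the second row is stored right-to-left, so the sequence "goes back
# on itself".
ncols = 4
values = [1, 2, 3, 4, 8, 7, 6, 5]

# Split into rows, then reverse every second row to restore the
# conventional left-to-right ordering.
rows = [values[i:i + ncols] for i in range(0, len(values), ncols)]
fixed = [row[::-1] if i % 2 else row for i, row in enumerate(rows)]
print(fixed)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```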

On an adjacent note, I think the revised plot looks upside-down, no?

If I call ecCodes' grib_get_data to get the lat/lon/values and read and plot them with Metview, I get this, which matches the coastlines better:

dopplershift commented 2 years ago

I currently have eccodes 2.25.0 installed (from conda-forge).

Regarding "upside down", all I'm doing in these plots is doing an imshow(). I'm not doing any coordinate handling whatsoever. All I really care about (at the moment) is that with this patch the results match the reference image from pygrib in #295.

I think this also goes to @iainrussell 's point: while geoiterator might be working correctly and returning properly matched lons/lats, that's not always sufficient. For lat/lon in this case, because of the grid definition, 2D lat/lons are always necessary. But if I generated the projected coordinates for this (I think Lambert Conformal) grid--like I would NEED for xarray selection currently--the orthogonal 1D coordinates I would generate would only be appropriate for every other row.

So I guess while maybe with eccodes 2.25 the data aren't "wrong" so to speak, the currently returned array in cfgrib (before this PR) is unintuitive and for many of the most common use cases, severely suboptimal.

iainrussell commented 2 years ago

Hello @dopplershift,

OK, so we've had a look at this. I'm not sure it will help us here, but the next version of ecCodes (2.26.0) will contain the ability to 'undo' this alternativeRowScanning with a call to grib_set:

grib_set -s swapScanningAlternativeRows=1 in.grib out.grib

This should also work through the ecCodes Python API, so I wondered if this would have been a cleaner solution, but... it doesn't look like we're allowed to set keys at this point in the code, so it did not work for me.

I think that for now we can go with what you originally proposed, but I'd suggest perhaps moving the line-slicing code into a function that takes the shape as an argument, just so that it is in only one place. Can I also ask for a test please, with a small data sample? A single field is enough, and one nice way of making GRIB files smaller is to reduce the bitsPerValue.
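
That refactor might look something like this sketch (the function name and signature are hypothetical, not the actual cfgrib code): one helper that takes the grid shape and flips every second row of the flat values array.

```python
import numpy as np

def unswap_alternative_rows(values, shape):
    """Hypothetical helper: undo GRIB alternativeRowScanning.

    `values` is the flat decoded array; `shape` is (nrows, ncols).
    Every second row was stored reversed, so flip those rows back.
    """
    grid = np.asarray(values).reshape(shape).copy()
    grid[1::2] = grid[1::2, ::-1]  # reverse odd-indexed rows in place
    return grid.reshape(-1)

# The 2x4 example from this discussion: 1 2 3 4 8 7 6 5 -> 1..8
print(unswap_alternative_rows([1, 2, 3, 4, 8, 7, 6, 5], (2, 4)))
# [1 2 3 4 5 6 7 8]
```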

Cheers, Iain

dopplershift commented 2 years ago

Added a test that hopefully works. I stripped down the file with what I had handy and got it to a single message, but it's still 1MB. I'm not sure what tool exactly to use to cut that down further. Happy to incorporate if someone else can cut it down, or do it myself if you can point me on how exactly to do that.

iainrussell commented 2 years ago

Thanks @dopplershift ! Because that file used a complex packing type, I also was not able to reduce it further, but I was able to create a new small one with grid_simple packing with alternate row scanning.

codecov-commenter commented 2 years ago

Codecov Report

Merging #296 (b8a249d) into master (1351bbc) will increase coverage by 0.03%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #296      +/-   ##
==========================================
+ Coverage   95.39%   95.43%   +0.03%     
==========================================
  Files          26       26              
  Lines        1978     1992      +14     
  Branches      231      232       +1     
==========================================
+ Hits         1887     1901      +14     
  Misses         61       61              
  Partials       30       30              
| Impacted Files | Coverage Δ |
|---|---|
| cfgrib/dataset.py | 98.39% <100.00%> (+0.03%) ↑ |
| tests/test_30_dataset.py | 100.00% <100.00%> (ø) |


dopplershift commented 2 years ago

If you like, I can rebase to eliminate the bigger file from the commit history.

iainrussell commented 2 years ago

That would be great @dopplershift , thank you!

dopplershift commented 2 years ago

Done.

iainrussell commented 2 years ago

Thanks @dopplershift !