gewitterblitz opened this issue 4 years ago
Hi,
I just tried it locally, the file opens fine for me.
I'm guessing something went wrong during your download. We could find out with a checksum:
```python
import hashlib

def md5_hash(path: str) -> str:
    with open(path, "rb") as f:
        content = f.read()
    return hashlib.md5(content).hexdigest()

print(md5_hash("SanDiego.nc"))
# prints 1ed883e7318883ef654c123106ed09c0
```
Did you already try re-downloading the netCDF file? (Lame suggestion I know, but sometimes the solution is mundane!)
(By the way, ordinary URLs will not work with the netCDF4 library, only an OPeNDAP URL will work: https://www.opendap.org/)
Works for me, too.
I think your file got corrupted, or your netcdf lib is somehow broken.
You might try running ncdump on the file to test it out:
$ ncdump -h SanDiego.nc
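If `ncdump` isn't installed, a quick stand-in is to peek at the file's magic bytes (a sketch; `looks_like_netcdf` is a hypothetical helper): classic netCDF files start with `b'CDF'`, and netCDF-4/HDF5 files start with `b'\x89HDF'`.

```python
def looks_like_netcdf(path: str) -> bool:
    """Cheap sanity check via the file's magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(8)
    # Classic netCDF starts with b'CDF'; netCDF-4 (HDF5) with b'\x89HDF'.
    return magic.startswith(b"CDF") or magic.startswith(b"\x89HDF")
```

An HTML page saved by mistake starts with something like `b'<!DOCTYPE html>'`, so it returns False.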
-CHB
Huite and Chris,
Yes, I did try redownloading the file but still getting the same error. The checksum changes every time I download the file.
Here is how I am downloading it within my jupyter notebook on university cluster (although same result on local machine):
! wget https://github.com/erdc/AdhModel/blob/master/tests/test_files/SanDiego/SanDiego.nc
which gives the following output:
```
wget: /apps/spack/rice/apps/anaconda/5.3.1-py37-gcc-4.8.5-7vvmykn/lib/libuuid.so.1: no version information available (required by wget)
--2020-08-07 02:10:58--  https://github.com/erdc/AdhModel/blob/master/tests/test_files/SanDiego/SanDiego.nc
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘SanDiego.nc’

[ <=> ]  81,848     --.-K/s   in 0.05s

2020-08-07 02:10:59 (1.68 MB/s) - ‘SanDiego.nc’ saved [81848]
```
Here is the checksum code output:
```python
import hashlib

def md5_hash(path: str) -> str:
    with open(path, "rb") as f:
        content = f.read()
    return hashlib.md5(content).hexdigest()

print(md5_hash("SanDiego.nc"))
# prints 4ab9b0548e4ed555970d71ee2238a5c2
```
And here is the same error using the example script:
```python
from datetime import datetime, timedelta

import gridded
import netCDF4

with netCDF4.Dataset("SanDiego.nc") as nc:
    # need to convert to zero-indexing
    nodes = nc.variables['nodes'][:] - 1
    faces = nc.variables['E3T'][:, :3] - 1

    # make the grid
    grid = gridded.grids.Grid_U(nodes=nodes,
                                faces=faces,
                                )

    # make the time object (handles time interpolation, etc.)
    times_var = nc.variables['times'][:]
    # The time axis needs to be a list of datetime objects.
    # If the metadata are not there in the netCDF file, you have to build it by hand.
    start = datetime(2019, 1, 1, 12)
    times = [start + timedelta(seconds=val) for val in times_var]

    # This isn't a compliant file, so this will not work:
    # time_obj = gridded.time.Time.from_netCDF(dataset=nc, varname='times')
    time_obj = gridded.time.Time(data=times,
                                 filename=None,
                                 varname=None,
                                 tz_offset=None,
                                 origin=None,
                                 displacement=timedelta(seconds=0),
                                 )

    # make the variables
    depth = nc.variables['Depth']
```
@ChrisBarker-NOAA : ncdump does not work either: `ncdump: SanDiego.nc: NetCDF: Unknown file format`
Am I downloading from the right weblink?
@Huite : Good to know the OPeNDAP trick.
Ah, you were using wget, so yes, you're on the right track: you're not downloading what you think you are downloading, because of how GitHub works. You've been getting HTML pages instead of the netCDF file (open your SanDiego.nc in a text editor to see for yourself).
See: https://unix.stackexchange.com/questions/228412/how-to-wget-a-github-file
This'll do the trick:
wget https://github.com/erdc/AdhModel/blob/master/tests/test_files/SanDiego/SanDiego.nc?raw=true -O SanDiego.nc
(I didn't know this either before now.)
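Equivalently, you can rewrite the blob URL into its raw.githubusercontent.com counterpart yourself. A sketch (`github_raw_url` is a hypothetical helper, assuming the standard `github.com/<user>/<repo>/blob/<ref>/<path>` layout):

```python
def github_raw_url(blob_url: str) -> str:
    # github.com/<user>/<repo>/blob/<ref>/<path>
    #   -> raw.githubusercontent.com/<user>/<repo>/<ref>/<path>
    url = blob_url.replace("https://github.com/",
                           "https://raw.githubusercontent.com/", 1)
    return url.replace("/blob/", "/", 1)

print(github_raw_url(
    "https://github.com/erdc/AdhModel/blob/master/tests/test_files/SanDiego/SanDiego.nc"))
# https://raw.githubusercontent.com/erdc/AdhModel/master/tests/test_files/SanDiego/SanDiego.nc
```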
You're probably on a *nix machine (if you're using wget), so you can check immediately:
md5sum SanDiego.nc
I got a little curious: you can get the data into Python without saving it to a file, provided you once again use the right URL: https://github.com/Unidata/netcdf4-python/issues/295
```python
import requests
import netCDF4

my_url = "https://github.com/erdc/AdhModel/blob/master/tests/test_files/SanDiego/SanDiego.nc?raw=true"
response = requests.get(my_url, stream=True)
ds = netCDF4.Dataset('name', mode='r', memory=response.content)
```
OPeNDAP is probably a lot fancier, but it's pretty cool that this works.
xarray dispatches based on the type you pass in; to read from memory you need h5py available as a backend and must provide a BytesIO object.
```python
import io

import requests
import xarray as xr

response = requests.get("https://github.com/erdc/AdhModel/blob/master/tests/test_files/SanDiego/SanDiego.nc?raw=true")
ds = xr.open_dataset(io.BytesIO(response.content))
```
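The same in-memory path can be exercised without any network. A sketch, assuming xarray with the scipy backend is installed (when `to_netcdf()` gets no target it returns netCDF-3 bytes, which `open_dataset` can read back from a file-like object); the dataset contents here are made up for illustration:

```python
import io

import xarray as xr

# to_netcdf() with no target returns the file as bytes (scipy backend).
ds = xr.Dataset({"depth": ("node", [1.0, 2.0, 3.0])})
buf = ds.to_netcdf()

# Same call shape as above, but with locally produced bytes.
ds2 = xr.open_dataset(io.BytesIO(buf))
print(float(ds2["depth"].sum()))  # 6.0
```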
@Huite: thanks for beating me to it!
And the xarray trick is another good reason to build the next version of gridded on it :-)
-CHB
@Huite : Thank you so much, it works!!!
I had no idea about the wget issue with GitHub files. Your suggestion for loading through the URL is really helpful and works great!
I am currently trying out gridded for post-processing the output from an atmospheric NWP model. Will let you guys know if I need any help.
Any words of wisdom from your end are highly appreciated. I found out about gridded from @ChrisBarker-NOAA's AMS 2017 talk.
@ChrisBarker-NOAA @Huite : What's the best way to reach out to you to discuss gridded's application to a meteorological numerical model output?
Actually, GitHub is a pretty good way to do it.
Why not start a new issue with a question or proposal?
Hi, I am new to gridded. I was trying to replicate the load_arbitrary_ugrid.py example script but could not load the SanDiego.nc file. I tried downloading the file to the working directory as well as accessing it directly through the provided URL, using both the netCDF4 and xarray libraries.
Approach 1:
```python
with netCDF4.Dataset("SanDiego.nc") as nc:
```

```
OSError                                   Traceback (most recent call last)
```