sgdecker opened this issue 6 years ago
So I ran your code just now and got a complete image with no missing data. It's possible the data file on the server just wasn't complete. Was this the most recent file when you tried?
Originally, it was the most recent file, but I just reran the code and got 44037412 missing pixels.
When I try the code on a different machine (with 1.3.1), I see the same as you: No missing data.
I tried creating a new barebones conda environment (brings in netCDF4 1.4.0 by default) and I still got missing pixels. Then I tried an environment with netCDF4 1.3.1, since that worked on the other machine, and, lo and behold, there are no missing data!
Which version of netCDF4 are you running? I could try 1.4.1, which I see has been released (I'd have to install it manually, I guess), if that would be useful.
My results were with 1.4.1 from conda-forge on my mac.
Works for me with 1.4.0 on Python 3.6
OK, I just tried a conda-forge-based environment with 1.4.1, and I am getting missing pixels. Here is the output of conda list for this environment:
# packages in environment at /home/decker/local/anaconda3/envs/nctest3:
#
# Name Version Build Channel
blas 1.1 openblas conda-forge
bzip2 1.0.6 h470a237_2 conda-forge
ca-certificates 2018.8.13 ha4d7672_0 conda-forge
certifi 2018.8.13 py36_0 conda-forge
cftime 1.0.1 py36h7eb728f_0 conda-forge
curl 7.61.0 h93b3f91_1 conda-forge
cycler 0.10.0 py_1 conda-forge
dbus 1.13.0 h3a4f0e9_0 conda-forge
expat 2.2.5 hfc679d8_1 conda-forge
fontconfig 2.13.0 h65d0f4c_5 conda-forge
freetype 2.9.1 h6debe1e_1 conda-forge
gettext 0.19.8.1 0 conda-forge
glib 2.55.0 h464dc38_2 conda-forge
gst-plugins-base 1.12.5 hde13a9d_0 conda-forge
gstreamer 1.12.5 h61a6719_0 conda-forge
hdf4 4.2.13 0 conda-forge
hdf5 1.10.2 hc401514_1 conda-forge
icu 58.2 hfc679d8_0 conda-forge
jpeg 9c h470a237_0 conda-forge
kiwisolver 1.0.1 py36h2d50403_2 conda-forge
krb5 1.14.6 0 conda-forge
libffi 3.2.1 hfc679d8_4 conda-forge
libgcc-ng 7.2.0 hdf63c60_3 conda-forge
libgfortran 3.0.0 1 conda-forge
libiconv 1.15 h470a237_2 conda-forge
libnetcdf 4.6.1 he6cff42_7 conda-forge
libpng 1.6.35 ha92aebf_0 conda-forge
libssh2 1.8.0 h5b517e9_2 conda-forge
libstdcxx-ng 7.2.0 hdf63c60_3 conda-forge
libuuid 2.32.1 h470a237_0 conda-forge
libxcb 1.13 h470a237_2 conda-forge
libxml2 2.9.8 h422b904_3 conda-forge
matplotlib 2.2.3 py36h8e2386c_0 conda-forge
ncurses 6.1 hfc679d8_1 conda-forge
netcdf4 1.4.1 py36h62672b6_0 conda-forge
numpy 1.15.1 py36_blas_openblashd3ea46f_0 [blas_openblas] conda-forge
openblas 0.2.20 8 conda-forge
openssl 1.0.2o h470a237_1 conda-forge
pcre 8.41 h470a237_2 conda-forge
pip 18.0 py36_1 conda-forge
pthread-stubs 0.4 h470a237_1 conda-forge
pyparsing 2.2.0 py_1 conda-forge
pyqt 5.6.0 py36h8210e8a_6 conda-forge
python 3.6.6 h5001a0f_0 conda-forge
python-dateutil 2.7.3 py_0 conda-forge
pytz 2018.5 py_0 conda-forge
qt 5.6.2 hf70d934_9 conda-forge
readline 7.0 haf1bffa_1 conda-forge
setuptools 40.0.0 py36_1 conda-forge
sip 4.18 py36_1 conda-forge
six 1.11.0 py36_1 conda-forge
sqlite 3.24.0 h2f33b56_0 conda-forge
tk 8.6.8 0 conda-forge
tornado 5.1 py36h470a237_1 conda-forge
wheel 0.31.1 py36_1 conda-forge
xorg-libxau 1.0.8 h470a237_6 conda-forge
xorg-libxdmcp 1.1.2 h470a237_7 conda-forge
xz 5.2.4 h470a237_1 conda-forge
zlib 1.2.11 h470a237_3 conda-forge
The environment that works on my machine:
# packages in environment at /home/decker/local/anaconda3/envs/nctest2:
#
# Name Version Build Channel
blas 1.0 mkl
bzip2 1.0.6 h14c3975_5
ca-certificates 2018.03.07 0
certifi 2018.8.13 py36_0
curl 7.61.0 h84994c4_0
cycler 0.10.0 py36_0
dbus 1.13.2 h714fa37_1
expat 2.2.5 he0dffb1_0
fontconfig 2.13.0 h9420a91_0
freetype 2.9.1 h8a8886c_0
glib 2.56.1 h000015b_0
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb453b48_1
hdf4 4.2.13 h3ca952b_2
hdf5 1.10.1 h9caa474_1
icu 58.2 h9c2bf20_1
intel-openmp 2018.0.3 0
jpeg 9b h024ee3a_2
kiwisolver 1.0.1 py36hf484d3e_0
libcurl 7.61.0 h1ad7b7a_0
libedit 3.1.20170329 h6b74fdf_2
libffi 3.2.1 hd88cf55_4
libgcc-ng 8.2.0 hdf63c60_1
libgfortran-ng 7.3.0 hdf63c60_0
libnetcdf 4.5.0 hc9a163a_7
libpng 1.6.34 hb9fc6fc_0
libssh2 1.8.0 h9cfc8f7_4
libstdcxx-ng 8.2.0 hdf63c60_1
libuuid 1.0.3 h1bed415_2
libxcb 1.13 h1bed415_1
libxml2 2.9.8 h26e45fe_1
matplotlib 2.2.3 py36hb69df0a_0
mkl 2018.0.3 1
mkl_fft 1.0.4 py36h4414c95_1
mkl_random 1.0.1 py36h4414c95_1
ncurses 6.1 hf484d3e_0
netcdf4 1.3.1 py36h73d701b_2
numpy 1.15.0 py36h1b885b7_0
numpy-base 1.15.0 py36h3dfced4_0
openssl 1.0.2p h14c3975_0
pcre 8.42 h439df22_0
pip 10.0.1 py36_0
pyparsing 2.2.0 py36_1
pyqt 5.9.2 py36h22d08a2_0
python 3.6.6 hc3d631a_0
python-dateutil 2.7.3 py36_0
pytz 2018.5 py36_0
qt 5.9.6 h52aff34_0
readline 7.0 ha6073c6_4
setuptools 40.0.0 py36_0
sip 4.19.8 py36hf484d3e_0
six 1.11.0 py36_1
sqlite 3.24.0 h84994c4_0
tk 8.6.7 hc745277_3
tornado 5.1 py36h14c3975_0
wheel 0.31.1 py36_0
xz 5.2.4 h14c3975_4
zlib 1.2.11 ha838bed_2
I don't have a Mac handy, but trying a Windows 10 machine (the other tests were on CentOS 7 machines) makes matters even more confusing: I have missing pixels with both 1.4.1 and 1.3.1.
As a final test for now, I took the same Windows 10 machine home, so it's on my home network rather than the campus network, and there was no change in the results.
So we have all this data...and I have no clue what conclusion to draw from it. 😆
My conclusion: Use a Mac!
Interestingly, I found a different conda environment on CentOS 7 that has missing pixels even with 1.3.1, but it has libnetcdf 4.6.1 (in common with my other broken environment). So my current working hypothesis is that the Python module isn't the issue, libnetcdf is. 4.5.0 is OK (on CentOS anyway), but 4.6.1 is not.
The combination netcdf4 1.3.1 and libnetcdf 4.5.0 also works on Windows 10. It appears it is libnetcdf 4.6.1 that is at fault.
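(An aside: a quick way to confirm which libnetcdf a given Python environment is actually linked against, without digging through conda list, is to print the module's version attributes:)
import netCDF4
print(netCDF4.__version__)             # netcdf4-python version, e.g. 1.3.1 or 1.4.1
print(netCDF4.__netcdf4libversion__)   # underlying C libnetcdf version, e.g. 4.5.0 or 4.6.1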
Unfortunately for that theory (which might not be completely wrong), 4.6.1 is working fine for me here.
What happens if instead of ds.variables['Rad'][:,:] you plot ds.variables['Rad'][::4, ::4]? That will download 1/16 of the data.
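(For reference, a minimal sketch of the kind of test script under discussion, with the suggested tweak applied. This is a hypothetical reconstruction, not the original example code; it uses the GOES-16 file and server that show up later in the thread's logs, which will eventually age off the server:)
from netCDF4 import Dataset
import numpy as np

# Hypothetical reconstruction: open the GOES-16 CONUS Channel02 radiance file
# over OPeNDAP and count how many pixels come back masked (i.e. missing).
url = ('http://thredds-test.unidata.ucar.edu/thredds/dodsC/satellite/goes16/'
       'GRB16/ABI/CONUS/Channel02/current/'
       'OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc')
ds = Dataset(url)
values = ds.variables['Rad'][::4, ::4]   # the suggested strided read (1/16 of the data)
print(np.ma.count_masked(values))        # number of missing pixels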
OK, I made that tweak. Running the revised program on CentOS with 1.4.1/4.6.1 results in a hang. Even Control-C didn't give me a prompt back, but using kill did.
I ran the program a second time, and after 14 minutes, it finally crashed with:
syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: <!DOCTYPE^ HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"><html><head><title>503 Service Unavailable</title></head><body><h1>Service Unavailable</h1><p>The server is temporarily unable to service yourrequest due to maintenance downtime or capacityproblems. Please try again later.</p></body></html>
Traceback (most recent call last):
File "sat_data_missing_2.py", line 11, in <module>
values = ds.variables['Rad'][::4,::4]
File "netCDF4/_netCDF4.pyx", line 4105, in netCDF4._netCDF4.Variable.__getitem__
File "netCDF4/_netCDF4.pyx", line 5016, in netCDF4._netCDF4.Variable._get
File "netCDF4/_netCDF4.pyx", line 1745, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: DAP server error
However, I see the server is indeed down, so the above may be a red herring.
Now that the server is back up, I can report the program hangs on Windows 10 with 1.4.1/4.6.1 as well.
@sgdecker I just read this issue quickly and it seems that the issues are all on Windows, right? I'll try to get a Windows machine and test it.
Note that netcdf 4.6.1 does not build properly on Windows* so we had to add tons of patches, and they may be the cause of the problems you are experiencing.
* We had issues building it for Python 2.7, backporting patches that fixed OPeNDAP problems, etc.
@ocefpaf no I am also having the issue on CentOS, which is exhibiting the same behavior for me as Windows 10: 1.3.1/4.5.0 is fine, but 1.3.1/4.6.1 and 1.4.1/4.6.1 are not.
I think this might help us narrow this problem down.
I have two different Linux machines, each with identical versions of netcdf4, libnetcdf and curl from conda-forge:
# packages in environment at /opt/conda:
#
# Name Version Build Channel
libnetcdf 4.6.1 h5e45101_3 conda-forge
netcdf4 1.4.0 py36h62672b6_1 conda-forge
curl 7.61.0 h93b3f91_1 conda-forge
One works fine, the other displays the problem! https://gist.github.com/rsignell-usgs/f3e4452b0bd7445178291e0551aa7bea
@DennisHeimbigner does netcdf4 access of OPeNDAP URLs depend on any other packages or could there be some difference caused by internet connectivity/timeout?
I tried setting HTTP.TIMEOUT=25 in both ~/.daprc and ~/.dodsrc per this conversation, but it didn't seem to have any impact.
I just got my hands on a Mac, and, contrary to @dopplershift 's experience, I am seeing the same behavior (with the original test program) as with the other machines I've tried: 1.4.1/4.6.1 has missing data, but 1.3.1/4.5.0 is fine.
This makes me think some kind of network issue is involved--I'm cheating because I'm sitting next to the data. 😁
I will say that I do see a problem with the slicing I suggested. It's not technically a hang; it just took > 45 minutes to error out with an IndexError. Which is nuts, because with the plain slices it took < 45 minutes and completed without error (or missing data) for me. I'm pretty sure that's not how server-side subsetting is supposed to work. 😳
Addendum: Oh W.T.F. 😡 (cc @lesserwhirls @jrleeman )
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7904] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7908] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7912] HTTP/1.1" 200 220 "-" "oc4.6.1" 4
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7916] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7920] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7924] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7928] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7932] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7936] HTTP/1.1" 200 220 "-" "oc4.6.1" 4
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7940] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7944] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7948] HTTP/1.1" 200 220 "-" "oc4.6.1" 4
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7952] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7956] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7960] HTTP/1.1" 200 220 "-" "oc4.6.1" 5
165.230.171.64 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[192][4700] HTTP/1.1" 200 220 "-" "oc4.5.0" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7964] HTTP/1.1" 200 220 "-" "oc4.6.1" 3
128.117.144.149 - - [23/Aug/2018:13:11:52 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad[628][7968] HTTP/1.1" 200 220 "-" "oc4.6.1" 10
The client is requesting individual grid points, so it's making a round-trip request to download... 3... bytes. 😱 That seems...like something that should be improved.
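(Another aside, not from the thread: since the DAP client is turning the strided slice into one request per element, a possible client-side workaround is to fetch the contiguous block in a single request and subsample it locally in numpy. A sketch, assuming ds is the Dataset opened in the earlier reconstruction:)
# One contiguous DAP request for the full 2-D field, then stride locally in numpy,
# instead of letting the DAP client issue a separate request per grid point.
values = ds.variables['Rad'][:, :][::4, ::4]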
This makes me think some kind of network issue is involved
I just tested many versions of the conda package and pip, and I could not figure out a pattern other than this: when I get the whole data it takes a while, but when I get the incomplete version it is very fast. So yeah, it may be network related, but why some of the conda versions are "OK" returning the zeroes instead of keeping on waiting for the data puzzles me. Maybe something changed in the latest curl; I'll keep digging...
Is someone in a position to run a packet sniffer on this, in particular to look for a dropped connection? This may be related to https://github.com/Unidata/netcdf-c/issues/1074
If @sgdecker (or anyone seeing the missing data) can send me their IP address (here or email), I can look at the server logs and see if I see anything.
I should be able to run tcpdump on my machine, but I've never used it before, so I don't know how exactly I should invoke it to get the data that would be useful. If that is what you have in mind, and there is a recipe I could follow, I can try it out.
My CentOS machine is 165.230.171.64
Sorry, I have not used a packet sniffer in a long time, so I can't help.
Opened #838 for issue about the strided access taking forever.
So, I've been able to reproduce on my system without changing my environment, by running from my home (even though I've got a 1Gb/s connection). So here's a log of my opendap download attempts:
8.42.77.33 - - [23/Aug/2018:17:39:00 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 119494799 "-" "oc4.6.1" 10010
8.42.77.33 - - [23/Aug/2018:17:43:44 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 240000223 "-" "oc4.5.0" 13751
8.42.77.33 - - [23/Aug/2018:17:47:28 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 232728623 "-" "oc4.6.1" 10016
8.42.77.33 - - [23/Aug/2018:17:51:18 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 240000223 "-" "oc4.5.0" 9032
8.42.77.33 - - [23/Aug/2018:17:51:39 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 240000223 "-" "oc4.5.0" 8587
8.42.77.33 - - [23/Aug/2018:17:55:53 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 240000223 "-" "oc4.6.1" 8549
8.42.77.33 - - [23/Aug/2018:17:56:13 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 193928279 "-" "oc4.6.1" 10005
8.42.77.33 - - [23/Aug/2018:18:00:16 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 233694335 "-" "oc4.6.1" 10012
The interesting thing to glean is that the request should return 240000223 bytes. For netcdf 4.5.0, this amount of data is always returned. For 4.6.1, the full amount only seems to come through if the time of the connection is less than, it seems, 10000 (ms?). This time effect is even more prevalent in @sgdecker's downloads:
165.230.171.64 - - [23/Aug/2018:09:54:51 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 54211031 "-" "oc4.6.1" 10000
165.230.171.64 - - [23/Aug/2018:09:55:44 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 240000223 "-" "oc4.5.0" 27760
165.230.171.64 - - [23/Aug/2018:10:09:41 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 240000223 "-" "oc4.5.0" 28281
165.230.171.64 - - [23/Aug/2018:10:29:06 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 84729167 "-" "oc4.6.1" 9999
165.230.171.64 - - [23/Aug/2018:13:39:18 -0600] "GET /thredds/dodsC/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/OR_ABI-L1b-RadC-M3C02_G16_s20182341637323_e20182341640096_c20182341640138.nc.dods?Rad.Rad HTTP/1.1" 200 61339295 "-" "oc4.6.1" 10000
Here we see that with 4.6.1 his connections always stop right around 10000, whereas with 4.5.0 they last until 20k+ and the correct amount of data is returned. On my machine, I can confirm 4.6.1 is problematic both from conda-forge and from Homebrew. 4.5.1 seems to have no problem. This is in a conda environment where the only thing changing is libnetcdf (no changes to netcdf4-python or libcurl).
@DennisHeimbigner did something about opendap/curl change in 4.6.1? (cc @WardF )
I do not think so, but I will have to review the pull requests. We recently saw an effect when switching the Tomcat version for URL escape characters. Is it possible that there is some new timeout? Are the Tomcat/Apache versions in these tests the same?
@DennisHeimbigner This is all using the same running TDS 5.0 instance on http://thredds-test.unidata.ucar.edu .
Maybe a related problem, maybe different, but the following (as a script):
from netCDF4 import Dataset
d = Dataset('http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/monthly_v2')
u = d.variables['uwnd'][:]
consistently fails for me with conda-forge libnetcdf 4.6.0 and 4.6.1, but not with 4.5.0-3. Output is:
(nc3) ~/Desktop $ python CCMP_failure.py
Traceback (most recent call last):
File "CCMP_failure.py", line 3, in <module>
u = d.variables['uwnd'][:]
File "netCDF4/_netCDF4.pyx", line 4060, in netCDF4._netCDF4.Variable.__getitem__
File "netCDF4/_netCDF4.pyx", line 4950, in netCDF4._netCDF4.Variable._get
File "netCDF4/_netCDF4.pyx", line 1733, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: DAP failure
This is with a Mac and with Linux. With the URL above, the request works for some time slices, but not for others. With a related URL, I can't even get a single time slice; the following fails:
from netCDF4 import Dataset
d = Dataset('http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2')
print(d.variables['uwnd'].shape)
u = d.variables['uwnd'][0]
@dopplershift at this point I'm inclined to revert all of conda-forge's builds to libnetcdf 4.5.0 :-(
It will be quite painful to re-build everything but at least users will get a working environment again. (Note that wheels are still building with 4.4.x, due to Python 2.7 on Windows.)
Can someone try this experiment? Modify the URL to look like this: "[log][show=fetch]http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2". This assumes that Python does not attempt to do anything with the URL other than pass it to nc_open.
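(In Python, that experiment is just the earlier 6hourly_v2 snippet with the prefix added to the URL; a sketch, assuming the string really is handed to nc_open unchanged:)
from netCDF4 import Dataset

# The [log][show=fetch] prefix is interpreted by libnetcdf's DAP client;
# netCDF4-python just passes the whole string through to nc_open.
url = ('[log][show=fetch]http://apdrc.soest.hawaii.edu:80/dods/'
       'public_data/satellite_product/CCMP/6hourly_v2')
d = Dataset(url)
print(d.variables['uwnd'].shape)
u = d.variables['uwnd'][0]   # single time slice, as in the earlier example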
(py36) ~/Desktop $ python CCMP2.py
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dds
Log:fetch complete: 0.008 secs
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.das
Log:fetch complete: 0.002 secs
Log:prefetchable: lat=628
Log:prefetchable: lon=1440
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dds
Log:fetch complete: 0.002 secs
(42212, 628, 1440)
Log:prefetch: lat
Log:prefetch: lon
Log:prefetch.final: lat*[0:627/628],lon*[0:1439/1440]
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dods?lat,lon
Log:fetch complete: 0.003 secs
Log:prefetch.complete
Log:prefetch.vars: lat lon
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dods?uwnd.uwnd[0][0:627][0:1439]
Log:curl error: Timeout was reached
Log:fetch complete: 10.004 secs
Traceback (most recent call last):
File "CCMP2.py", line 6, in <module>
u = d.variables['uwnd'][0]
File "netCDF4/_netCDF4.pyx", line 4105, in netCDF4._netCDF4.Variable.__getitem__
File "netCDF4/_netCDF4.pyx", line 5016, in netCDF4._netCDF4.Variable._get
File "netCDF4/_netCDF4.pyx", line 1745, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: DAP failure
(py36) ~/Desktop $
There may be two problems here.
Let me suggest another experiment. WRT #1, create a file in your home directory called .daprc (or .dodsrc) with this line in it: HTTP.TIMEOUT=time, where time is the timeout in seconds. Zero is supposed to mean never time out, but I would suggest using an actual large integer such as 1000.
Meanwhile, I will see if I am the one setting the timeout to 10 seconds.
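(For anyone trying this: the rc file is plain text with one key=value entry per line, so a ~/.daprc using the 1000-second value suggested above would contain just the single line:)
HTTP.TIMEOUT=1000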
OK, now it looks like I am waiting for 1000 s to elapse.
That is odd. It implies that the server is taking a very long time to complete its response. I suppose one could set the HTTP.TIMEOUT value to something horrific like 100000 (seconds) to try to see if the server ever finishes its response.
In any case, I do set the default timeout to 10 seconds. Should I change that to zero so that the connection will never timeout?
On the other hand, the uwnd variable is 152692623360 bytes (~152 gig), so I guess it should take a long time to download.
As an aside, I have a fix in to allow setting the curl download buffer size (currently something like 16k? by default). It will allow setting it to its maximum of 512k. This will probably help some, but with such a large variable, its effect may be limited.
But I was downloading only one time slice, so it should have been quick.
It took over 500 s. It looks like at least part of the problem here is server-side. I'm trying this url now with the earlier libnetcdf, and it is still taking a long time. I'm in contact with the people running the server, so we can look into that. Was the default timeout changed for 4.6?
(py36) ~/Desktop $ python CCMP2.py
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dds
Log:fetch complete: 0.045 secs
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.das
Log:fetch complete: 0.006 secs
Log:prefetchable: lat=628
Log:prefetchable: lon=1440
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dds
Log:fetch complete: 0.005 secs
(42212, 628, 1440)
Log:prefetch: lat
Log:prefetch: lon
Log:prefetch.final: lat*[0:627/628],lon*[0:1439/1440]
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dods?lat,lon
Log:fetch complete: 0.012 secs
Log:prefetch.complete
Log:prefetch.vars: lat lon
Log:fetch: http://apdrc.soest.hawaii.edu:80/dods/public_data/satellite_product/CCMP/6hourly_v2.dods?uwnd.uwnd[0][0:627][0:1439]
Log:fetch complete: 578.212 secs
As you can see above, the download was only 628x1440 floats, so unless they had to be collected from a huge number of files it should have been very fast. I would be surprised if a single time slice is not in a single file.
I'm beginning to think that the problem I originally reported might be an interaction between the timeout value and a badly-behaving (or at least oddly-behaving) server, since the test above took slightly longer (611 s) just now with libnetcdf 4.4.1.1.
@DennisHeimbigner I'm not sure about @efiring 's server, since it looks like it's a straight DODS server, but the original problem was from our own TDS running on http://thredds.ucar.edu. I would say that 10 seconds is way too quick. If that's the time the connection takes to finish, that's going to have to allow for download time. Speaking as someone on hotel wifi right now, we cannot be assuming that users have a performant network connection.
So if that means setting it to something large like 1800 (half an hour) or just disabling it altogether, I leave that up to the netCDF team or other interested parties to decide. But I can say that with netCDF 4.5, the connections were finishing in 15-25 seconds, so the new 10 seconds is (relative to real-world) ridiculously short.
I am going to set the default timeout to a much larger value. I dislike setting it to infinity. Probably something like 1/2 hour should cover essentially all cases.
The slow response time from the server is baffling. Some possible causes (all unlikely):
@DennisHeimbigner while you fix this upstream I'm looking for a short-term solution for the users of the packages we are building on conda-forge. My options are:
1. roll back to netcdf 4.4.x, which is the last version that "just works" and we have packages on Windows and *nix;
2. *nix with 4.5.x and Windows with 4.4.x so we don't break nco users that need 4.5.x features on *nix;
3. patch 4.6.1 with a higher timeout value.
Both 1 and 2 are a lot of work and will take some time. So my question is: is there a way to do 3 with a flag or something? Or, if you point me to a PR when you address it, I can try to backport it.
You also can use .daprc to set the timeout as I indicated in a previous comment. If you want to modify the code to use a longer timeout, then do this: in libdispatch/dauth.c, about line 33, change "HTTP.TIMEOUT","10", /*seconds */ to "HTTP.TIMEOUT","1800", /*seconds */. That will give a 30-minute timeout and is the value I plan to change it to (unless you have a preferable value). If you change the value to zero, then the timeout should be infinite, but IMO that is not a good idea. Let me know what you choose.
You also can use .daprc to set the timeout as I indicated in a previous comment.
My guess is that a .daprc is something for the user space, no?
If you want to modify the code to use a longer timeout, then do this: In libdispatch/dauth.c, about line 33, change "HTTP.TIMEOUT","10", /*seconds */ to "HTTP.TIMEOUT","1800", /*seconds */
I'll give that a try. Thanks.
My guess is that a .daprc is something for the user space, no?
Yes, each user would have to set this up.
The dataset mentioned in https://github.com/Unidata/netcdf4-python/issues/836#issue-353050569 seems to be gone, but I tested the longer-timeout PR with @efiring's example and, after a long wait, I got the correct data:
Here are the results for the same test under other versions; sometimes I get all zeroes, sometimes I get the proper data:
I am looking for a script to extract the variable T3P1_IntegralProtonFlux from files and create a CSV or SQLITE file from them. The files are here:
Ultimately, I want to combine data from files from 1/1/2019 to today.
Example Code
Problem
I am not sure if this is an issue with netCDF4, the server, or my machine, but I am getting a value on the order of 40 million printing out, and the plot shows incomplete data:
Each time I run the example code, I get slightly different results. The last three runs indicated 42159004, 38724464, and 44907954 missing pixels, respectively.
I would expect either a value of 0 printing out (and no missing data), or some sort of error message indicating there was trouble retrieving the data.
I am using 1.4.0