Closed by @dopplershift 5 years ago.
Roping in @Unidata/thredds too. I feel like this is server-based, but the fact that netCDF-Java can talk to the problematic server would seem to belie that.
Interesting; those are certainly conflicting symptoms. I’ll check if I can duplicate as soon as I can (working on phone at the moment) and will trace through to the precise error. I’ll follow up with the root of the error as soon as I discover it.
Duplicated with a self-compiled build on OSX. Interesting. In the morning I will compile with logging and pass some additional DAP parameters that Dennis added (after I look them up) and see what falls out. Thanks, @dopplershift, this is an odd one. I believe the root cause might be that the C library is being more stringent in some respects; 4.6.2 added additional checks for various behaviors, which can largely be overridden. I don't know that that's the case, though, so I will suss it out in the morning. Thanks!
I'm encountering the issue reading the `.dods` file. Still trying to get to the bottom of it, but thus far I can say that in the failing case, the call to `readpacket()` at `ocread.c:135` returns `state->packet.length == 0`, while the working case returns `state->packet.length == 1992`. I'll continue to dig, but wanted to make a note of this. Also, seeing if I can tag in @DennisHeimbigner this way.
I'll note that the only difference between the `.dods` returns is:

`nexrad/level2/S3/2019/07/02/KDIX/KDIX20190702_194819_V06`

versus

`nexrad/level2/KDIX/20190702/Level2_KDIX_20190702_1948.ar2v`

in the header. The one that works has a filename with an extension. That couldn't possibly be it?
Update: using `wget` to check, I see the following. I can fetch the entire `.dods` file from either server with `wget`. I can also append `?Reflectivity` to the query on either server and download just the Reflectivity variable. However, when I try to subset the variable, I see another failure:
```
$ wget http://thredds.ucar.edu/thredds/dodsC/nexrad/level2/KDIX/20190702/Level2_KDIX_20190702_1948.ar2v.dods?Reflectivity%5fHI[0][0][0:1831]
$ wget http://thredds-aws.unidata.ucar.edu/thredds/dodsC/nexrad/level2/S3/2019/07/02/KDIX/KDIX20190702_194819_V06.dods?Reflectivity%5fHI[0][0][0:1831]
```
Issue discovered: the square brackets appear to need to be encoded to work. By replacing `]` and `[` with `%5D` and `%5B` respectively, the subsetting query worked. Issue diagnosed; leaving the issue open while @dopplershift figures out how to address this. It appears to be out of our (C library) team's hands at this point.
So the root issue is that more and more web servers are following the standards for URL composition. For URIs, there are certain rules about when characters should be encoded, but they largely leave it up to the URI scheme to specify what those characters are. For URLs, the square bracket characters fall into the "unsafe" category and should always be encoded. Here is the relevant text from RFC 1738:
> Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text; the quote mark (""") is used to delimit URLs in some systems. The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character "%" is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".
>
> All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URI and URL encoding.
In the past few years, the Tomcat servlet container has started rejecting unencoded unsafe characters, and thus we get what we have here. As a side note, Apache web servers appear to happily accept some of these characters, so requests with unencoded square brackets to thredds.ucar.edu work just fine, since we run Tomcat behind Apache.
We had a nice discussion over in netCDF-Java land about this: at one point we encoded URLs, then we stopped encoding some parts and things broke, but now we are fully encoding again. If netCDF-C is constructing URLs, then I would say it is on the library to make sure things are encoded properly. curl might have an option to encode URLs for you, so it might be a simple fix, but I'm not sure (maybe `curl_easy_escape(...)`?)
It turns out there are some web servers that simply choke on encoded URLs, but they are by far in the minority. The approach we took in netCDF-Java was to encode URLs by default, but allow users to set a flag to turn off the encoding behaviour if they needed that.
Encoding in the C library shouldn't be an issue; `curl_easy_escape` was the first thing that occurred to me. Let me poke around on this and see if the fix is as easy as one would expect.
It is not necessarily an easy fix by encoding here; given that it also fails using wget, it seems like a two-pronged approach may be necessary. I'll keep digging at it on my end, but sadly `curl_easy_escape()` was not a straightforward panacea.
Well, wget fails because it only takes the URL we give it; it doesn't do any encoding, which is up to the user, I believe. Here's the proper example with curl:
```
>curl -G -v "http://thredds-aws.unidata.ucar.edu/thredds/dodsC/nexrad/level2/S3/2019/07/02/KDIX/KDIX20190702_194819_V06.dods?Reflectivity_HI[0][0][0:1831]"
*   Trying 149.165.170.40...
* TCP_NODELAY set
* Connected to thredds-aws.unidata.ucar.edu (149.165.170.40) port 80 (#0)
> GET /thredds/dodsC/nexrad/level2/S3/2019/07/02/KDIX/KDIX20190702_194819_V06.dods?Reflectivity_HI[0][0][0:1831] HTTP/1.1
> Host: thredds-aws.unidata.ucar.edu
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 400
< Transfer-Encoding: chunked
< Date: Wed, 03 Jul 2019 20:46:07 GMT
< Connection: close
< Server: Apache
<
* Closing connection 0
```
Fails. This works:
```
>curl -G -v "http://thredds-aws.unidata.ucar.edu/thredds/dodsC/nexrad/level2/S3/2019/07/02/KDIX/KDIX20190702_194819_V06.dods" --data-urlencode "Reflectivity_HI[0][0][0:1831]"
*   Trying 149.165.170.40...
* TCP_NODELAY set
* Connected to thredds-aws.unidata.ucar.edu (149.165.170.40) port 80 (#0)
> GET /thredds/dodsC/nexrad/level2/S3/2019/07/02/KDIX/KDIX20190702_194819_V06.dods?Reflectivity_HI%5B0%5D%5B0%5D%5B0%3A1831%5D HTTP/1.1
> Host: thredds-aws.unidata.ucar.edu
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200
< Access-Control-Allow-Origin: *
< XDODS-Server: opendap/3.7
< Content-Description: dods-data
< Content-Type: application/octet-stream
< Transfer-Encoding: chunked
< Date: Wed, 03 Jul 2019 20:45:59 GMT
< Server: Apache
<
Dataset {
    Byte Reflectivity_HI[scanR_HI = 1][radialR_HI = 1][gateR_HI = 1832];
} nexrad/level2/S3/2019/07/02/KDIX/KDIX20190702_194819_V06;
Data:
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
* Failed writing body (0 != 1840)
* Failed writing data
* Closing connection 0
```
Turns out there is a flaw in the query parser for URLs in ncuri.c that I should be able to fix in a couple of hours.
Below is the text for the commit (same as pull request text) for this fix. Attached is a zip file containing the modified files. Needs testing.
Fix encoding of a DAP2 constraint specified outside the URL.
re: github issue https://github.com/Unidata/netcdf-c/issues/1425
The 'ncdump -v' command causes a constraint to be sent to the opendap code (in libdap2). This is a separate path from specifying the constraint via a URL.
This separate path encoded its constraint using code independent of, and duplicative of, that provided by ncuri.c, and this duplicate code did not properly encode the constraint, which might include square brackets.
The solution chosen here was to remove the duplicate code and ensure that all URL escaping is performed in the ncuribuild function in ncuri.c.
Also removed the use of the NEWESCAPE conditional in ncuri.c because it is no longer needed.
I’m confused, why isn’t this being done as a regular pull request?
Perhaps we can resume this discussion when I'm back on Monday. Related: https://github.com/Unidata/thredds-docker/issues/209
Also, a potentially relevant snippet to add to `server.xml` for a Tomcat-side solution:
```xml
<Connector server="Apache" port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" relaxedQueryChars="[]:"/>
```
We already do this on http://thredds-jetstream.unidata.ucar.edu/thredds/catalog.html. Not sure why this was not transferred over to
@julienchastang That would fix our server, yes, but doesn’t help netcdf-c with servers we don’t control, servers that are not misconfigured.
I think we need to bite the bullet and not use relaxedQueryChars.
Converted to pull request: https://github.com/Unidata/netcdf-c/pull/1439
I've found a real head scratcher. I've run into problems accessing some radar data on a THREDDS instance running in Jetstream, configured to serve NEXRAD data from the AWS S3 archive. It manifests in Python as `RuntimeError: NetCDF: Access failure`, but I can reproduce it with `ncdump`.
This fails for me with `ncdump` under both 4.6.2 and 4.7.0. Here's where it gets weird: the same request works if we switch to Unidata's "local" TDS, yielding pages and pages of output. Both files have identical SHA256 hashes, so the problem is not server-side file corruption.
One other data point: netCDF-Java (through ToolsUI) has no problem accessing the thredds-aws OPeNDAP URL and dumping the variable. I also have no problem accessing the data on thredds-aws using the CDMRemote protocol, so it would not appear to be a server issue. Both thredds-aws and thredds.ucar.edu are running the same version of the server.
For completeness' sake, I'm seeing this on macOS 10.14.5.