Closed jasonfleming closed 3 years ago
@jasonfleming I believe you've identified the issue, there should be a way to set the content-type header for each file type through the thredds http server. I'll figure out how to set this properly.
Thank you @wwlwpd that is great to hear!
Looking more into this, I have found it covered in the docker container's config/web.xml, but it might not be getting used properly by the TDS servlet. I don't know much about tomcat, but I am leaning towards this being an issue with UNIDATA's TDS docker container; see https://stackoverflow.com/questions/42261607/tomcat-8-0-mime-mapping-content-type
Trying the latest current version, 4.5.16, before filing an issue with UNIDATA via github: https://hub.docker.com/r/unidata/thredds-docker
Updating to the latest version via docker did not help. Created an issue upstream for UNIDATA, https://github.com/Unidata/thredds-docker/issues/251.
Related to this, it would be nice to have *.log and *.properties files designated as text/plain to facilitate browser viewing across the board (all thredds servers), e.g.:
jason@kitt:/srv/work/asgs.dev/asgs$ curl -v -O https://fortytwo.cct.lsu.edu/thredds/fileServer/2020/sally/16/NGOMv19b/qbc.loni.org/NGOMv19b_al192020_jgf/nhcConsensus/run.properties
...
< HTTP/1.1 200 OK
< Server: nginx/1.18.0
< Date: Sat, 14 Aug 2021 17:45:01 GMT
< Content-Type: application/octet-stream
< Content-Length: 13897
< Last-Modified: Sat, 12 Dec 2020 06:34:15 GMT
< Connection: keep-alive
< ETag: "5fd46467-3649"
< Strict-Transport-Security: max-age=31536000
< Accept-Ranges: bytes
and
jason@kitt:/srv/work/asgs.dev/asgs$ curl -O -v https://fortytwo.cct.lsu.edu/thredds/fileServer/2020/sally/16/NGOMv19b/qbc.loni.org/NGOMv19b_al192020_jgf/nhcConsensus/scenario.log
...
< HTTP/1.1 200 OK
< Server: nginx/1.18.0
< Date: Sat, 14 Aug 2021 17:47:12 GMT
< Content-Type: application/octet-stream
< Content-Length: 137275
< Last-Modified: Sat, 12 Dec 2020 06:34:15 GMT
< Connection: keep-alive
< ETag: "5fd46467-2183b"
< Strict-Transport-Security: max-age=31536000
< Accept-Ranges: bytes
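The same check that the curl calls above perform can be scripted. The following is only a minimal sketch: the function names are hypothetical, and the rule for what a browser renders inline (anything under text/ plus a couple of application types) is an assumption for illustration, not a statement of how any particular browser decides.

```python
from urllib.request import Request, urlopen

# Assumption: media types a browser will typically display instead of downloading.
INLINE_EXACT = {"application/json", "application/xml"}


def media_type(content_type_header: str) -> str:
    """Drop parameters such as '; charset=utf-8' and normalize case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def likely_rendered_inline(content_type_header: str) -> bool:
    """Guess whether a browser would show the body rather than offer a download."""
    mt = media_type(content_type_header)
    return mt.startswith("text/") or mt in INLINE_EXACT


def fetch_content_type(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and return the Content-Type header ('' if absent)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")
```

Passing one of the fileServer URLs above to fetch_content_type and the result to likely_rendered_inline gives a quick yes/no per file without reading the full curl -v output.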
@wwlwpd Could this mime type fix be accomplished via nginx and/or apache reconfiguration instead of tomcat?
tomcat is the webserver, so the fix lies in properly configuring it there. To explain more: putting something in front of the actual webserver (even if it is another webserver) changes how this is accomplished, since it is now acting as a "reverse proxy"; it would then need all kinds of logic to detect the request and then modify the response.
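To make the "detect the request, then modify the response" step concrete, here is a rough sketch of the logic a proxy layer would need. It is not taken from nginx, tomcat, or thredds; the function name and the override table are hypothetical, and the extensions are just the ones discussed in this thread.

```python
import posixpath

# Hypothetical override table; extensions and types are the ones from this thread.
OVERRIDES = {
    ".json": "application/json",
    ".log": "text/plain",
    ".properties": "text/plain",
}


def rewrite_content_type(request_path, response_headers):
    """Return a copy of the (name, value) header list with Content-Type
    replaced when the request path ends in an overridden extension.
    request_path is assumed to be the URL path only, without a query string."""
    ext = posixpath.splitext(request_path)[1].lower()
    override = OVERRIDES.get(ext)
    if override is None:
        # Not one of our extensions: pass the upstream headers through untouched.
        return list(response_headers)
    kept = [(k, v) for k, v in response_headers if k.lower() != "content-type"]
    kept.append(("Content-Type", override))
    return kept
```

Even this small version has to know about every extension up front, which is the kind of extra bookkeeping that fixing the type at the source would avoid.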
@jasonfleming I put several hours into troubleshooting this last night. It appears (I think) that the mime type is hard coded; it certainly doesn't use or care about the web.xml file, which is where tomcat maps MIME types to file extensions.
I compared the thredds configuration on oden with that on adcirc viz, since it uses the same configuration file. However, adcirc viz is using version 3.7 and oden is using version 4.6 - the former doesn't have a docker container associated with it. I found no differences that seem to indicate a solution. For the time being, I am going to mark this as "needs help". I'll monitor that GH issue I created in the thredds repo. Eventually, this might become a "won't fix" - but I agree, it is annoying.
A summary of resources I dug up,
Oh, @jasonfleming feel free to take a crack at it. DM me and I'll give you info on how to deal with the docker container on oden.
Hey @wwlwpd I had another look at this about a week and a half ago, and I agree that the tomcat web.xml looks perfect, and is ignored by the thredds application inside the container. The docs say the app can use what is in the web.xml and add additional MIME types in its own web.xml and I didn't find anything contradictory in the thredds web.xml. As you said, the issue has to be in the thredds application code itself, which I agree is a showstopper for us to address it, unfortunately. Hopefully Unidata will heed the issue you logged.
Another approach that I have been experimenting with is to copy the files to a path where they can be served via nginx using autoindex on in the nginx config. Seems like it would be the minimal solution and we would have more direct control over it. What do you think?
Thank you for validating my observations. I know @akheir has had success in using nginx as a proxy for the static webserver - maybe we can get him to give us a brief description of what he did? My big concern is that "HTTPServer" link that thredds provides; we either need to handle that via reverse proxy (like what I think @akheir did) or not enable this in thredds altogether but rely on some implicit knowledge of where to look for non-data files.
We already have support for posting to multiple remote filesystems in opendap_post.sh ... maybe for the status JSON files we could post them to a path that is served by the local thredds server, as well as another path that is served by a local nginx server. The data files would still just go to the path served by the thredds server.
The other nice thing about the JSONView browser plugin is that it automatically acts as a "jsonlint" syntax checker.
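For servers where the browser will not display the file, the same kind of syntax check can be done outside the browser. This is just a small sketch using Python's standard json module; the function name is made up for illustration and it is not part of the status monitoring code.

```python
import json


def json_syntax_error(text):
    """Return None if text parses as JSON, otherwise a short
    description of where the first syntax error was found."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None
```

Feeding it the contents of a downloaded status file returns None when the file is well-formed and a line/column hint otherwise.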
From what I read here, the root cause is in the web server configuration on this particular server. The line
application/json json
must be explicitly set or enabled in the mime types configuration file. For Apache this is the file "mime.types" that lives under "Apache/conf".
Also, it might help to look for the entry application/octet-stream .... and see if "json" is listed there. If so, remove it :-)
Mime types work with file extensions, so in theory it would be possible to set text/plain prop log, as long as it does not conflict with something else ;-)
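As an illustration of how an extension-to-type table behaves, here is a sketch using Python's standard mimetypes module rather than Apache's mime.types file. The helper names are hypothetical, and the three registrations are simply the mappings proposed in this thread.

```python
import mimetypes


def build_mime_db():
    """Create a private lookup table and register the mappings from this thread."""
    db = mimetypes.MimeTypes()
    db.add_type("application/json", ".json")
    db.add_type("text/plain", ".log")
    db.add_type("text/plain", ".properties")
    return db


def content_type_for(filename, db):
    """Look up the type by extension, falling back to the generic
    application/octet-stream when the extension is not in the table."""
    guessed, _encoding = db.guess_type(filename)
    return guessed or "application/octet-stream"
```

The fallback branch is the case seen in the curl output earlier in the thread: a file whose extension has no mapping is served as application/octet-stream, which browsers offer to download.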
a developer replied to the GH issue I created, so I provided the information they requested and will continue to monitor it
I am going to go with the suggestion to set up nginx as a reverse proxy since this is really a pain in the butt, here's the config that @akheir has used for his set up. Our situation is a little different, but this is a good start.
location / {
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_pass http://localhost:8080/;
    proxy_redirect default;
}
location /thredds/fileServer {
    alias /data/opendap;
    autoindex on;
    autoindex_exact_size off;
    autoindex_localtime on;
}
location = / {
    return 301 https://fortytwo.cct.lsu.edu/thredds/catalog.html;
}
Hey @wwlwpd my suggestion is to use nginx as a standalone soln rather than as a reverse proxy for thredds, but I am not an expert so take that with a grain of salt. :-) The idea was that we can post status messages to a path on stormsurge.live in the cloud and serve that path with nginx that is configured to autoindex and with correct mime type mappings. That would accomplish the aggregation of status data as a bonus side effect. But that approach might be even more of an admin burden.
I upgraded to TDS 5.0 and the same problem exists. I think I found where the mimetypes are hard coded: https://github.com/Unidata/tds/blob/f0a6297f6fd8b9e4cb3ee2b29312ce64eda94458/tdcommon/src/main/java/thredds/util/ContentType.java
I created a new issue on the TDS itself, rather than on the docker image repo: https://github.com/Unidata/tds/issues/161
got a confirmation on the content-type issue, via Unidata/tds#161 - will track it, hopefully we won't have to use nginx
Yes let's see what they can do with it before we implement a reverse proxy as a workaround.
@jasonfleming - there's been upstream motion on this content-type issue!!
need to follow up once the upstream fix to tds gets merged
This is awesome and very welcome news! Looking forward to their new containerized THREDDS!!
The status monitoring subsystem posts JSON files with status data to the thredds servers for opendap service. One of our first use cases is to examine these files in-situ with a web browser. This works as expected on the adcircvis.tacc.texas.edu thredds server and the fortytwo.cct.lsu.edu thredds server.
However, when browsing these same files on the chg-1.oden.tacc.utexas.edu thredds server, the web browser (chromium in this case) prompts the Operator to save the file instead of showing the content directly.
Some preliminary research indicates that this may be because the adcircvis and fortytwo thredds servers label JSON files as application/json, e.g.:
whereas the chg-1.oden thredds server labels JSON files as application/octet-stream:
Is there something in the different HTTP response headers that is causing this difference in web browser behavior? If so, is there anything that can be done about it?