StormSurgeLive / asgs

The Automated Solution Generation System (ASGS) provides software infrastructure for automating coastal ocean modelling for real time decision support, and provides a variety of standalone command line tools for pre- and post-processing. Visit us at https://discord.gg/jFbacxrUf9
https://tools.adcirc.live
GNU General Public License v3.0

web browser tries to download json status files instead of rendering them for direct viewing #618

Closed: jasonfleming closed this issue 3 years ago

jasonfleming commented 3 years ago

The status monitoring subsystem posts JSON files with status data to the thredds servers for opendap service. One of our first use cases is to examine these files in situ with a web browser. This works as expected on the adcircvis.tacc.utexas.edu thredds server and the fortytwo.cct.lsu.edu thredds server.

However, when browsing these same files on the chg-1.oden.tacc.utexas.edu thredds server, the web browser (chromium in this case) prompts the Operator to save the file instead of showing the content directly.

Some preliminary research indicates that this may be because the adcircvis and fortytwo thredds servers label JSON files as application/json, e.g.:

jason@kitt:/srv/work/asgs.dev/asgs$ curl -v http://adcircvis.tacc.utexas.edu:8080/thredds/fileServer/asgs/2021/status/frontera.tacc.utexas.edu/SABv20a_nam_jgf/hook.status.json
*   Trying 129.114.97.49:8080...
* TCP_NODELAY set
* Connected to adcircvis.tacc.utexas.edu (129.114.97.49) port 8080 (#0)
> GET /thredds/fileServer/asgs/2021/status/frontera.tacc.utexas.edu/SABv20a_nam_jgf/hook.status.json HTTP/1.1
> Host: adcircvis.tacc.utexas.edu:8080
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< Last-Modified: Fri, 13 Aug 2021 20:59:50 GMT
< Accept-Ranges: bytes
< Content-Type: application/json
< Content-Length: 4039
< Date: Fri, 13 Aug 2021 21:11:01 GMT
< 

whereas the chg-1.oden thredds server labels JSON files as application/octet-stream:

jason@kitt:/srv/work/asgs.dev/asgs$ curl -v http://chg-1.oden.tacc.utexas.edu/thredds/fileServer/asgs/2021/nam/2021081318/SABv20a/frontera.tacc.utexas.edu/SABv20a_nam_jgf/nowcast/scenario.status.json
*   Trying 129.114.97.179:80...
* TCP_NODELAY set
* Connected to chg-1.oden.tacc.utexas.edu (129.114.97.179) port 80 (#0)
> GET /thredds/fileServer/asgs/2021/nam/2021081318/SABv20a/frontera.tacc.utexas.edu/SABv20a_nam_jgf/nowcast/scenario.status.json HTTP/1.1
> Host: chg-1.oden.tacc.utexas.edu
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 
< Strict-Transport-Security: max-age=0
< X-Frame-Options: SAMEORIGIN
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< vary: Origin
< Content-Disposition: attachment; filename="scenario.status.json"
< Accept-Ranges: bytes
< Content-Type: application/octet-stream
< Content-Length: 5190
< Date: Fri, 13 Aug 2021 21:09:48 GMT
< Server: Apache
< 

Is something in the different HTTP headers causing this difference in web browser behavior? If so, is there anything that can be done about it?
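To make the header comparison concrete, here is a minimal Python sketch of a simplified version of the rule browsers follow: a `Content-Disposition: attachment` header asks for a download, and otherwise a type the browser can display (text/*, application/json) is shown inline while application/octet-stream is downloaded. The function name `browser_action` and the set of "displayable" types are assumptions for illustration; real browsers apply more nuanced rules (and `X-Content-Type-Options: nosniff` stops them from second-guessing the declared type). The header values are copied from the two curl transcripts above.

```python
from __future__ import annotations


def browser_action(headers: dict[str, str]) -> str:
    """Return "inline" or "download" using a simplified, assumed heuristic.

    This approximates, but does not reproduce, real browser behavior.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value.strip() for name, value in headers.items()}

    # An explicit "attachment" disposition asks the browser to save the file.
    if h.get("content-disposition", "").lower().startswith("attachment"):
        return "download"

    # Drop parameters such as "; charset=utf-8" before comparing media types.
    raw_type = h.get("content-type", "application/octet-stream")
    media_type = raw_type.split(";")[0].strip().lower()
    if media_type.startswith("text/") or media_type == "application/json":
        return "inline"
    # Generic binary (and anything else unrecognized) is offered as a download.
    return "download"


# Headers taken from the adcircvis and chg-1.oden transcripts.
adcircvis = {"Content-Type": "application/json", "Content-Length": "4039"}
oden = {
    "Content-Disposition": 'attachment; filename="scenario.status.json"',
    "Content-Type": "application/octet-stream",
    "X-Content-Type-Options": "nosniff",
}
print(browser_action(adcircvis))  # inline
print(browser_action(oden))       # download
```

Note that chg-1.oden sends two download signals (an attachment disposition and a generic binary type), so fixing only the Content-Type might not be enough on that server.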

wwlwpd commented 3 years ago

@jasonfleming I believe you've identified the issue. There should be a way to set the content-type header for each file type through the thredds HTTP server; I'll figure out how to set this properly.

jasonfleming commented 3 years ago

Thank you @wwlwpd that is great to hear!

wwlwpd commented 3 years ago

Looking into this further, I found the MIME type mapping covered in the docker container's config/web.xml, but it might not be getting used properly by the TDS servlet. I don't know much about tomcat, but I am leaning toward this being an issue with UNIDATA's TDS docker container:

https://stackoverflow.com/questions/42261607/tomcat-8-0-mime-mapping-content-type

wwlwpd commented 3 years ago

Trying the latest version of the docker image (4.5.16), https://hub.docker.com/r/unidata/thredds-docker, before filing an issue with UNIDATA via GitHub.

wwlwpd commented 3 years ago

Updating to the latest version via docker did not help. Created an issue upstream for UNIDATA, https://github.com/Unidata/thredds-docker/issues/251.

jasonfleming commented 3 years ago

Related to this, it would be nice to have *.log and *.properties files designated as text/plain to facilitate browser viewing across the board (all thredds servers), e.g.:

jason@kitt:/srv/work/asgs.dev/asgs$ curl -v -O https://fortytwo.cct.lsu.edu/thredds/fileServer/2020/sally/16/NGOMv19b/qbc.loni.org/NGOMv19b_al192020_jgf/nhcConsensus/run.properties
...
< HTTP/1.1 200 OK
< Server: nginx/1.18.0
< Date: Sat, 14 Aug 2021 17:45:01 GMT
< Content-Type: application/octet-stream
< Content-Length: 13897
< Last-Modified: Sat, 12 Dec 2020 06:34:15 GMT
< Connection: keep-alive
< ETag: "5fd46467-3649"
< Strict-Transport-Security: max-age=31536000
< Accept-Ranges: bytes

and

jason@kitt:/srv/work/asgs.dev/asgs$ curl -O -v https://fortytwo.cct.lsu.edu/thredds/fileServer/2020/sally/16/NGOMv19b/qbc.loni.org/NGOMv19b_al192020_jgf/nhcConsensus/scenario.log
...
< HTTP/1.1 200 OK
< Server: nginx/1.18.0
< Date: Sat, 14 Aug 2021 17:47:12 GMT
< Content-Type: application/octet-stream
< Content-Length: 137275
< Last-Modified: Sat, 12 Dec 2020 06:34:15 GMT
< Connection: keep-alive
< ETag: "5fd46467-2183b"
< Strict-Transport-Security: max-age=31536000
< Accept-Ranges: bytes

jasonfleming commented 3 years ago

@wwlwpd Could this mime type fix be accomplished via nginx and/or apache reconfiguration instead of tomcat?

wwlwpd commented 3 years ago

tomcat is the web server, so the fix lies in configuring it properly there. Putting something in front of the actual web server (even if it is another web server) changes how this is accomplished, because that server is then acting as a "reverse proxy" and would need extra logic to detect each request and then modify the response.
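To illustrate the extra logic that comment refers to, here is a minimal Python sketch of what a proxy sitting in front of thredds would have to do for every response: inspect the requested path and, if it ends in an extension of interest, replace the upstream Content-Type and drop a forced-download Content-Disposition. The function name, the override table and the example path are hypothetical; this is not how any particular proxy (nginx, Apache) is actually configured.

```python
from __future__ import annotations

import posixpath

# Hypothetical override table: file extension -> Content-Type to send instead.
OVERRIDES = {
    ".json": "application/json",
    ".log": "text/plain",
    ".properties": "text/plain",
}


def rewrite_response_headers(request_path: str, headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the upstream headers with Content-Type fixed by extension."""
    # Step 1, "detect the request": look at the extension of the requested file.
    ext = posixpath.splitext(request_path)[1].lower()
    new_type = OVERRIDES.get(ext)
    if new_type is None:
        return dict(headers)  # Not a file type we care about; pass it through.

    # Step 2, "modify the response": drop the upstream type and any forced download.
    out = {
        name: value
        for name, value in headers.items()
        if name.lower() not in {"content-type", "content-disposition"}
    }
    out["Content-Type"] = new_type
    return out


upstream = {
    "Content-Type": "application/octet-stream",
    "Content-Disposition": 'attachment; filename="scenario.status.json"',
    "Content-Length": "5190",
}
print(rewrite_response_headers("/thredds/fileServer/asgs/scenario.status.json", upstream))
# {'Content-Length': '5190', 'Content-Type': 'application/json'}
```

Even this toy version has to touch every response, which is the maintenance cost the comment is pointing at.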

wwlwpd commented 3 years ago

@jasonfleming I put several hours into troubleshooting this last night. It appears (I think) that the MIME type is hard-coded; it certainly doesn't use or care about the web.xml file, which is where tomcat maps MIME types to file extensions.
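For comparison, this is the mechanism the web.xml route relies on: each mime-mapping element pairs an extension with a mime-type (servlet deployment-descriptor format). The small Python sketch below extracts those pairs, which can help confirm that json really is mapped before concluding that the servlet ignores it. The embedded XML is a trimmed, hypothetical example, not the actual file from the oden container, and the function names are made up.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical web.xml; tomcat's conf/web.xml uses the same
# <mime-mapping> element with many more entries.
WEB_XML = """<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="4.0">
  <mime-mapping>
    <extension>json</extension>
    <mime-type>application/json</mime-type>
  </mime-mapping>
  <mime-mapping>
    <extension>txt</extension>
    <mime-type>text/plain</mime-type>
  </mime-mapping>
</web-app>
"""


def local_name(tag: str) -> str:
    """Drop the "{namespace}" prefix that ElementTree puts on element tags."""
    return tag.rsplit("}", 1)[-1]


def mime_mappings(xml_text: str) -> dict:
    """Return {extension: mime type} for every <mime-mapping> in a web.xml."""
    root = ET.fromstring(xml_text)
    table = {}
    for elem in root.iter():
        if local_name(elem.tag) != "mime-mapping":
            continue
        fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
        if "extension" in fields and "mime-type" in fields:
            table[fields["extension"].lower()] = fields["mime-type"]
    return table


print(mime_mappings(WEB_XML))  # {'json': 'application/json', 'txt': 'text/plain'}
```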

I compared the thredds configuration on oden with the one on adcirc viz, since both use the same configuration file. However, adcirc viz is running version 3.7 and oden is running version 4.6, and the former doesn't have a docker container associated with it. I found no differences that point to a solution. For the time being, I am going to mark this as "needs help", and I'll monitor the GH issue I created in the thredds repo. Eventually this might become a "won't fix", but I agree, it is annoying.

A summary of resources I dug up,

wwlwpd commented 3 years ago

Oh, @jasonfleming, feel free to take a crack at it. DM me and I'll give you info on how to deal with the docker container on oden.

jasonfleming commented 3 years ago

Hey @wwlwpd, I had another look at this about a week and a half ago, and I agree that the tomcat web.xml looks perfect but is ignored by the thredds application inside the container. The docs say the app can use what is in the tomcat web.xml and add further MIME types in its own web.xml, and I didn't find anything contradictory in the thredds web.xml. As you said, the issue has to be in the thredds application code itself, which unfortunately puts a fix out of our reach. Hopefully Unidata will heed the issue you logged.

Another approach I have been experimenting with is to copy the files to a path where they can be served by nginx with "autoindex on" in the nginx config. It seems like the minimal solution, and we would have more direct control over it. What do you think?

wwlwpd commented 3 years ago

Thank you for validating my observations. I know @akheir has had success using nginx as a proxy for the static web server; maybe we can get him to give us a brief description of what he did? My big concern is the "HTTPServer" link that thredds provides: we either need to handle it via a reverse proxy (like what I think @akheir did) or not enable it in thredds at all and rely on some implicit knowledge of where to look for non-data files.

jasonfleming commented 3 years ago

We already have support for posting to multiple remote filesystems in opendap_post.sh ... maybe for the status JSON files we could post them to a path that is served by the local thredds server, as well as another path that is served by a local nginx server. The data files would still just go to the path served by the thredds server.
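The routing rule in that proposal (status JSON goes to both trees, data files only to the thredds tree) can be sketched in a few lines of Python. This is a hypothetical illustration, not what opendap_post.sh does: the function names, the ".status.json" suffix test and the directory names are assumptions, and a real deployment would post to remote filesystems rather than copy locally.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def destinations_for(filename: str, thredds_dir: Path, nginx_dir: Path) -> list[Path]:
    """Status JSON goes to both trees; everything else only to the thredds tree."""
    if filename.endswith(".status.json"):
        return [thredds_dir, nginx_dir]
    return [thredds_dir]


def post_file(src: Path, thredds_dir: Path, nginx_dir: Path) -> list[Path]:
    """Copy src into each destination directory and return the new paths."""
    copies = []
    for dest_dir in destinations_for(src.name, thredds_dir, nginx_dir):
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / src.name
        shutil.copy2(src, target)  # copy2 also preserves the modification time
        copies.append(target)
    return copies


# Demo in a throwaway directory; "thredds" and "nginx" are placeholder names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    thredds, nginx = root / "thredds", root / "nginx"
    status = root / "hook.status.json"
    status.write_text('{"example": true}')
    data = root / "maxele.63.nc"
    data.write_bytes(b"")
    print([p.relative_to(root).as_posix() for p in post_file(status, thredds, nginx)])
    # ['thredds/hook.status.json', 'nginx/hook.status.json']
    print([p.relative_to(root).as_posix() for p in post_file(data, thredds, nginx)])
    # ['thredds/maxele.63.nc']
```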

jasonfleming commented 3 years ago

The other nice thing about the JSONView browser plugin is that it automatically acts as a "jsonlint" syntax checker.
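A command-line counterpart to that syntax check takes only a few lines of Python with the standard json module. The sketch below (the function name is made up) reports the line and column of the first error, which could be used to check a status file before it is posted.

```python
import json
from typing import Optional


def lint_json(text: str) -> Optional[str]:
    """Return None for valid JSON, otherwise "line:column: message" for the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # JSONDecodeError carries the 1-based line and column of the failure.
        return f"{err.lineno}:{err.colno}: {err.msg}"
    return None


print(lint_json('{"status": "ok"}'))   # None
print(lint_json('{"status": "ok",}'))  # a line:column error (wording varies by Python version)
```

For a file on disk, pass its contents, e.g. `lint_json(open("hook.status.json").read())`.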

carolakaiser commented 3 years ago

From what I read here, the root cause is in the web server configuration on this particular server. The line

application/json json

must be explicitly set or enabled in the mime types configuration file. For Apache this is the file "mime.types" that lives under "Apache/conf".

carolakaiser commented 3 years ago

Also, it might help to look for the application/octet-stream entry and see if "json" is listed there. If so, remove it :-)
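Both checks can be illustrated with a small Python sketch of the mime.types format ("media/type ext1 ext2 ..." per line, "#" for comments): parse the table, look up a file by its extension, and list which extensions are filed under application/octet-stream. The two table excerpts are hypothetical, and the "later line wins" behavior is a choice made in this sketch, not a claim about how Apache resolves duplicates.

```python
from __future__ import annotations


def parse_mime_types(text: str) -> dict[str, str]:
    """Parse mime.types-style text ("type ext1 ext2 ...") into {extension: type}."""
    table: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        media_type, *extensions = line.split()
        for ext in extensions:
            # In this sketch a later line silently overrides an earlier one.
            table[ext.lower()] = media_type
    return table


def content_type_for(filename: str, table: dict[str, str],
                     default: str = "application/octet-stream") -> str:
    """Look up a file's type by its last extension, falling back to a default."""
    _, dot, ext = filename.rpartition(".")
    return table.get(ext.lower(), default) if dot else default


def extensions_of(table: dict[str, str], media_type: str) -> list[str]:
    """List the extensions currently mapped to media_type."""
    return sorted(ext for ext, t in table.items() if t == media_type)


# Two hypothetical excerpts: one correct, one with json filed under octet-stream.
GOOD = "application/json json\napplication/octet-stream bin\n"
BAD = "application/octet-stream bin json\n"

for label, text in (("GOOD", GOOD), ("BAD", BAD)):
    table = parse_mime_types(text)
    print(label, content_type_for("scenario.status.json", table),
          extensions_of(table, "application/octet-stream"))
# GOOD application/json ['bin']
# BAD application/octet-stream ['bin', 'json']
```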

carolakaiser commented 3 years ago

MIME types are matched by file extension, so in theory it would be possible to map the properties and log extensions to text/plain (a line like "text/plain properties log"), as long as that does not conflict with something else ;-)
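The "as long as it does not conflict" caveat can be checked mechanically. This hypothetical Python helper adds extensions (here properties and log, the two types requested earlier in the thread) to an extension-to-type table and reports any extension that is already claimed by a different type instead of silently overwriting it. The function name and the starting table are assumptions for illustration.

```python
from __future__ import annotations


def add_mapping(table: dict[str, str], media_type: str, extensions: list[str]) -> list[str]:
    """Map each extension to media_type, skipping any already claimed by another type.

    Returns the skipped extensions so a person can decide what to do about them.
    """
    conflicts = []
    for ext in extensions:
        ext = ext.lower().lstrip(".")
        current = table.get(ext)
        if current is not None and current != media_type:
            conflicts.append(ext)  # leave the existing mapping untouched
            continue
        table[ext] = media_type
    return conflicts


# Hypothetical starting table.
table = {"json": "application/json", "txt": "text/plain"}
print(add_mapping(table, "text/plain", ["properties", "log"]))  # []
print(add_mapping(table, "text/plain", ["json"]))               # ['json']
print(table["log"], table["json"])                              # text/plain application/json
```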

wwlwpd commented 3 years ago

A developer replied to the GH issue I created, so I provided the information they requested and will continue to monitor it.

wwlwpd commented 3 years ago

I am going to go with the suggestion to set up nginx as a reverse proxy, since this is really a pain in the butt. Here's the config that @akheir has used for his setup. Our situation is a little different, but this is a good start:

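  location / {
    # Requests not matched by a more specific location are proxied to thredds (tomcat) on port 8080.
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_pass http://localhost:8080/;
    proxy_redirect default;
  }

  location /thredds/fileServer {
    # Serve these files straight from disk with nginx, bypassing thredds.
    alias /data/opendap;
    autoindex on;               # generate directory listings
    autoindex_exact_size off;   # show sizes rounded to KB/MB/GB
    autoindex_localtime on;     # show times in the server's local time zone
  }

  location = / {
    # Send the bare site root to the thredds catalog.
    return 301 https://fortytwo.cct.lsu.edu/thredds/catalog.html;
  }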
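The second location block is what bypasses thredds: nginx serves /thredds/fileServer straight from disk, so nginx's own type map (assuming the usual "include mime.types;" in the http block), not thredds, decides the Content-Type. A rough Python model of the alias rule shows which file a request resolves to; the function name is hypothetical, and it assumes the URI is already normalized, which nginx does before matching.

```python
from typing import Optional


def alias_path(uri: str, location: str, alias: str) -> Optional[str]:
    """Approximate how an nginx prefix location with `alias` maps a URI to a file.

    Assumes an already-normalized URI (nginx resolves "." and ".." first).
    Returns None when the URI falls outside the location prefix.
    """
    if not uri.startswith(location):
        return None
    # The matched prefix is replaced by the alias directory; the rest is kept.
    return alias + uri[len(location):]


uri = ("/thredds/fileServer/2020/sally/16/NGOMv19b/qbc.loni.org/"
       "NGOMv19b_al192020_jgf/nhcConsensus/run.properties")
print(alias_path(uri, "/thredds/fileServer", "/data/opendap"))
# /data/opendap/2020/sally/16/NGOMv19b/qbc.loni.org/NGOMv19b_al192020_jgf/nhcConsensus/run.properties
print(alias_path("/thredds/catalog.html", "/thredds/fileServer", "/data/opendap"))  # None
```

Two things follow from this. First, because neither the location nor the alias ends in a slash, a URI such as /thredds/fileServerX would also match the prefix; keeping trailing slashes consistent on both is a common precaution. Second, the fortytwo transcripts for run.properties and scenario.log (served by nginx/1.18.0) came back as application/octet-stream, which suggests those extensions are not yet in that server's nginx type map and fall through to its default type.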
jasonfleming commented 3 years ago

Hey @wwlwpd, my suggestion was to use nginx as a standalone solution rather than as a reverse proxy for thredds, but I am not an expert, so take that with a grain of salt. :-) The idea was that we could post status messages to a path on stormsurge.live in the cloud and serve that path with nginx configured with autoindex and correct MIME type mappings. As a bonus side effect, that would also aggregate the status data. But that approach might be even more of an admin burden.

wwlwpd commented 3 years ago

I upgraded to TDS 5.0 and the same problem exists. I think I found where the MIME types are hard-coded: https://github.com/Unidata/tds/blob/f0a6297f6fd8b9e4cb3ee2b29312ce64eda94458/tdcommon/src/main/java/thredds/util/ContentType.java

I created a new issue on the TDS itself, rather than on the docker image repo: https://github.com/Unidata/tds/issues/161

wwlwpd commented 3 years ago

Got confirmation of the content-type issue via Unidata/tds#161. I will track it; hopefully we won't have to use nginx.

jasonfleming commented 3 years ago

Yes, let's see what they can do with it before we implement a reverse proxy as a workaround.

wwlwpd commented 3 years ago

@jasonfleming - there's been upstream motion on this content-type issue!!

wwlwpd commented 3 years ago

Need to follow up once the upstream fix to tds gets merged.

jasonfleming commented 3 years ago

This is awesome and very welcome news! Looking forward to their new containerized THREDDS!!