bird-house / birdhouse-deploy

Scripts and configurations to deploy the various birds and servers required for a full-fledged production platform
https://birdhouse-deploy.readthedocs.io/en/latest/
Apache License 2.0
4 stars 6 forks source link

DISCUSSION: proxy: add CORS headers to /twitcher/, affect all services behind it #450

Open tlvu opened 4 months ago

tlvu commented 4 months ago

Overview

This PR is rather to start a discuss than ready to merge. That's why there is no CHANGES.md update.

So I needed to add CORS allow headers for Thredds, so our partner javascript webapps running on other domains than pavics.ouranos.ca can hit our Thredds, so we act as the backend for their frontend.

  1. Is adding the CORS header to Twitcher okay with you guys? Because the new headers will affect all other services behind Twitcher.

  2. By adding this CORS header, I lost the X-Robots-Tag: noindex, nofollow header (optional-components/x-robots-tag-header) ! Is that expected? Or the way I add headers to Twitcher is wrong? I was just doing the same thing as all the existing services. The X-Robots-Tag header is important to avoid being hit by crawlers.

birdhouse/config/magpie/config/proxy/conf.extra-service.d/magpie.conf.template
5:        include /etc/nginx/conf.d/cors.include;

birdhouse/deprecated-components/ncwms2/config/proxy/conf.extra-service.d/ncwms2.conf.template
5:    #    include /etc/nginx/conf.d/cors.include;

birdhouse/components/cowbird/config/proxy/conf.extra-service.d/cowbird.conf.template
8:        include /etc/nginx/conf.d/cors.include;

birdhouse/components/weaver/config/proxy/conf.extra-service.d/weaver.conf.template
16:        include /etc/nginx/conf.d/cors.include;

birdhouse/components/stac/config/proxy/conf.extra-service.d/stac.conf.template
12:        include /etc/nginx/conf.d/cors.include;
  1. Is our current CORS headers way too permissive?

This is what we return https://github.com/bird-house/birdhouse-deploy/blob/97ee8da24821391aeef52b13ea9adda28f919085/birdhouse/components/proxy/conf.d/cors.include

Access-Control-Allow-Origin: *                                                                                                                                                           
Access-Control-Allow-Methods: GET, POST, OPTIONS                                                                                                                                         
Access-Control-Allow-Headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range
Access-Control-Expose-Headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range

This is already enough

Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept, Authorization
Access-Control-Allow-Methods: POST, GET, OPTIONS

I think perhaps we should not allow-origin * but a list of known partners domain? And trim down the allow-headers list?

I am not security expert so I want to hear from you guys.

birdhouse_daccs_configs_branch: master birdhouse_skip_ci: false

huard commented 4 months ago

Proposal to white list https://raven.uwaterloo.ca/

mishaschwartz commented 4 months ago

Is adding the CORS header to Twitcher okay with you guys? Because the new headers will affect all other services behind Twitcher.

Unfortunately I don't think it will even do that.

If we want to apply the cors headers everywhere, why not apply it at the proxy level instead of on twitcher. Not all requests will go through the twitcher proxy necessarily.

Some components (geoserver, jupyterhub, all the monitoring components, secure-data-auth) just use twitcher's verify endpoint to check whether a user has access, in those cases, the cors headers wouldn't be included.

Jupyterhub and the monitoring components probably won't matter for cross-site scripts but the others will matter I'm sure.

By adding this CORS header, I lost the X-Robots-Tag: noindex, nofollow header

Is this when making a cross-origin request or everywhere?

You might have to add the X-Robots-Tag to the headers listed in Access-Control-Allow-Headers and Access-Control-Expose-Headers otherwise they won't be sent (but we'll have to test it to be sure)

Is our current CORS headers way too permissive?

I think perhaps we should not allow-origin * but a list of known partners domain?

This would have to be configurable as not every deployment would want the same domains. We should also think about whether we would need to set Access-Control-Allow-Credentials as well.

And trim down the allow-headers list?

This one will require some thought. Do you know if there are recommended best-practices for setting these?

tlvu commented 4 months ago

Is adding the CORS header to Twitcher okay with you guys? Because the new headers will affect all other services behind Twitcher.

Unfortunately I don't think it will even do that.

Sorry I meant adding CORS headers to the proxy location /twitcher/ for Twitcher. So the change is actually in Nginx and not in Twitcher but for Twitcher and all services behind it. See my initial code change.

By adding this CORS header, I lost the X-Robots-Tag: noindex, nofollow header

Is this when making a cross-origin request or everywhere?

Everywhere. But I think I understood how this works. X-Robots-Tag: noindex, nofollow was added to the root location. But all child location directive do not inherit but override all added headers so to keep any headers added by the parent, the child location will have to repeat those headers.

So probably each time that cors.include file was included, we lost X-Robots-Tag header ! So we've been losing that header for magpie, weaver, cowbird, stac and we didn't even know !

I think perhaps we should not allow-origin * but a list of known partners domain?

This would have to be configurable as not every deployment would want the same domains. We should also think about whether we would need to set Access-Control-Allow-Credentials as well.

Agreed it has to be configurable.

I was reading how to make it configurable, it's not so simple. I didn't know "if is evil" in Nginx config and our current cors.include file uses if !

Some interesting read I found:

https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/

http://agentzh.blogspot.com/2011/03/how-nginx-location-if-works.html

https://www.juannicolas.eu/how-to-set-up-nginx-cors-multiple-origins/

And trim down the allow-headers list?

This one will require some thought. Do you know if there are recommended best-practices for setting these?

Have not had time but yes we should follow some newer best practices. That cors.include file was there even before I started at Ouranos.

fmigneault commented 4 months ago

Unfortunately I don't think it will even do that.

If we want to apply the cors headers everywhere, why not apply it at the proxy level instead of on twitcher. Not all requests will go through the twitcher proxy necessarily.

That is my current understanding of what is already applied by the following, which includes cors.include: https://github.com/bird-house/birdhouse-deploy/blob/97ee8da24821391aeef52b13ea9adda28f919085/birdhouse/components/proxy/nginx.conf.template#L36

Setting it under /twitcher/ "should" only redefine it the same way (it WON'T though, see next).

1. Is adding the CORS header to Twitcher okay with you guys? Because the new headers will affect all other services behind Twitcher.

2. By adding this CORS header, I lost the X-Robots-Tag: noindex, nofollow header (optional-components/x-robots-tag-header) ! Is that expected? Or the way I add headers to Twitcher is wrong? I was just doing the same thing as all the existing services. The X-Robots-Tag header is important to avoid being hit by crawlers.

if inside location definitions are big no-no: https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/ It's not a thing about "overriding headers or not", it is just undefined behavior. if are perfectly valid at the server level (as they are currently applied).

Is our current CORS headers way too permissive?

Maybe? But good consideration must be given about which set of Origins are defined if set explicitly. If only the partner's Origin is set, any following partner or browser JavaScript using the server as a backend will suddenly be refused. Since the purpose of the platform is to provide access to services, I'm not sure adding strict origin control will change much other than make our maintenance harder. If configurable, I don't mind, as long as the default remains as the current value.

For the Access-Control-Allow-Headers, I think currents ones are good. Some of them are for caching or prechecks, which can help reduce request/response time/content-size if supported by a service. They shouldn't hurt. The only one I don't know is X-CustomHeader.

mishaschwartz commented 4 months ago

Sorry I meant adding CORS headers to the proxy location /twitcher/ for Twitcher. So the change is actually in Nginx and not in Twitcher but for Twitcher and all services behind it. See my initial code change.

I understand... what I was saying is that there are components that are not behind twitcher that will not be affected by this change.