Open adborden opened 3 years ago
This is complicated because there are essentially three reverse proxies: CloudFront, FCS Netscaler, and Apache.
With our current configuration, we're expecting these headers to be passed through all three: Host: catalog-next.data.gov and X-Forwarded-Host: catalog.data.gov, and for CKAN to respect X-Forwarded-Host (or just use ckan.site_url).
I'm seeing three areas for a potential fix:
ckan.site_url in all redirects or respect X-Forwarded-Host
Right now I'm leaning toward the FCS change.
Discussed this with @avdata99 and @hkdctol . Andres will spend ~1 hour investigating (1) above to see if there is a fix or change that should happen in CKAN. The redirect is handled by pylons so this behavior may change anyway in CKAN 2.9.
I will open a ticket with FCS for (3) and hopefully we can schedule something for Friday or Monday.
I opened RITM0810420 for the Netscaler change.
CKAN overrides a function in the mapper but lets the routes Mapper (old v1.13) handle the redirections: https://github.com/ckan/ckan/blob/2.8/ckan/config/routing.py#L49-L54
Pylons in CKAN uses that routes Mapper: https://github.com/ckan/ckan/blob/2.8/ckan/config/middleware/pylons_app.py#L63
This Mapper uses a URLGenerator that allows defining an environ: https://github.com/bbangert/routes/blob/v1.13/routes/util.py#L273
There are some tests that can be useful to see how this environ works: https://github.com/bbangert/routes/blob/v1.13/tests/test_functional/test_explicit_use.py#L27-L28
The flask_app uses ckan.site_url: https://github.com/ckan/ckan/blob/2.8/ckan/config/middleware/__init__.py#L195-L198
It is still not clear to me whether we can override the headers at some point in CKAN.
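On the question of overriding headers: one generic option, independent of CKAN internals, would be a small WSGI middleware that copies X-Forwarded-Host into HTTP_HOST before the application sees the request. The following is only a sketch under that assumption. The names `ForwardedHostMiddleware` and `demo_app` are hypothetical and are not part of CKAN, pylons, or routes, and the sketch assumes the proxies in front of the app are trusted to set the header.

```python
class ForwardedHostMiddleware(object):
    """Hypothetical WSGI middleware (not part of CKAN or routes)."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        forwarded = environ.get('HTTP_X_FORWARDED_HOST')
        if forwarded:
            # Several proxies may append values; keep the first one,
            # which is the host the client originally asked for.
            environ['HTTP_HOST'] = forwarded.split(',')[0].strip()
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    # Stand-in application: redirects to /dataset on the host it was given.
    location = '%s://%s/dataset' % (environ['wsgi.url_scheme'],
                                    environ['HTTP_HOST'])
    start_response('302 Found', [('Location', location)])
    return [b'']
```

Wrapping the stand-in app as `ForwardedHostMiddleware(demo_app)` makes the redirect use the forwarded host. Whether CKAN 2.8 offers a suitable hook for this kind of wrapper is not established in this thread.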
Thanks @avdata99, that tells me that CKAN should respect the X-Forwarded-Host header, but something is still not working. Maybe gunicorn is not passing this header through.
In local development, with debugging enabled, can you dump out the environment? The test would be:
$ curl -v -H 'X-Forwarded-Host: catalog.data.gov' http://localhost:5000
You should see Location: http://catalog.data.gov/dataset in the response, and you should see HTTP_X_FORWARDED_HOST=catalog.data.gov in the environment.
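The expectation behind this test can be written down as a small function: prefer X-Forwarded-Host, then fall back to Host, then to the server name. This is an illustrative sketch of the expected behavior only, not the actual pylons/routes code; `expected_location` is a hypothetical helper.

```python
def expected_location(environ, path='/dataset'):
    """Build the redirect target we would expect for a WSGI environ."""
    # Prefer the host the client originally requested, if a proxy passed it on.
    host = (environ.get('HTTP_X_FORWARDED_HOST')
            or environ.get('HTTP_HOST')
            or environ['SERVER_NAME'])
    scheme = environ.get('wsgi.url_scheme', 'http')
    return '%s://%s%s' % (scheme, host, path)
```

With the curl command above, the environ would carry both `HTTP_HOST` and `HTTP_X_FORWARDED_HOST`, and the forwarded value would be the one used in the Location header.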
@adborden this is what I see locally:
curl -v -H 'X-Forwarded-Host: catalog.data.gov' http://localhost:5000
* Rebuilt URL to: http://localhost:5000/
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 5000 (#0)
> GET / HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.58.0
> Accept: */*
> X-Forwarded-Host: catalog.data.gov
>
* HTTP 1.0, assume close after body
< HTTP/1.0 302 Found
< Server: PasteWSGIServer/0.5 Python/2.7.18
< Date: Fri, 05 Feb 2021 13:37:27 GMT
< Content-Type: text/plain; charset=utf8
< Location: http://localhost:5000/dataset
< Connection: close
<
* Closing connection 0
Okay, and in the CKAN logs, can you dump out the environ and see if HTTP_X_FORWARDED_HOST is included?
Your server is Server: PasteWSGIServer/0.5 Python/2.7.18, so we've ruled out gunicorn getting in the way.
So, from the pylons/routes code, it looks like this should be supported but something is not working. We should be seeing Location: http://catalog.data.gov/dataset. I would ask the CKAN folks or open an issue, but I don't think we need to pursue this further.
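One way to dump the environ without touching application code is a pass-through WSGI middleware that logs it on every request. This is a minimal sketch; `EnvironDumpMiddleware` and the logger name `environ_dump` are hypothetical, and where such a wrapper would be attached in CKAN's middleware stack is left open.

```python
import logging
import pprint

log = logging.getLogger('environ_dump')


class EnvironDumpMiddleware(object):
    """Hypothetical debugging middleware: logs the WSGI environ, then passes through."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # pformat keeps the dump readable; keys such as HTTP_X_FORWARDED_HOST
        # show up here if the server passed the header through.
        log.info('Environ\n%s', pprint.pformat(environ))
        return self.app(environ, start_response)
```

This is meant for local debugging only, since the environ can contain cookies and other sensitive request data.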
@adborden if I dump the environ here I see this:
ckan_1 | 2021-02-05 15:23:54,217 INFO [ckan.config.middleware] Serving request via pylons_app app
ckan_1 | 2021-02-05 15:23:54,218 INFO [ckan.config.middleware] Environ
{
'SCRIPT_NAME': '',
'REQUEST_METHOD': 'GET',
'ckan.app': 'pylons_app',
'PATH_INFO': '/',
'SERVER_PROTOCOL': 'HTTP/1.1',
'QUERY_STRING': '',
'CONTENT_LENGTH': '0',
'HTTP_USER_AGENT': 'curl/7.58.0',
'SERVER_NAME': '0.0.0.0',
'REMOTE_ADDR': '172.28.0.1',
'wsgi.url_scheme': 'http',
'SERVER_PORT': '5000',
'CKAN_CURRENT_URL': '/',
'CKAN_LANG': 'en',
'wsgi.input': <socket._fileobject object at 0x7f65e12978d0 length=0>,
'HTTP_HOST': 'localhost:5000',
'wsgi.multithread': True,
'HTTP_ACCEPT': '*/*',
'CKAN_LANG_IS_DEFAULT': True,
'wsgi.version': (1, 0),
'wsgi.run_once': False,
'wsgi.errors': <open file '<stderr>', mode 'w' at 0x7f65e68b7270>,
'wsgi.multiprocess': False,
'HTTP_X_FORWARDED_HOST': 'catalog.data.gov',
'CONTENT_TYPE': '',
'paste.httpserver.thread_pool': <paste.httpserver.ThreadPool object at 0x7f65e5b9d290>
}
👍 looks like a bug to me.
After switching the origin servers to catalog-next, everything seemed to be working, with one exception:
Browsing to https://catalog.data.gov results in a redirect to https://catalog-next.data.gov/dataset
How to reproduce
Expected behavior
302 redirect to https://catalog.data.gov/dataset
Actual behavior
302 redirect to https://catalog-next.data.gov/dataset
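The expected and actual behavior could be checked with a short script that requests the site without following redirects and compares the host in the Location header. This is a sketch for Python 3 using only the standard library; `fetch_redirect` and `redirect_host_ok` are hypothetical helper names, and the default host is taken from the expected behavior stated above.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(host, path='/'):
    """Return (status, Location) for a single HTTPS GET; redirects are not followed."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request('GET', path)
        resp = conn.getresponse()
        return resp.status, resp.getheader('Location', '')
    finally:
        conn.close()


def redirect_host_ok(status, location, expected_host='catalog.data.gov'):
    """True if the response is a 302 whose Location points at expected_host."""
    return status == 302 and urlsplit(location).hostname == expected_host
```

A check would then look like `redirect_host_ok(*fetch_redirect('catalog.data.gov'))`, which should be false while the redirect still points at catalog-next.data.gov.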