GSA / datagov-ckan-multi


https://catalog.data.gov is redirecting to https://catalog-next.data.gov/dataset #561

Open adborden opened 3 years ago

adborden commented 3 years ago

After switching the origin servers to catalog-next, everything seemed to be working, with one exception:

Browsing to https://catalog.data.gov results in a redirect to https://catalog-next.data.gov/dataset

How to reproduce

  1. Description of steps to reproduce the issue.

Expected behavior

302 redirect to https://catalog.data.gov/dataset

Actual behavior

302 redirect to https://catalog-next.data.gov/dataset

adborden commented 3 years ago

This is complicated because there are essentially three reverse proxies: CloudFront, FCS Netscaler, and Apache.

With our current configuration, we expect two headers to be passed through all three proxies, Host: catalog-next.data.gov and X-Forwarded-Host: catalog.data.gov, and we expect CKAN to respect X-Forwarded-Host (or just use ckan.site_url).

I'm seeing three areas for a potential fix:

  1. Fix CKAN to use ckan.site_url in all redirects or respect X-Forwarded-Host
  2. Tweak the Apache config to force a specific Host header to pass to CKAN.
  3. Update FCS to change the catalog.data.gov route to point to the catalog-next hosts.

Right now I'm leaning toward the FCS change.
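
To make option (1) concrete, here is a minimal sketch of a WSGI middleware that copies X-Forwarded-Host into the Host seen by the application. It is not CKAN code: `forwarded_host_middleware`, `redirect_to_dataset`, and `location_for` are hypothetical names, and the stand-in app only imitates a redirect from / to /dataset.

```python
def forwarded_host_middleware(app):
    """Wrap a WSGI app so it sees the public host (hypothetical helper)."""
    def wrapped(environ, start_response):
        forwarded = environ.get("HTTP_X_FORWARDED_HOST")
        if forwarded:
            # With chained proxies the header may hold a comma-separated list;
            # the first entry is the host the client originally requested.
            environ["HTTP_HOST"] = forwarded.split(",")[0].strip()
        return app(environ, start_response)
    return wrapped


def redirect_to_dataset(environ, start_response):
    """Stand-in app: redirect to /dataset on whatever host it believes it has."""
    location = "%s://%s/dataset" % (environ["wsgi.url_scheme"],
                                    environ["HTTP_HOST"])
    start_response("302 Found", [("Location", location)])
    return [b""]


def location_for(app, environ):
    """Call a WSGI app once and return the Location header it sent."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    app(dict(environ), start_response)  # copy so the caller's dict is untouched
    return captured.get("Location")


env = {
    "wsgi.url_scheme": "https",
    "HTTP_HOST": "catalog-next.data.gov",
    "HTTP_X_FORWARDED_HOST": "catalog.data.gov",
}
print(location_for(redirect_to_dataset, env))
print(location_for(forwarded_host_middleware(redirect_to_dataset), env))
```

A wrapper like this is only safe if the proxies in front overwrite any client-supplied X-Forwarded-Host; otherwise a client could steer redirects to an arbitrary host.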

adborden commented 3 years ago

Discussed this with @avdata99 and @hkdctol . Andres will spend ~1 hour investigating (1) above to see if there is a fix or change that should happen in CKAN. The redirect is handled by Pylons, so this behavior may change anyway in CKAN 2.9.

I will open a ticket with FCS for (3) and hopefully we can schedule something for Friday or Monday.

adborden commented 3 years ago

I opened RITM0810420 for the Netscaler change.

avdata99 commented 3 years ago

CKAN overrides a function in the mapper but lets the Routes Mapper (old v1.13) handle the redirects https://github.com/ckan/ckan/blob/2.8/ckan/config/routing.py#L49-L54

Pylons in CKAN uses that Routes mapper https://github.com/ckan/ckan/blob/2.8/ckan/config/middleware/pylons_app.py#L63

This Mapper uses a URLGenerator that allows defining an environment https://github.com/bbangert/routes/blob/v1.13/routes/util.py#L273

There are some tests that can be useful to see how this environ works https://github.com/bbangert/routes/blob/v1.13/tests/test_functional/test_explicit_use.py#L27-L28

The flask_app uses ckan.site_url https://github.com/ckan/ckan/blob/2.8/ckan/config/middleware/__init__.py#L195-L198

It is still not clear to me whether we can override the headers at some point in CKAN.

adborden commented 3 years ago

Thanks @avdata99, that tells me that CKAN should respect the X-Forwarded-Host header, but something is still not working. Maybe gunicorn is not passing this header through.

In local development, with debugging enabled, can you dump out the environment? The test would be:

$ curl -v -H 'X-Forwarded-Host: catalog.data.gov' http://localhost:5000

You should see Location: http://catalog.data.gov/dataset in the response.

adborden commented 3 years ago

... and you should see the HTTP_X_FORWARDED_HOST=catalog.data.gov in the environment.
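
For reference, the environ key follows the CGI naming convention that WSGI inherits: the header name is upper-cased, hyphens become underscores, and an HTTP_ prefix is added (Content-Type and Content-Length are the exceptions and get no prefix). A minimal sketch, with `header_to_environ_key` as a hypothetical helper name:

```python
def header_to_environ_key(header_name):
    """Map an HTTP request header name to its WSGI environ key."""
    key = header_name.upper().replace("-", "_")
    # Content-Type and Content-Length are exposed without the HTTP_ prefix.
    if key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return key
    return "HTTP_" + key


print(header_to_environ_key("X-Forwarded-Host"))
```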

avdata99 commented 3 years ago

@adborden this is what I see locally:

curl -v -H 'X-Forwarded-Host: catalog.data.gov' http://localhost:5000
* Rebuilt URL to: http://localhost:5000/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 5000 (#0)
> GET / HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.58.0
> Accept: */*
> X-Forwarded-Host: catalog.data.gov
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 302 Found
< Server: PasteWSGIServer/0.5 Python/2.7.18
< Date: Fri, 05 Feb 2021 13:37:27 GMT
< Content-Type: text/plain; charset=utf8
< Location: http://localhost:5000/dataset
< Connection: close
< 
* Closing connection 0

adborden commented 3 years ago

Okay, and in the CKAN logs, can you dump out the environ and see if HTTP_X_FORWARDED_HOST is included?

Your server is Server: PasteWSGIServer/0.5 Python/2.7.18, so we've ruled out gunicorn getting in the way.

So, from the pylons/routes code, it looks like this should be supported but something is not working. We should be seeing Location: http://catalog.data.gov/dataset. I would ask the CKAN folks or open an issue, but I don't think we need to pursue this further.
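
The expectation can be written down as a small function. This is a sketch of the host precedence we expected (X-Forwarded-Host, then Host, then SERVER_NAME), not the actual Pylons or Routes implementation; `expected_host` and `expected_redirect` are hypothetical names.

```python
def expected_host(environ):
    """Pick the host a proxy-aware redirect should use (assumed precedence)."""
    forwarded = environ.get("HTTP_X_FORWARDED_HOST")
    if forwarded:
        # Several proxies may each append a value; keep the first one.
        return forwarded.split(",")[0].strip()
    return environ.get("HTTP_HOST") or environ["SERVER_NAME"]


def expected_redirect(environ, path="/dataset"):
    """Build the Location we expected for the redirect from / to /dataset."""
    scheme = environ.get("wsgi.url_scheme", "http")
    return "%s://%s%s" % (scheme, expected_host(environ), path)


# Values taken from the local curl request above.
local_request = {
    "wsgi.url_scheme": "http",
    "SERVER_NAME": "0.0.0.0",
    "HTTP_HOST": "localhost:5000",
    "HTTP_X_FORWARDED_HOST": "catalog.data.gov",
}
print(expected_redirect(local_request))
```

Under that assumption the local request would produce http://catalog.data.gov/dataset rather than the observed http://localhost:5000/dataset.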

avdata99 commented 3 years ago

@adborden if I dump the environ here I see this:

ckan_1   | 2021-02-05 15:23:54,217 INFO  [ckan.config.middleware] Serving request via pylons_app app
ckan_1   | 2021-02-05 15:23:54,218 INFO  [ckan.config.middleware] Environ 
{
'SCRIPT_NAME': '', 
'REQUEST_METHOD': 'GET', 
'ckan.app': 'pylons_app', 
'PATH_INFO': '/', 
'SERVER_PROTOCOL': 'HTTP/1.1', 
'QUERY_STRING': '', 
'CONTENT_LENGTH': '0', 
'HTTP_USER_AGENT': 'curl/7.58.0', 
'SERVER_NAME': '0.0.0.0', 
'REMOTE_ADDR': '172.28.0.1', 
'wsgi.url_scheme': 'http', 
'SERVER_PORT': '5000', 
'CKAN_CURRENT_URL': '/', 
'CKAN_LANG': 'en', 
'wsgi.input': <socket._fileobject object at 0x7f65e12978d0 length=0>, 
'HTTP_HOST': 'localhost:5000', 
'wsgi.multithread': True, 
'HTTP_ACCEPT': '*/*', 
'CKAN_LANG_IS_DEFAULT': True, 
'wsgi.version': (1, 0), 
'wsgi.run_once': False, 
'wsgi.errors': <open file '<stderr>', mode 'w' at 0x7f65e68b7270>, 
'wsgi.multiprocess': False, 
'HTTP_X_FORWARDED_HOST': 'catalog.data.gov', 
'CONTENT_TYPE': '', 
'paste.httpserver.thread_pool': <paste.httpserver.ThreadPool object at 0x7f65e5b9d290>
}

adborden commented 3 years ago

👍 looks like a bug to me.