Islandora / documentation

Contains islandora's documentation and main issue queue.
MIT License

Karaf seems to send derivative requests to the wrong place. #1070

Open alistairjmcintyre opened 5 years ago

alistairjmcintyre commented 5 years ago

We currently have Islandora 8 running behind an nginx reverse proxy server that takes care of TLS termination.

We enforce HTTPS everywhere, so plain-HTTP requests get a 301 or 302 redirect to the HTTPS URL, but I'm getting the error below when generating a derivative:

2019-03-28 11:04:52,340 | DEBUG | nnector-houdini] | DefaultErrorHandler | 86 - org.apache.camel.camel-core - 2.19.2 | Failed delivery for (MessageId: ID-staging-idora1-41349-1553724259858-1-11 on ExchangeId: ID--staging-idora1-41349-1553724259858-1-1). On delivery attempt: 2 caught: org.apache.camel.http.common.HttpOperationFailedException: HTTP operation failed invoking http://staging.example.nz/node/5/media/image/18 with statusCode: 302, redirectLocation: https://staging.example.nz/node/5/media/image/18

I'm wondering a couple of things regarding this:

  1. It seems odd that Karaf would blindly assume I want to go back through the front-end proxy. Is there something I'm missing here?
  2. Secondly, is there a way to make Karaf/Camel handle 302 redirects?

I'm not particularly adept with Karaf and/or Camel, so it's plausible there's something I'm missing here.

I do have a workaround for this at the moment, but it involves some truly, truly evil nginx config and an /etc/hosts entry, which makes me think I'm doing it the wrong way.

whikloj commented 5 years ago

That is odd, because the response URL (i.e. http://staging.example.nz/node/5/media/image/18) is provided by your Drupal instance.

So does your Drupal instance respond to non-SSL requests?

alistairjmcintyre commented 5 years ago

Thanks for the quick (and extremely useful) response!

We made some changes yesterday regarding Drupal 8 and reverse proxying, namely setting $settings['reverse_proxy'] = TRUE; and $settings['reverse_proxy_addresses'], which seems to have stopped Karaf from running into the 302 redirect error, so that's a bonus.
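Why those two settings matter can be shown with a simplified model of how a reverse-proxy-aware app picks the scheme and host for absolute URLs. This is a sketch, not Drupal's implementation: it assumes the proxy sends the de facto X-Forwarded-Proto and X-Forwarded-Host headers, and the proxy address 10.0.0.5 is hypothetical.

```python
def base_url(remote_addr, headers, server_scheme, server_host,
             reverse_proxy=False, reverse_proxy_addresses=()):
    """Return the scheme://host an app would use when building absolute URLs.

    Simplified model, not Drupal's code: forwarded headers are honoured only
    when reverse-proxy mode is on AND the request comes from a listed proxy.
    """
    scheme, host = server_scheme, server_host
    if reverse_proxy and remote_addr in reverse_proxy_addresses:
        # Trust the proxy's view of the original request (TLS ended there).
        scheme = headers.get("X-Forwarded-Proto", scheme)
        host = headers.get("X-Forwarded-Host", host)
    return f"{scheme}://{host}"


# nginx at a hypothetical 10.0.0.5 forwards a request that arrived over HTTPS.
fwd = {"X-Forwarded-Proto": "https", "X-Forwarded-Host": "staging.example.nz"}
print(base_url("10.0.0.5", fwd, "http", "staging.example.nz"))
# -> http://staging.example.nz
print(base_url("10.0.0.5", fwd, "http", "staging.example.nz",
               reverse_proxy=True, reverse_proxy_addresses=("10.0.0.5",)))
# -> https://staging.example.nz
```

Without the settings, the app only sees the plain-HTTP hop from the proxy, so any URL it hands out (such as a postback URL) starts with http:// and then bounces off the HTTPS redirect.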

Is there any way to override this setting? It seems redundant to go Karaf -> Proxy -> Drupal when it could go Karaf -> Drupal.

whikloj commented 5 years ago

Except that I think we are doing that (Karaf -> Drupal), but Drupal is telling us to go to the Proxy.

This indicates that Drupal sets the postback address to the non-SSL Drupal address when we generate the event for Alpaca, but when we then try to post to that URL, Drupal says "Oops, you want the Proxy" (302).

This could be a Drupal 8 problem, or perhaps we need to look at how Url::fromRoute() behaves behind reverse proxies. This needs more investigation.

whikloj commented 5 years ago

When you work against Drupal 8 are you accessing the SSL or non-SSL site?

alistairjmcintyre commented 5 years ago

I am accessing Drupal via an SSL site.

So, to try and clarify the exact steps (I think) are happening here:

  1. I access Drupal via the Reverse Proxy. As far as my browser is concerned I am accessing it via HTTPS. However, SSL/TLS is terminated by nginx on the Reverse Proxy, meaning all traffic that reaches Drupal is HTTP.
  2. I add a Repository Item with type Image to Drupal
  3. I add an Image with type Original File to that Repository Item
  4. Drupal sees an Original File under a Repository Item of type Image and goes to generate a derivative
  5. The derivative is generated by houdini at http://localhost:8000/houdini on the webserver.
  6. Karaf(?) gets the response from houdini containing the derivative file and makes a PUT request to the Drupal endpoint it was given.
  7. The request goes to https://example.nz/the/put/route/here (as an example), which goes through the Reverse Proxy and back to Drupal.

I am more than likely missing some parts here, but this is my understanding of it. It's the 'goes through the Reverse Proxy' part of Step 7 that feels redundant to me.

kayakr commented 5 years ago

@alistairjmcintyre The islandora.media_source_put_to_node route is defined in web/modules/contrib/islandora/islandora.routing.yml with path: '/node/{node}/media/{media_type}/{taxonomy_term}'. That should be a Drupal URL, unless there's some rewriting going on?
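The route fixes only the path; the scheme and host in front of it come from whatever base URL Drupal believes it has for the current request. A minimal sketch of that split, using a hypothetical put_url helper (it mimics the idea, not Url::fromRoute() itself):

```python
# Path pattern of islandora.media_source_put_to_node (from islandora.routing.yml).
ROUTE_PATH = "/node/{node}/media/{media_type}/{taxonomy_term}"


def put_url(base, node, media_type, taxonomy_term):
    """Hypothetical helper: prefix the filled-in route path with a base URL."""
    path = ROUTE_PATH.format(node=node, media_type=media_type,
                             taxonomy_term=taxonomy_term)
    return base.rstrip("/") + path


# Same route, two different ideas of the base URL:
print(put_url("http://staging.example.nz", 5, "image", 18))
# -> http://staging.example.nz/node/5/media/image/18
print(put_url("https://staging.example.nz", 5, "image", 18))
# -> https://staging.example.nz/node/5/media/image/18
```

So the URL is "a Drupal URL" either way; which host it points at depends on the request context, which is exactly what the reverse-proxy settings change.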

whikloj commented 5 years ago

@alistairjmcintyre What I think is happening is:

  1. You access Drupal via the Reverse Proxy at https://localhost (for example). However, SSL/TLS is terminated by nginx on the Reverse Proxy, meaning all traffic that reaches Drupal is HTTP.
  2. You add a Repository Item with type Image to Drupal (say node/3)
  3. You add an Image with type Original File to that Repository Item. Drupal sees an Original File under a Repository Item of type Image and goes to generate a derivative.
  4. The Event is generated with a post back URL of http://localhost/node/3/media/image/18 and sent to Alpaca (on Karaf).
  5. The derivative is generated by houdini at http://localhost/houdini and sent back to Alpaca.
  6. Alpaca (Karaf) makes a PUT request to http://localhost/node/3/media/image/18
  7. Drupal says "Sorry, all requests should go through our reverse proxy available at httpS://localhost/node/3/media/image/18"
  8. Alpaca dies.

So what we probably need to do is handle the 302 better and have Alpaca just try again at the redirected URL.
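Retrying at the redirected URL could look roughly like the sketch below. It is not Alpaca or Camel code: the send(method, url, body) transport is a hypothetical stand-in for whatever HTTP client is used, so the redirect logic can be exercised without a network.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 307, 308}


def put_following_redirects(url, body, send, max_redirects=3):
    """PUT `body` to `url`, re-issuing the PUT at each redirect target.

    `send(method, url, body)` is a hypothetical transport returning
    (status_code, headers). Returns (final_status, final_url).
    """
    for _ in range(max_redirects + 1):
        status, headers = send("PUT", url, body)
        location = headers.get("Location")
        if status not in REDIRECT_CODES or not location:
            return status, url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError(f"too many redirects (limit {max_redirects}), last target {url}")


# Fake transport mimicking the log above: plain HTTP gets a 302 to HTTPS.
def fake_send(method, url, body):
    if url.startswith("http://"):
        return 302, {"Location": "https://" + url[len("http://"):]}
    return 204, {}


print(put_following_redirects(
    "http://staging.example.nz/node/5/media/image/18", b"...", fake_send))
# -> (204, 'https://staging.example.nz/node/5/media/image/18')
```

Re-sending a PUT on a 301/302 is a deliberate choice here: HTTP clients have historically been allowed to turn those into GETs, while 307/308 explicitly preserve the method and body. A real version should probably also only follow redirects to expected hosts, since the request may carry credentials.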

alistairjmcintyre commented 5 years ago

I'm with you up until about Step 7, but derivatives are definitely being generated and the Karaf logs (/opt/karaf/data/logs/camel.log) are not showing any errors.

Drupal knows it's behind a Reverse Proxy (https://medium.com/@lmakarov/drupal-8-and-reverse-proxies-the-base-url-drama-c5553cbc9a3e proved to be an invaluable resource for this), and as such it knows the base URL of the website to be 'https://staging.example.nz', which I guess is the one Karaf is using.

It seems like it would be more efficient to go directly back to Drupal, without having to jump through the proxy.

whikloj commented 5 years ago

Sorry, but in your initial ticket you said that you were getting an error when generating derivatives.

Is this not the case?

alistairjmcintyre commented 5 years ago

My apologies for the confusion.

Initially (when I opened this ticket) I had an awful, evil nginx config that only allowed HTTP traffic from the webserver to the proxy; without it, I would get the 302 redirect error in Karaf.

We've since learned a few things we didn't know about reverse proxying and Drupal 8, tweaked the relevant settings in both nginx and Drupal, and the 302s aren't an issue at all now (although it's something to be aware of for anyone putting Islandora behind a reverse proxy).

The real problem here is that Karaf routes traffic to the wrong place. It shouldn't route traffic from Karaf to the Proxy and then on to Drupal when Karaf and Drupal are on the same machine and the Proxy is on another.

It's currently functional, but I don't think it's expected behavior to need two network hops rather than talking to another service on the same machine.

dannylamb commented 5 years ago

Alpaca is pretty naive when it comes to this. It really doesn't know anything at all about the urls it uses. It's straight up told where to fetch and put everything with info in the message it reads from the queue. We generate that message using a Drupal action, so in theory it's totally possible to monkey with that PUT url to get things right. We'd just have to figure out how best to do it without interfering with non-TLS-terminating setups. Either some hardcoded special case logic or maybe let modules alter the message before it goes to the queue?
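The "let modules alter the message" option could be sketched as below. Everything here is illustrative: the nested message layout and the destination_uri key are assumptions rather than Islandora's exact schema, and use_local_drupal is a hypothetical site-specific hook. The point is that with no hooks registered the message is unchanged, so non-TLS-terminating setups are unaffected.

```python
import json


def build_message(destination_uri, alters=()):
    """Build a derivative-request message, then let alter hooks edit it.

    The nested layout and the 'destination_uri' key are illustrative
    assumptions, not Islandora's exact message schema.
    """
    message = {"attachment": {"content": {"destination_uri": destination_uri}}}
    for alter in alters:
        alter(message)  # each hook may mutate the message in place
    return json.dumps(message)


def use_local_drupal(message):
    """Hypothetical site-specific hook: send the PUT straight to local Drupal."""
    content = message["attachment"]["content"]
    content["destination_uri"] = content["destination_uri"].replace(
        "https://staging.example.nz", "http://localhost", 1)


url = "https://staging.example.nz/node/5/media/image/18"
print(build_message(url))                      # no hooks: URL left as-is
print(build_message(url, [use_local_drupal]))  # rewritten for this site only
```

Compared with hardcoded special-case logic, hooks keep the default path untouched and push the site-specific knowledge (which host is local) into the site's own configuration.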

BTW thanks for linking that article @alistairjmcintyre, it was super informative. Learn something new every day...

whikloj commented 5 years ago

This seems like a feature which (if the services are on the same machine) would let you replace the machine's hostname with localhost to avoid leaving the machine. I think this is a worthwhile feature to investigate.
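A minimal sketch of that hostname replacement, assuming two hypothetical config values: public_host (the name the proxy serves) and internal_base (how the co-located Drupal is reached). Note that the scheme has to change along with the host: TLS is terminated at the proxy, so Drupal itself only speaks plain HTTP.

```python
from urllib.parse import urlsplit, urlunsplit


def to_internal(url, public_host, internal_base="http://localhost"):
    """Rewrite URLs on public_host to hit the co-located Drupal directly.

    public_host and internal_base are hypothetical configuration values.
    """
    parts = urlsplit(url)
    if parts.hostname != public_host:
        return url  # other hosts, and setups without this config, are untouched
    internal = urlsplit(internal_base)
    # Swap scheme and host, keep the Drupal path, query and fragment.
    return urlunsplit((internal.scheme, internal.netloc,
                       parts.path, parts.query, parts.fragment))


print(to_internal("https://staging.example.nz/node/5/media/image/18",
                  "staging.example.nz"))
# -> http://localhost/node/5/media/image/18
```

This only helps if the local web server actually serves Drupal for localhost (and Drupal accepts that host) without forcing HTTPS itself; otherwise the rewritten request would just hit the same 302 again.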