apache / incubator-pagespeed-ngx

Automatic PageSpeed optimization module for Nginx
http://ngxpagespeed.com/
Apache License 2.0
4.36k stars 363 forks source link

PageSpeed redirects to port 8080 with downstream cache configuration #711

Open nikolay opened 10 years ago

nikolay commented 10 years ago

I've reported this along with other issues, but want to put this is a top-level issue now.

I have WordPress configured with Downstream Caching. So Ajax requests (such as post search) incorrectly rewrite the WordPress redirects to have port 8080, which is only used internally behind the caching layer.

This issues is not present in versions before 1.7.

It's a critical issue for us, I've burned a lot of hours trying to work around the issue, but nothing seems to work. Please, help! Seriously!

oschaaf commented 10 years ago

@nikolay I plan to have a look at this tomorrow, could you PM me as much repro information as possible?

nikolay commented 10 years ago

@oschaaf I will email you the latest configuration. Do you use Docker or Vagrant? Maybe I can a box to reproduce it to make your job easier.

oschaaf commented 10 years ago

@nikolay I replied earlier via email, but it isn't showing up here. Not currently using those, but I could install Vagrant. Otherwise a minimal configuration would help greatly.

nikolay commented 10 years ago

@oschaaf Thanks, Otto! I just emailed you my configuration. I did clean up stuff that most probably isn't the root cause. Let me know if you have any questions for me.

oschaaf commented 10 years ago

Double checking the earlier issue with CDN's for 1.8.31.2, that now looks indeed correct to me. So I wouldn't expect trouble from that: http://modpagespeed.googlecode.com/svn/tags/1.8.31.2/src/net/instaweb/automatic/proxy_fetch.cc

void ProxyFetch::HandleHeadersComplete() {
  // If domain rewrite filter is enabled we need to also rewrite the location
  // headers when origin is serving redirects.
  // TODO(matterbury): Consider other 3xx responses.
  // [but note that doing this for 304 Not Modified is probably a dumb idea]
  if (response_headers() != NULL &&
      (response_headers()->status_code() == HttpStatus::kFound ||
       response_headers()->status_code() == HttpStatus::kMovedPermanently)) {
    const char* loc = response_headers()->Lookup1(HttpAttributes::kLocation);
    if (loc != NULL && !driver_->pagespeed_query_params().empty()) {
      GoogleUrl base_url(url_);
      GoogleUrl locn_url(base_url, loc);
      // Only add them back if we're being redirected back to the same domain.
      if (base_url.Origin() == locn_url.Origin()) {
        // TODO(jmarantz): Add a method to GoogleUrl that makes this easy.
        GoogleString new_loc(loc);
        StrAppend(&new_loc, locn_url.has_query() ? "&" : "?",
                  driver_->pagespeed_query_params());
        response_headers()->Replace(HttpAttributes::kLocation, new_loc);
        response_headers()->ComputeCaching();
      }
    }
  }

I have not tried to reproduce yet, but looking at your configuration, perhaps this explains what is happening: When rewriting the origin's content, I think that PSOL will assume that the document that it is rewriting resides on www.example.com:8080. When for some reason a url needs to be made absolute instead of relative, it will add the host:port of the origin, which is wrong in this scenario.

You might be able to confirm this by configuring this:

pagespeed PreserveUrlRelativity off;
pagespeed DisableFilters trim_urls;

With that, I'd expect a lot of .pagespeed. urls pointing to port 8080.

I think you could correct that with something like this in the origin's configuration:

pagespeed EnableFilters rewrite_domains;
pagespeed MapRewriteDomain www.example.com www.example.com:8080;
# This should also help with rewriting Location headers:
pagespeed DomainRewriteHyperlinks on;

My hopes are up that this will make PSOL rewrite redirects and links in html/css to point to the domain it's being served from to the browser (and what the reverse proxy is listening to). I think that a better fix in ngx_pagespeed would be to support a header like X-Forwarded-Port (and perhaps also X-Forwarded-Host for another related use-case).

However, trouble would still arise if wordpress somehow serves up a javascript file that contains urls pointing to :8080 (PSOL won't rewrite that).

nikolay commented 10 years ago

@oschaaf Thanks! Why I didn't think of MapRewriteDomain earlier?! I will test it right away and it seems like a solid defensive technique until https://github.com/pagespeed/ngx_pagespeed/issues/716 is released.

nikolay commented 10 years ago

@oschaaf @jeffkaufman The only issue I see with pagespeed MapRewriteDomain www.example.com www.example.com:8080; workaround is that a HTTPS request (port 443) might be rewritten to port 8080, then, port 8080 gets rewritten to port 80, i.e. we'll lose original HTTPS. I assume we can't use variable like MapRewriteDomain similar to other PageSpeed statements?

oschaaf commented 10 years ago

@nikolay script variables aren't supported yet, but perhaps you would be able to cook something up by passing on the original protocol from the reverse proxy in request headers, and using if statements on the backend around that? Maybe another potential workaround could be switching to https only, and always rewrite to that.

nikolay commented 10 years ago

@oschaaf I see. Unfortunately, the MapRewriteDomain did not work. As I changed the port from 8080 to 8050 in Nginx config and now it's still redirecting to that. Also, as this is WordPress, in some cases, it has to be HTTP, not HTTPS, and I really don't know all cases where this would be needed so that I can use conditional rewrite based on where it should be HTTP or HTTPS.

Is there any other workaround you can think of?

oschaaf commented 10 years ago

@nikolay possibly related: https://github.com/pagespeed/ngx_pagespeed/issues/819#issuecomment-58568648