caddyserver / replace-response

Caddy module that performs replacements in response bodies
Apache License 2.0

Replace with reverse proxy #6

Closed dvfabbri closed 3 years ago

dvfabbri commented 3 years ago

I'm trying to make this work with a reverse proxy. It appears this module is designed specifically for the HTTP server (I am new to this project, so please excuse me if I am mistaken). To run "replace" with the reverse proxy, can the transform function be moved into the reverse proxy, or do you recommend creating a new module?

mholt commented 3 years ago

Not necessary, it works fine with the proxy. What have you tried?

dvfabbri commented 3 years ago

Thanks for the quick response. To preface, my objective is to insert a script after the `<head>` tag on each response.

Here are the steps I have followed, with various issues along the way:

1. When I use the following Caddyfile for google.com, I get a "The requested URL / was not found on this server." error. I'm unsure how to get normal behavior here.

        localhost

        reverse_proxy www.google.com:443 {
            transport http {
                tls
                tls_insecure_skip_verify
            }
        }


2. Going to a website I control that is hosted behind Cloudflare results in a "403 Forbidden" error. It looks like I might need a Cloudflare token. Is that correct?

        localhost

        reverse_proxy www.maizeanalytics.com:443 {
            transport http {
                tls_insecure_skip_verify
            }
        }


3. For Yahoo, I don't get the main Yahoo page (I get a Yahoo error page), but I am able to make the replacement work for text and the title:

![Screen Shot 2021-08-09 at 3 51 00 PM](https://user-images.githubusercontent.com/4060672/128772706-3b95d4e5-aa89-4892-a511-06ba26d27959.png)

        {
            order replace after encode
        }

        localhost

        reverse_proxy yahoo.com:443 {
            transport http {
                tls_insecure_skip_verify
            }
        }

        replace {
            Yahoo Yahoo2
            right left
        }


4. I also tried on a site on EC2 with Let's Encrypt certs. I get the same Yahoo error page as before using `./caddy reverse-proxy --from https://piem.io --to https://yahoo.com`, but do not get that error page when I use this Caddyfile. Specifically, I do not see a directive to set the `server_name` correctly, which causes the JSON from the command and the JSON from the Caddyfile to differ.

        :443

        reverse_proxy yahoo.com:443 {
            transport http {
                tls
                tls_server_name piem.io
            }
        }

francislavoie commented 3 years ago

When trying to proxy to an upstream over HTTPS, you need to set the Host header to the value they expect for it to work correctly.

    reverse_proxy https://google.com {
        header_up Host {http.reverse_proxy.upstream.hostport}
    }

dvfabbri commented 3 years ago

That gives `curl: (35) error:14004438:SSL routines:CONNECT_CR_SRVR_HELLO:tlsv1 alert internal error`. I see other posts mentioning `-default-sni`, but I don't see how to configure that for the reverse proxy.

dvfabbri commented 3 years ago

This config now gets me to Yahoo correctly, but the URL does not stay the same as my reverse proxy URL (i.e., the browser bar shows yahoo.com). Also not seeing the search term replaced.

    {
       "apps":{
          "http":{
             "servers":{
                "proxy":{
                   "listen":[
                      ":443"
                   ],
                   "routes":[
                      {
                         "handle":[
                            {
                               "handler":"replace_response",
                               "replacements":[
                                  {
                                     "replace":"Sports2",
                                     "search":"Sports"
                                  }
                               ]
                            },
                            {
                               "handler":"reverse_proxy",
                               "headers":{
                                  "request":{
                                     "set":{
                                        "Host":[
                                           "{http.reverse_proxy.upstream.hostport}"
                                        ]
                                     }
                                  }
                               },
                               "transport":{
                                  "protocol":"http",
                                  "tls":{
                                     "server_name":"yahoo.com"
                                  }
                               },
                               "upstreams":[
                                  {
                                     "dial":"yahoo.com:443"
                                  }
                               ]
                            }
                         ],
                         "match":[
                            {
                               "host":[
                                  "piem.io"
                               ]
                            }
                         ]
                      }
                   ]
                }
             }
          }
       }
    }

mholt commented 3 years ago

> This config now gets me to yahoo correctly, but the URL does not stay the same as my reverse proxy URL (i.e., the browser bar shows yahoo.com).

The site is issuing a redirect to www.yahoo.com.

> Also not seeing the search term replaced.

I'm not sure why that's the case. Will have to look into that more.

mholt commented 3 years ago

Oh, duh -- it's because yahoo.com is encoding the response into a binary format (gzip in this case). See the Content-Encoding header.

dvfabbri commented 3 years ago

Interesting. Are you suggesting that by adding a handle for encoding then the replace can work, or that by using gzip, replace is not possible?

If possible, is this what you are referring to?

    "routes":[
       {
          "handle":[
             {
                "encodings":{
                   "gzip":{}
                },
                "handler":"encode"
             }
          ],
          "handle":[
             {
                "handler":"replace_response",
                "replacements":[
                   {
                      "replace":"Sports2",
                      "search":"Sports"
                   }
                ]
             },

mholt commented 3 years ago

I don't know what that config is... edit: oh, you're trying to encode a gzip response? That doesn't seem right.

The response body would have to be decoded before plaintext replacements can be performed (then re-encoded). I don't know an efficient way to do that.

dvfabbri commented 3 years ago

I guess the only option would be to buffer the response, unzip, and then replace (which could be slow).

If you have other recommendations on how to inject a JS script, let me know (or I can just limit myself to non-zipped pages).

mholt commented 3 years ago

I mean, we could implement a decoding functionality into this module... or perhaps even as a separate handler (similar to the encode handler, in fact maybe we could even use the encode handler but we'd just need to reverse the "encoders" to be "decoders")... but yeah it would not be super efficient.

Now I'm curious what an un-gzip encoder would look like for the encode handler.

dvfabbri commented 3 years ago

I presume this is also the case for br encoding.

mholt commented 3 years ago

Yeah, and re-encoding that would really suck for performance.

francislavoie commented 3 years ago

Maybe you could set `header_up Accept-Encoding identity` to override what the browser asks for, so that you get an unencoded response back from the upstream. Then you can `encode gzip` in Caddy afterwards to at least compress it on the way back to the browser.
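
For anyone landing here later, a minimal Caddyfile sketch of that suggestion might look like the following. It reuses the hostnames and the `Sports` replacement from earlier in this thread. Putting the pieces together in one site block is an assumption, as is running a Caddy build that includes the replace-response module, so treat it as a starting point and not a verified config.

```
{
	# replace is not a standard directive, so it needs an explicit order;
	# placing it after encode lets it see the body before it is compressed
	order replace after encode
}

piem.io {
	# compress the (already modified) response on its way to the browser
	encode gzip

	replace {
		Sports Sports2
	}

	reverse_proxy https://yahoo.com {
		header_up Host {http.reverse_proxy.upstream.hostport}
		# ask the upstream for an uncompressed body so plaintext
		# replacements can match
		header_up Accept-Encoding identity
	}
}
```

The key line is `header_up Accept-Encoding identity`: without it the upstream may answer with gzip or br, and the search string never appears in the compressed bytes.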

dvfabbri commented 3 years ago

That worked!

mholt commented 3 years ago

That's brilliant, ngl.

ser commented 1 year ago

> header_up Accept-Encoding identity

I think this should be in the main README file as I was struggling to understand why replace does not work for reverse proxy.

mholt commented 1 year ago

@ser (It is in the readme. I added it again to the top if that helps.)