drogonframework / drogon

Drogon: A C++14/17/20 based HTTP web application framework running on Linux/macOS/Unix/Windows
MIT License

HostRedirector plugin #1783

Open Mis1eader-dev opened 1 year ago

Mis1eader-dev commented 1 year ago

Host Redirector plugin

Creating a host redirector plugin poses some problems. It can be implemented in multiple ways, each with pros and cons. Before getting into the details, let's establish an environment for testing.

Subnet 1 (192.168.0.0/24)

Hosted by: Router
Default gateway: 192.168.0.1
DNS server: 192.168.0.1 -> FORWARD -> 8.8.8.8
Devices connected:

  1. Drogon server: 192.168.0.2
  2. Phone: 192.168.0.3

Subnet 2 (10.10.10.0/24)

Hosted by: Drogon server (standalone machine)
Default gateway: 10.10.10.1
DNS server: 10.10.10.1 (Custom DNS server running on the machine)
DNS server rules:

  1. example.com -> 10.10.10.1
  2. anything else -> FORWARD -> 8.8.8.8

Devices connected:

  1. Laptop: 10.10.10.2

Scenario - Without Domain Redirector

If we connect the phone to the router network and navigate to 192.168.0.2, drogon serves the webpage. If we navigate to example.com, the phone gets example.com from the global internet. This is fine, because DNS queries on this network are forwarded to 8.8.8.8.

If we then connect the laptop to the drogon server network and navigate to 10.10.10.1, drogon again serves the webpage. If we navigate to example.com, drogon also serves the webpage. Remember, we have a custom DNS server on that network that resolves DNS queries for example.com to 10.10.10.1.

Issues

  1. The URL does not look good.
  2. Different cookies for example.com and 10.10.10.1, even though they are the same thing on the 2nd subnet.

Scenario - With Domain Redirector

For the router network, this is the same as the scenario without a redirector.

But things change for the drogon server network. If you go to 10.10.10.1, drogon will detect that the request used an IP address and will redirect to example.com.

Note how it did not redirect when the request came from the router network for 192.168.0.2. That is because example.com within that subnet does not point to 192.168.0.2, it points somewhere else on the internet. If we blindly redirected any IP to example.com, drogon would redirect to the global example.com, away from itself.


Possible solutions

1. Dynamic DNS Querying

This approach uses DNS queries to dynamically obtain the IP address of the domain in question. Its theoretical config:

"domain": "example.com",
"subdomain": "www"

This solution is ideal for safety and ease of use from a server developer or maintainer's point of view. It handles requests the following way:

const string &host = ...;
if(!isIP(host))
    return nullptr; // Forward as is

const string &ip = host; // The requested host, which is an IP address here
const string &netInterface = req->networkInterface(); // Which network interface this request came from, e.g. wlan0
string ipOfDomain = dnsQuery("example.com", netInterface); // Make a DNS query on that interface
if(ip != ipOfDomain) // ipOfDomain points to somewhere different, possibly global, unsafe to redirect
    return nullptr;

return HttpRedirectionResponse("example.com");

c-ares may be able to help with this.

Pros:

  1. It only redirects safely, and is consistent.
  2. Does not require server maintainers to fiddle with IP addresses.

Cons:

  1. Implementation of the DNS querying may be difficult if done on the server side.
  2. Adds some latency to IP requests, due to drogon making DNS queries to check validity of the domain and IP.
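
The check described above can be sketched as a standalone function. This is only an illustration under assumptions: the names `Resolver`, `isIpLiteral`, and `dnsCheckedRedirect` are hypothetical and not part of drogon, and the DNS lookup is injected as a callback so that a c-ares or system resolver bound to the right interface could be plugged in later. IP addresses are compared as text, so both sides must use the same notation.

```cpp
#include <arpa/inet.h>

#include <functional>
#include <string>

// Hypothetical resolver: returns the IP address that `domain` resolves to on
// the network the request came from, or an empty string on failure.
using Resolver = std::function<std::string(const std::string &domain)>;

// True if `host` is a literal IPv4 or IPv6 address (no port, no brackets).
bool isIpLiteral(const std::string &host)
{
    unsigned char buf[16];  // large enough for an IPv6 address
    return inet_pton(AF_INET, host.c_str(), buf) == 1 ||
           inet_pton(AF_INET6, host.c_str(), buf) == 1;
}

// Returns the domain to redirect to, or an empty string when the request
// should be served as is.
std::string dnsCheckedRedirect(const std::string &host,
                               const std::string &domain,
                               const Resolver &resolve)
{
    if (!isIpLiteral(host))
        return std::string();  // already a hostname, nothing to do
    if (resolve(domain) != host)
        return std::string();  // domain points elsewhere, unsafe to redirect
    return domain;
}
```

With a resolver that answers 10.10.10.1 for example.com, a request for 10.10.10.1 yields "example.com", while a request for 192.168.0.2 yields an empty string and is served unchanged.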

2. Hardcoded IP Addresses

Having a list of predefined IP addresses, theoretical config:

"domain": "example.com",
"subdomain": "www",
"ip_addresses": [
    "10.10.10.1"
]

This is less ideal, as we have to keep track of which IP addresses we are allowed to redirect. Possible implementation:

const string &host = ...;
if(!isIP(host))
    return nullptr; // Forward as is

const string &reqIP = host; // The requested host, which is an IP address here
bool matched = false;
for(const auto &ip : ipAddresses_)
    if(ip == reqIP)
    {
        matched = true;
        break;
    }

if(!matched)
    return nullptr; // Not in our owned IP list

return HttpRedirectionResponse("example.com");

Pros:

  1. Easier to implement.
  2. Faster redirects, with no added DNS latency.

Cons:

  1. Requires maintaining a list of IP addresses.
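
As a rough sketch of the hardcoded variant, the list can be kept in a hash set so the lookup does not need a loop. The struct and function names here are hypothetical; only the "domain" and "ip_addresses" entries come from the config above.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical holder for the "domain" and "ip_addresses" config entries.
struct HostRedirectConfig
{
    std::string domain;
    std::unordered_set<std::string> ipAddresses;
};

// Returns the domain to redirect to, or an empty string when the requested
// host is not one of the listed IP addresses.
std::string listedIpRedirect(const HostRedirectConfig &config,
                             const std::string &host)
{
    return config.ipAddresses.count(host) ? config.domain : std::string();
}
```

A hostname such as example.com is never in the set, so the separate isIP check from the pseudocode is not needed in this form.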

Other ideas

If anyone has a better solution or something else in mind, please share it so we can discuss.

VladlenPopolitov commented 1 year ago

  1. Back to history: the www subdomain is a child of the 1990s, when web servers started appearing and organisations had separate site names for FTP and HTTP servers (ftp.something.edu, www.something.edu). It became the de facto standard. Browsers automatically tried to add www to any address if the domain did not have an HTTP server available on port 80 (the user enters "panasonic.com", the browser tries panasonic.com, then www.panasonic.com). Now www is not used on 100% of sites (look at the GitHub.com address, it does not have www). What is the serious reason to make an additional redirect if the user has already connected to the server? If the user must connect to www.example.com and example.com is not available, the browser automatically tries www.example.com. Just do not run an HTTP server on example.com.
  2. When an HTTP server has an incoming connection, it is already a TCP connection, and the server does not know exactly which domain was used (or maybe it was a link with an IP address). The HTTP protocol has the "Host" header, but it is an information-only field. As far as I know, it contains the address the user entered in the address field of the browser.
  3. How are you going to redirect a TCP connection? Are you going to use an HTTP redirect? Will you answer "302 resource temporarily moved" to any request regardless of its method (GET, POST, PUT, HEAD, TRACE, CONNECT, etc.)? Every HTTP method has its own allowed answers from the server.
  4. Which port number on the new site are you going to redirect to (the same one)? Which protocol will be used on this port (http, https, or other)? These are questions for you to think about. It looks like a strange and unclear task. For example, the Apache server has Rewrite rules. They are not redirects: they rewrite the Host parameter content and behave as if the user asked for a connection to another URL, or even another port on the same server (redirect from port 80 to 443).

Mis1eader-dev commented 1 year ago

Yes, we will send back a redirection response with the Location header.
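
One possible way to answer the port, protocol, and method questions is to keep everything from the original request except the host. The sketch below is an assumption, not what the plugin does: `buildLocation` and `redirectStatusFor` are hypothetical helpers, and picking 308 for non-GET methods is one option among several (clients may turn a POST into a GET after a 301 or 302, while 307 and 308 keep the method).

```cpp
#include <string>

// Builds an absolute Location value that keeps the scheme, port, and path of
// the original request and swaps only the host. All names are illustrative.
std::string buildLocation(const std::string &scheme,   // "http" or "https"
                          const std::string &newHost,  // e.g. "www.example.com"
                          unsigned short port,
                          const std::string &pathAndQuery)
{
    const bool defaultPort = (scheme == "http" && port == 80) ||
                             (scheme == "https" && port == 443);
    std::string location = scheme + "://" + newHost;
    if (!defaultPort)
        location += ":" + std::to_string(port);  // keep non-default ports
    location += pathAndQuery.empty() ? "/" : pathAndQuery;
    return location;
}

// Picks a permanent redirect code: 301 for GET and HEAD, 308 for methods
// whose verb and body should be preserved by the client.
int redirectStatusFor(const std::string &method)
{
    return (method == "GET" || method == "HEAD") ? 301 : 308;
}
```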

To give more context, see issue #1774

The primary reason for this plugin is cookie availability, and website entry point consistency.

If drogon is listening on 0.0.0.0, then it answers both example.com and www.example.com. But the browser cookies will not be available to both.

Google redirects to https://www.google.com if you type Google's IP address in the URL bar. This allows the cookies to be sent.

Without domain and IP redirects, we will have multiple entry points, and hence different cookies for each entry point, which is far from ideal in a professional setting.

VladlenPopolitov commented 1 year ago

Just for information: google.com and www.google.com are two different sites with different IP addresses. In your example you demonstrate two domains (example.com and www.example.com) on the same IP address (here you created the problem for yourself) and then you try to solve this problem. And I do not understand why it is a problem if you have different cookies at different entry points. It is the correct behaviour of cookies and the browser. Sorry, it is not clear: if everything works as it must, what are you going to change?

Mis1eader-dev commented 1 year ago

> Just for information: google.com and www.google.com are two different sites with different IP addresses. In your example you demonstrate two domains (example.com and www.example.com) on the same IP address (here you created the problem for yourself) and then you try to solve this problem.

We could have two drogon servers listening on different IP addresses:

Server 1: 10.10.10.1
Server 2: 10.10.10.2

and have server 2 be the one responsible for taking in example.com and redirecting to www.example.com. However, this requires two machines, one running the main server application and a second one for redirecting. For load balancing purposes it would come in handy, but personally I would like to use only one machine as the server.

> And I do not understand why it is a problem if you have different cookies at different entry points. It is the correct behaviour of cookies and the browser. Sorry, it is not clear: if everything works as it must, what are you going to change?

If Google didn't redirect IPs to its subdomain, and didn't redirect google.com -> www.google.com, then users would have to remember which one of these endpoints they logged in on before, or use trial and error to see which one is logged in:

  1. google.com
  2. 172.217.169.110 (google.com)
  3. www.google.com
  4. 142.251.140.36 (www.google.com)

It would be inconvenient for Google users if each of these served the same page without redirecting to the single entry point, www.google.com.

rbugajewski commented 1 year ago

I think this is a highly specific use case and should be part of a separate 3rd party plugin in its own repository.

You are welcome to let me know once you are finished with the plugin and I can add you to the list of unofficial third-party plugins provided by the Drogon community.

Mis1eader-dev commented 1 year ago

A new idea: we can expand upon the Redirector plugin and give it a config entry called rules, which we can populate this way:

"name": "drogon::plugin::Redirector",
"config": {
    "rules": {
        "10.10.10.1": "www.my-website.com",
        "my-website.com": "www.my-website.com",
        "ww.my-website.com": "www.my-website.com",
        "image.my-website.com": "images.my-website.com"
    }
}

This is no longer a specific use case. It accomplishes the same thing and solves the cookie and single entry point issues.

The plugin will go through each entry in "rules" and add it to an unordered_map. When a request comes in, it checks whether the host is in that map and, if so, replaces the host with the mapped entry.
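
A minimal sketch of that lookup, assuming the rules are already loaded into a map. `HostRules`, `normalizeHost`, and `lookupRedirectHost` are hypothetical names. The normalization step is an addition of this sketch: a Host value can carry a port and mixed case, which would otherwise miss the map.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Source host -> canonical host, mirroring the flat "rules" object.
using HostRules = std::unordered_map<std::string, std::string>;

// Lowercases the host and drops a trailing ":port". Values with more than
// one colon or with brackets (IPv6 literals) keep their colons.
std::string normalizeHost(std::string host)
{
    const auto colon = host.rfind(':');
    if (colon != std::string::npos && host.find(':') == colon &&
        host.find(']') == std::string::npos)
        host.erase(colon);
    std::transform(host.begin(), host.end(), host.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return host;
}

// Returns the host to redirect to, or an empty string if no rule matches.
std::string lookupRedirectHost(const HostRules &rules, const std::string &host)
{
    const auto it = rules.find(normalizeHost(host));
    return it == rules.end() ? std::string() : it->second;
}
```

The canonical host itself (www.my-website.com) has no entry in the map, so requests that already use it are not redirected again.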

Edit: To reduce duplication, we can have the config in this fashion:

"name": "drogon::plugin::Redirector",
"config": {
    "rules": {
        "www.my-website.com": [
            "10.10.10.1",
            "my-website.com",
            "ww.my-website.com"
        ],
        "images.my-website.com": [
            "image.my-website.com"
        ]
    }
}
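
The grouped form is convenient to write but awkward to query per request, so one option is to invert it once at load time. The sketch below assumes the JSON has already been parsed into standard containers. `GroupedRules` and `flattenRules` are hypothetical names.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Grouped form: canonical host -> hosts that should redirect to it.
using GroupedRules =
    std::unordered_map<std::string, std::vector<std::string>>;

// Inverts the grouped form into source host -> canonical host, so each
// request needs a single hash lookup.
std::unordered_map<std::string, std::string> flattenRules(
    const GroupedRules &grouped)
{
    std::unordered_map<std::string, std::string> flat;
    for (const auto &group : grouped)
        for (const auto &source : group.second)
            // emplace keeps the entry seen first if a source is listed under
            // two targets; iteration order over the groups is unspecified.
            flat.emplace(source, group.first);
    return flat;
}
```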
VladlenPopolitov commented 1 year ago

Just to give you an additional idea, I would point to your example with Google. Google sends the redirect as an answer to a GET request. You can also make a controller that gets a request on example.com and answers with 301 "Moved Permanently", like Google does.

For example, I tried to send a HEAD request to 172.217.169.110:80 via telnet:

telnet 172.217.169.110 80
HEAD / HTTP/1.0
[pressed ENTER]
[pressed ENTER]

and Google responded with a 200 code and the usual answer, without any redirect. If Google does not do complex things and makes a simple redirect only in the GET controller, could you consider this option? It looks like the standard, straightforward way.

VladlenPopolitov commented 1 year ago

@rbugajewski @Mis1eader-dev If I understood the idea correctly, it can be put in other words: Apache has mod_rewrite (https://httpd.apache.org/docs/2.4/mod/mod_rewrite.html), which can change a URL and also redirect to another URL. From the user's point of view, it is done in the usual configuration file with regex-like commands (no need to write controller code or CGI code). I suppose the idea is to have something similar. Sounds interesting. In Apache it is done as a separate module, so maybe it can be done as a plugin in Drogon.

Mis1eader-dev commented 1 year ago

Yes, that is the feature I want drogon to have. It will attract enterprises if it has the complete feature set needed for an enterprise server.

an-tao commented 1 year ago

> A new idea, we can expand upon the Redirector plugin and have a config entry for it called rules [...]

I prefer to make a new plugin and keep the redirector plugin clean, because we think of it as a container for all redirect handlers.

Mis1eader-dev commented 1 year ago

OK, that sounds fair. What do we call the new plugin?

  1. Rewriter
  2. HostRewriter
  3. IPTerminator
  4. ExterminatorOfURLs
  5. < Cool name like Hodor >
Mis1eader-dev commented 1 year ago

I have started the implementation of the plugin. However, there is more to it than just hosts:

maps.example.com -> www.example.com/maps

Here the redirect rule has a path and a different host.

It is also possible that some users will want to redirect from paths:

www.example.com/images -> images.example.com

Here the request URL match has a host and a path.

This creates an issue. Handlers registered with redirector->registerRedirectHandler have a higher priority than SlashRemover's handler, which is registered with redirector->registerPathRewriteHandler. That means the redirect rule

www.example.com/images -> images.example.com

will not work when the user puts in:

www.example.com///images

Edit: For this to work properly, this plugin's execution must sit between Redirector::pathRewriteHandlers_ and Redirector::forwardHandlers_. This means the Redirector plugin needs 4 handler groups instead of 3. The new 3rd group, which comes between the two groups mentioned above, will have the same function signature as the handlers_ group (Group 1), but without the protocol reference in its parameters.
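
The ordering problem can be reproduced outside drogon with a small model. This is only an illustration: `collapseSlashes` stands in for what a SlashRemover-style step would do to repeated slashes, and `matchRule` is a hypothetical exact match on host plus path. Neither is drogon code.

```cpp
#include <string>
#include <unordered_map>

// Collapses runs of '/' into a single one, e.g. "///images" -> "/images".
std::string collapseSlashes(const std::string &path)
{
    std::string out;
    out.reserve(path.size());
    for (char c : path)
        if (c != '/' || out.empty() || out.back() != '/')
            out.push_back(c);
    return out;
}

// Rules are keyed by host + path, e.g. "www.example.com/images".
// Returns the redirect target, or an empty string if no rule matches.
std::string matchRule(const std::unordered_map<std::string, std::string> &rules,
                      const std::string &host,
                      const std::string &path)
{
    const auto it = rules.find(host + path);
    return it == rules.end() ? std::string() : it->second;
}
```

Matching "www.example.com" with the raw path "///images" finds nothing, while matching after collapseSlashes finds the rule. That is the reason the rule matching has to run after the path rewrite handlers.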