WordPress / wordpress-playground

Run WordPress in the browser via WebAssembly PHP
https://w.org/playground/
GNU General Public License v2.0
1.61k stars 240 forks source link

Networking: Implement gethostbyname #400

Open adamziel opened 1 year ago

adamziel commented 1 year ago

Problem

As seen in https://github.com/WordPress/wordpress-playground/issues/396, calling gethostbyname in PHP generates a random local IP address. Emscripten then re-routes it to the correct location, but the fact we even have local IP in the mix causes wp_safe_remote_get to fail since it refuses to connect to a local IP address.

A workaround discussed in that issue silences the warning, which works in a dev environment, but for production use we need a proper fix that doesn't open up a possibility of local network traversal.

Proposed solution

Hardcoding host resolution won’t cut it and we need to defer to the OS gethostbyname implementation.

The DNS.lookup() implementation that comes with node.js is asynchronous so there are two ways to go about it:

Unrelated trivia: I just learned that dns.lookup() pretends to be asynchronous, but in reality it calls a blocking function:

https://httptoolkit.com/blog/configuring-nodejs-dns/

The author discusses a cacheable-lookup library that exposes a synchronous lookup() function - perhaps it could „just work” for us?

https://www.npmjs.com/package/cacheable-lookup

Related info

This is also solved by enabling the networking access by adding networking=yes to the URL, for example:

https://playground.wordpress.net/?plugin=create-block-theme&url=/wp-admin/admin.php?page=create-block-theme&networking=yes

Or with the following Blueprint:

{
    "features": {
        "networking": true
    }
}

cc @akirk @dmsnell @danielbachhuber

adamziel commented 7 months ago

Actually, this is only a problem when networking is disabled. Native PHP returns the original input in that scenario:

php > var_dump(gethostbyname('www.example.com'));
string(13) "93.184.216.34"
php > var_dump(gethostbyname('www.example.com'));
string(15) "www.example.com"

The WASM version of gethostname should do the same. At the same time, this seems like a superficial difference since the actual network call won't work anyway unless networking is enabled.

adamziel commented 6 months ago

gethostbyname is still relevant because of the following validation code in wp-includes/http.php:

        if ( preg_match( '#^(([1-9]?\d|1\d\d|25[0-5]|2[0-4]\d)\.){3}([1-9]?\d|1\d\d|25[0-5]|2[0-4]\d)$#', $host ) ) {
            $ip = $host;
        } else {
            $ip = gethostbyname( $host );
            if ( $ip === $host ) { // Error condition for gethostbyname().
                return false;
            }
        }
        if ( $ip ) {
            $parts = array_map( 'intval', explode( '.', $ip ) );
            if ( 127 === $parts[0] || 10 === $parts[0] || 0 === $parts[0]
                || ( 172 === $parts[0] && 16 <= $parts[1] && 31 >= $parts[1] )
                || ( 192 === $parts[0] && 168 === $parts[1] )
            ) {
                // If host appears local, reject unless specifically allowed.
                if ( ! apply_filters( 'http_request_host_is_external', false, $host, $url ) ) {
                    return false;
                }
            }
        }

The local IP address that gethostbyname("s.w.org") returns by default fails the validation and causes WordPress to give up without issuing a request. As a result, the new 6.5 Google Fonts feature doesn't work.