cloudflare / workerd

The JavaScript / Wasm runtime that powers Cloudflare Workers
https://blog.cloudflare.com/workerd-open-source-workers-runtime/
Apache License 2.0

🐛 Bug Report — ports on Windows #1664

Open penalosa opened 7 months ago

penalosa commented 7 months ago

On Windows, workerd silently ignores "Address already in use" errors when trying to bind to an already-occupied port.

Using the following capnp:

using Workerd = import "/workerd/workerd.capnp";

const config :Workerd.Config = (
  services = [
    ( name = "main", worker = .worker ),
  ],
  sockets = [
    ( name = "http", address = "127.0.0.1:8080", http = (), service = "main" ),
  ]
);

const worker :Workerd.Worker = (
  modules = [
    ( name = "index.mjs",
      esModule =
        `export default {
        `  async fetch(request, env, ctx) {
        `    return new Response("body");
        `  }
        `}
    )
  ],
  compatibilityDate = "2023-02-28",
);

When running two workerd processes, the first successfully binds to 8080 and is accessible from a browser. The second workerd process also reports that it has successfully bound to 8080, but it is not accessible from a browser.

On macOS or Linux, the second workerd process crashes with:

*** Fatal uncaught kj::Exception: kj/async-io-unix.c++:945: failed: ::bind(sockfd, &addr.generic, addrlen): Address already in use; toString() = 127.0.0.1:8080

cc @mrbbot @RamIdeas

kentonv commented 7 months ago

This probably has to do with the SO_REUSEADDR socket option, which I'm told has a different meaning on Windows vs. Linux.

On Linux, this option doesn't actually allow a port to be bound multiple times. Instead, it merely avoids the need to wait 5 minutes after a server releases the port before a new server can bind it. Just about everyone always wants to use this option on Linux because otherwise you have 5 minutes of downtime whenever your server crashes. Therefore, KJ sets this on all sockets by default.

I had always assumed it had the same meaning on other platforms. But it seems that on Windows, this option allows multiple processes to bind to the same port. I'm a little unclear on the exact behavior, and I don't have a Windows box to test on. I have no idea what the effect is on macOS.
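For anyone who wants to check the behavior on their own platform, here is a minimal probe, written in Python rather than KJ's C++ so it can be run without building anything. The function name `second_bind_succeeds` is made up for this sketch. It opens a listening socket on a free loopback port, then tries to bind and listen on the same port with a second socket, optionally setting SO_REUSEADDR on both first.

```python
import socket
import sys


def second_bind_succeeds(reuse_addr: bool = True) -> bool:
    """Return True if a second socket can bind a port that is already listening.

    Hypothetical helper for this thread; it is not part of workerd or KJ.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            # Mirror what KJ is described as doing: set SO_REUSEADDR on every socket.
            first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # Port 0 asks the OS for any free port, so the probe does not depend on 8080.
        first.bind(("127.0.0.1", 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
            second.listen(1)
        except OSError:
            # "Address already in use" (or the platform's equivalent error).
            return False
        return True
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print(sys.platform, "second bind succeeded:", second_bind_succeeds())
```

On Linux the second bind should be rejected even with the option set, which matches the crash reported above. If the probe reports success on Windows, that would be consistent with the silent double bind described in this issue. Running it on macOS would answer the open question about that platform.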

KJ probably needs to change so that it does not set this option on Windows. But someone needs to verify that this won't lead to the 5-minute downtime issue, and someone needs to check whether there's a problem on macOS.
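The proposed change could look roughly like the sketch below, again in Python for illustration rather than as a patch to KJ. The helper name `make_listener` is hypothetical. It sets SO_REUSEADDR everywhere except Windows, which is the same split that Python's own `socket.create_server` uses. On Windows it additionally sets SO_EXCLUSIVEADDRUSE when the constant is available; whether KJ would want that extra option is an assumption on my part, not something settled in this thread.

```python
import socket
import sys


def make_listener(host: str, port: int) -> socket.socket:
    """Create a listening TCP socket with platform-dependent reuse options.

    Hypothetical helper illustrating the idea; not KJ's actual code.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if sys.platform == "win32":
            # Skip SO_REUSEADDR on Windows, where it is reported to permit a
            # second bind of a port that is already in use.
            if hasattr(socket, "SO_EXCLUSIVEADDRUSE"):
                # Assumption: explicitly request exclusive ownership of the port.
                sock.setsockopt(socket.SOL_SOCKET, socket.SO_EXCLUSIVEADDRUSE, 1)
        else:
            # On Linux this only lets a restarted server rebind promptly;
            # it does not allow two listeners on the same port.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(16)
    except OSError:
        # Do not leak the descriptor when the bind fails.
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    listener = make_listener("127.0.0.1", 0)
    print("listening on", listener.getsockname())
    listener.close()
```

With this arrangement a second call to `make_listener` for a port that is already listening should raise `OSError` instead of returning a socket that never receives connections. The restart-delay concern on Windows would still need to be checked on a real machine.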