cloudflare / workers-sdk

⛅️ Home to Wrangler, the CLI for Cloudflare Workers®
https://developers.cloudflare.com/workers/

RFC: Multi-worker development #1182

Open · threepointone opened this issue 2 years ago

threepointone commented 2 years ago

We announced service bindings in November last year, and the production APIs went GA last month. You can now "bind" Workers and make calls between them. So, for example, say you have two Workers named my-worker and another-worker; my-worker can have a wrangler.toml file that looks like this:

name = 'my-worker'

[[services]]
binding = 'ANOTHER'
service = 'another-worker'
# optionally, `environment` too

And then in your Worker:

export default {
  fetch(req, env, ctx) {
    return env.ANOTHER.fetch(req);
  }
}

That's good. This also works with wrangler dev, but with caveats. This issue tracks those caveats and tradeoffs, along with enhancements and solutions.

Local mode

Service bindings do not work in wrangler dev --local mode at the moment. For the above example, we want to be able to run wrangler dev --local on both my-worker and another-worker, and for env.ANOTHER.fetch(req) in my-worker to make a request to another-worker running in local mode. Importantly, the goal here is to do so without additional configuration. Tracking this in https://github.com/cloudflare/wrangler2/issues/1040 (previous work in this area was explored by Miniflare as mounts: https://miniflare.dev/core/mount)
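
For concreteness, the other half of that example, another-worker, could be as small as this (a minimal sketch; the response body is just illustrative):

export default {
  fetch(req) {
    return new Response('Hello from another-worker');
  }
}

Running wrangler dev --local in each project's directory should then be all it takes for env.ANOTHER.fetch(req) in my-worker to reach it.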

Remote mode

If you run wrangler dev with the above example, it'll "work". But one has to be mindful that the binding points at a production Worker (or whatever other environment it's bound to), which means you could corrupt data while developing and experimenting locally. What we probably want here is to also run wrangler dev on another-worker, and then have requests from my-worker go to the instance of another-worker running in remote dev mode. Again, the goal here is to do so without additional configuration.

Mixed mode

A common model for developing multiple services in the wider ecosystem is to not have all services on the same development model at all. For example, consider an e-commerce site with 5 mini "apps": home, search, product details, cart, and checkout, with different teams working on each of them, potentially with different stacks altogether. For this discussion, let's say they're 5 different Workers. As a developer working on the home app locally, perhaps you want search and product details to point at production Workers, and cart and checkout at a staging environment, while home itself runs in --local mode because you want to iterate quickly. This is what I'm calling "mixed mode" development. This will probably be a complicated one to crack. While the goal is still to require no additional configuration, it'll probably need some extra configuration during dev, though I'm not sure yet what that will look like.


Implementation-wise, I suspect we'll have to implement a service registry of sorts. Whenever wrangler dev starts up, it'll register itself in-memory with the service registry. A Worker would also be wrapped with a facade worker (kind of like we do for --experimental-public with https://github.com/cloudflare/wrangler2/blob/main/packages/wrangler/templates/static-asset-facade.js) that feeds values into env; so in the above example, during dev, env.ANOTHER would be an object that conceptually looks something like this -

{
  fetch(req) {
    // look up the dev session registered for 'another-worker' and forward the request
    return serviceRegistry.get('another-worker').fetch(req);
  }
}

(Similarly for "service worker" format workers, we'll have to override addEventListener to provide a facade. This is extremely oversimplified, but I hope the model's clear.)
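
To make that a little more concrete, the registry might look something like this (a rough sketch; every name here is made up, not actual wrangler internals):

// In-memory registry, keyed by service name. Each `wrangler dev`
// session registers a fetcher for itself on startup.
const serviceRegistry = new Map();

function registerService(name, fetcher) {
  serviceRegistry.set(name, fetcher);
}

// The facade looks the entry up on every request, so dev sessions
// can start and stop in any order.
function makeServiceBinding(name) {
  return {
    fetch(req) {
      const target = serviceRegistry.get(name);
      if (!target) {
        return new Response(`Service "${name}" is not running in dev`, { status: 503 });
      }
      return target.fetch(req);
    },
  };
}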

This is simpler with local mode, so that's probably what we'll do first.

For remote mode, it's trickier because every request will have to have special headers added to it (like we already do in https://github.com/cloudflare/wrangler2/blob/main/packages/wrangler/src/proxy.ts). And because we get fresh values for preview tokens on every change, we'll have to keep refreshing the state in the service registry. A bit fiddly, but that's what it'll take.
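
Conceptually, that refresh is just re-registering a fetcher for the service whenever a new preview token is minted; a rough sketch, with the function and its wiring entirely hypothetical:

// Hypothetical: called whenever a dev session receives a fresh preview token.
function registerRemoteSession(name, previewUrl, previewToken) {
  serviceRegistry.set(name, {
    fetch(req) {
      // forward to the remote preview session, attaching the current token
      // (the header name here mirrors what the dev proxy uses today)
      const headers = new Headers(req.headers);
      headers.set('cf-workers-preview-token', previewToken);
      return fetch(previewUrl, { method: req.method, headers, body: req.body });
    },
  });
}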

For mixed mode, we'll have to figure out some configuration, which will probably live under [dev]. I have no idea what that will look like just yet, but maaaybe it'll be [dev.services]? Or something. We'll talk about it more once we have local and remote modes.
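
Purely as a strawman (none of this syntax exists today), it might look something like:

[dev.services]
# hypothetical: choose where each bound service resolves during dev
SEARCH = { mode = 'remote', environment = 'production' }
CART = { mode = 'remote', environment = 'staging' }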


Other discussion points that come to mind and will probably become separate issues once we start having clarity:

GregBrimble commented 2 years ago

Pages has two types of relevant bindings: service bindings, and direct bindings to Durable Object namespaces.

The service bindings one sounds like it'll be well covered by what you've described above. Directly binding to a Durable Object namespace is a bit more complicated, though. Basically, we need to reference a Workers service somewhere, and specifically a DO class in there. The unofficial syntax we've been trying out for a while has been --do BINDING=CLASS@./path/to/service, and then using Miniflare's mounting system. We're very open to other, better options as a result of this work.
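
For illustration, with that flag a Pages Function can then use the namespace directly (the binding and class names below are made up):

// functions/counter.js, with `--do COUNTER=Counter@../counter-worker`
export async function onRequest({ env }) {
  const id = env.COUNTER.idFromName('global');
  const stub = env.COUNTER.get(id);
  return stub.fetch('https://do/increment');
}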

Related issue and long-standing draft PR.

its-jman commented 2 years ago

One thought I had here: I'd like to be able to switch between versions. Sometimes, even though I have wrangler dev running on two related workers, I might be working on separate projects in each.

I'd be concerned if wrangler made too many assumptions about which project I want to reference at a time.

Webpack module federation takes an approach where you can switch out bundles (local vs. remote vs. ...), but it's an explicit action you have to take. You could set the defaults, though.

garretcharp commented 2 years ago

Is there anything on your roadmap for multi-worker deployment? (Something similar to the AWS CDK, which lets you deploy many Lambdas together.)

huw commented 2 years ago

Something that’s annoying me a bit with multi-worker development is getting the workers to bind. It seems (and correct me if I’m wrong!) that if A depends on B, you have to spin up B first, then wait for it to load before spinning up A (I’m struggling to figure out how much has to load before it works, too). It would be nice if they bound automatically when I spin up B later.

huw commented 2 years ago

More generally, I preferred the config-based approach that Miniflare took. I don’t really want to spin up multiple processes & would prefer to emulate service bindings as closely as possible, especially since this is what’s going to run in production. I’m not sure if Miniflare 3 will fix this or not.

petebacondarwin commented 2 years ago

@huw - have you tried the wrangler dev --local experience with multiple workers? @threepointone implemented quite a nice setup, where each worker that you spin up locally registers itself with a secret local registry. Then, when a worker has a service binding to another worker, it checks this registry on each request and uses the locally running one if available. If not, it returns a helpful response to the caller:

"You should start up wrangler dev --local on the XXX worker"

With this approach you can start your workers in any order.

While this is not the final solution to the problem, it is a great improvement on trying to do this in remote dev mode.

petebacondarwin commented 2 years ago

Also, if you want behaviour closer to Workers deployed in the cloud, you can use the --experimental-local flag (instead of --local), which uses the OSS workerd runtime rather than the Miniflare Node.js Worker emulator.

huw commented 2 years ago

@petebacondarwin Sorry, yes, that’s what I was giving feedback on. For some reason it’s not consistently binding, and it’s certainly not giving me that error when it doesn’t. It’s good to know that’s the intended behaviour; I’ll look into it further on my end :)

petebacondarwin commented 2 years ago

FWIW, I have had some problems when none of the service-bound Workers are up and running at all. Instead of telling me that they are all missing, I just get a blank page. Is this what you are seeing? I find that as long as at least one binding is running before I try to make a request to the top-level Worker, it works well for me.

petebacondarwin commented 2 years ago

If you could create a reproduction of the problem that would be awesome!

huw commented 2 years ago

I couldn’t reproduce, but I suspect we must be seeing the same thing. If it’s an intermittent problem I can accept that 😌. I do get this when I run the parent before the child:

✘ [ERROR] local worker: Error: waitForPortToBeAvailable() aborted

      at AbortSignal.<anonymous>
  (<dir>/node_modules/wrangler/wrangler-dist/cli.js:141024:26)
      at [nodejs.internal.kHybridDispatch] (node:internal/event_target:731:20)
      at AbortSignal.dispatchEvent (node:internal/event_target:673:26)
      at abortSignal (node:internal/abort_controller:292:10)
      at AbortController.abort (node:internal/abort_controller:322:5)
      at <dir>/node_modules/wrangler/wrangler-dist/cli.js:142128:23
      at jh (<dir>/node_modules/wrangler/wrangler-dist/cli.js:24371:15)
      at exports2.unstable_runWithPriority
  (<dir>/node_modules/wrangler/wrangler-dist/cli.js:20537:16)
      at Pc (<dir>/node_modules/wrangler/wrangler-dist/cli.js:21152:16)
      at Qg (<dir>/node_modules/wrangler/wrangler-dist/cli.js:24337:18) {
    code: 'ABORT_ERR'
  }

But then it says:

⎔ Shutting down local server.
⎔ Starting a local server...

…and after that it works fine. This is before I even make a call to either worker, so I guess they’re communicating over the wire fine. Weird. I’ll let you know if I see anything else while I try and get environments working, which could’ve been the cause of the first issue.

penalosa commented 1 year ago

Looks like this has all been implemented as part of the dev registry work, so I'm closing this for now. (We'll open a new issue for any future improvements.)

threepointone commented 1 year ago

Are remote mode and mixed mode implemented?

petebacondarwin commented 1 year ago

> Are remote mode and mixed mode implemented?

Nope, not yet.

petebacondarwin commented 1 year ago

@penalosa - if you want to create separate issues for these modes, then we can close this one.

marbemac commented 1 year ago

Is the dev registry approach supposed to support:

  1. Invoking bound services from the scheduled handler
  2. Adding messages to queues

In my testing I've been able to get service <-> service requests working between two local workers, but I can't seem to get either invoking from the scheduled handler or sending messages to a queue to work.
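
For reference, the two patterns in question look roughly like this (binding names are placeholders):

export default {
  // 1. Invoking a bound service from the scheduled handler
  async scheduled(event, env, ctx) {
    await env.OTHER_SERVICE.fetch('https://service/cron');
  },
  // 2. Adding messages to a queue
  async fetch(req, env, ctx) {
    await env.MY_QUEUE.send({ url: req.url });
    return new Response('queued');
  },
};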

patryk-smc commented 9 months ago

Wrangler is super CPU-intensive: running 3 instances for 3 bound workers, my M2 MacBook Air's CPU hits 95°C. With 10 bound workers I guess I'll have to put it on cold packs. Is there anything that can be done to prevent this?

davidhousedev commented 7 months ago

Another dynamic related to mixed mode is using services that require either local or remote development. While this might be a transient issue, I am currently having trouble using both Browser Rendering and Queues in my application: Browser Rendering requires --remote development, and Queues requires --local development.

If I were able to run my Browser Rendering worker with a service binding to my Queue producer worker, that would let me stay on the cutting edge of Cloudflare's products.
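
For example, being able to run something like this in a single dev session (a sketch; the names are placeholders):

name = 'renderer'
# Browser Rendering currently requires `wrangler dev --remote`
browser = { binding = 'BROWSER' }

[[services]]
binding = 'QUEUE_PRODUCER'
# ...while the bound worker produces to a Queue, which currently requires local dev
service = 'queue-producer'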