connectrpc / connect-es

The TypeScript implementation of Connect: Protobuf RPC that works.
https://connectrpc.com/
Apache License 2.0
1.27k stars 70 forks source link

Returning or throwing from a method always triggers an abort #1117

Open lourd opened 2 weeks ago

lourd commented 2 weeks ago

Describe the bug

Normally returning or throwing from an RPC method triggers the abort event on the given context's AbortSignal. That's very unexpected. I'm not sure if that's intentional or a bug. Hopefully it's not intended, I don't think it should abort every time no matter what. As it stands now you have to remember to remove every abort listener you may have added to avoid accidental downstream aborting, which is really tedious.

To Reproduce

Environment (please complete the following information):

// test_service.proto

syntax = "proto3";

package repro.v1;

message FooRequest {
  string name = 1;
}

message FooResponse {}

service TestService {
  rpc Foo(FooRequest) returns (FooResponse);
}
// main.js
import {
  createPromiseClient,
  createRouterTransport,
} from "@connectrpc/connect";

import { TestService } from "./gen/repro/v1/test_service_connect.js";
import { FooRequest, FooResponse } from "./gen/repro/v1/test_service_pb.js";

const transport = createRouterTransport((router) => {
  router.service(TestService, {
    async foo(req, ctx) {
      ctx.signal.addEventListener("abort", () => {
        console.log("ABORTED");
      });
      if (req.name === "throw") {
        throw new Error("thrown!");
      }
      return new FooResponse();
    },
  });
});

const client = createPromiseClient(TestService, transport);
await client.foo(new FooRequest({ name: "test" }));
try {
  await client.foo(new FooRequest({ name: "throw" }));
} catch {}

This will end up printing ABORTED twice, once for the first request when it returns normally and once for the second when it throws

lourd commented 2 weeks ago

Seeing the call to abort here https://github.com/connectrpc/connect-es/blob/1d71dc9042572687ed28f7c05928f6fb96ade5aa/packages/connect/src/protocol-connect/handler-factory.ts#L271-L273 added by @timostamm in https://github.com/connectrpc/connect-es/pull/632

srikrsna-buf commented 2 weeks ago

Hey! This was intentional, but you can use the reason property on signal to differentiate whether it was a normal abort or not.

lourd commented 2 weeks ago

Are you open to changing that? That's really confusing and seems like a big pitfall. The case that caused this error for me isn't even aware that it's in a Connect service, it was a couple layers deeper than that and all the sudden is being called after having migrated to Connect from a REST framework. The reason property is also any typed, so comparing that against anything is more of a wish than a guarantee of correctness.

This behavior also isn't documented anywhere. Thinking about how that would be communicated kind of shows how the inherent issue - "The abort signal will be aborted regardless of what happens in your code. It's up to you to check the reason if it was actually aborted or not". Really begs the question of why the library code calling abort() in the first place couldn't do the same thing and not trigger the abort event.

Reading the MDN docs on reason, I really don't think this is proper use. If you don't give a reason when calling abort(), the reason is automatically set to a DOMException. So we would have to check that the reason isn't a DOM exception? This is the behavior in the browser and Node.js

Screenshot 2024-06-26 at 9 17 48 AM

Screenshot 2024-06-26 at 9 58 58 AM

Can you please change this behavior? It appears that the motivation was simply to clear the setTimeout in the deadlineSignal.

https://github.com/connectrpc/connect-es/blob/1d71dc9042572687ed28f7c05928f6fb96ade5aa/packages/connect/src/protocol/signals.ts#L85

Can we change it back to doing that separately? Perhaps making a cleanup method on context that maps to calling deadline.cleanup, and only call that every time instead of abort?

srikrsna-buf commented 1 week ago

The internal cleanup is not the reason for doing this. The signal can be passed to other services and is guaranteed to abort/complete when the rpc is complete (with or without an error). This makes it convenient to pass this signal around to other services. This is particularly useful in fanout scenarios.

Changing this now will certainly be a breaking change, as it changes the behavior. So we can revisit this in v2 (being worked on now). For now, I'd suggest relying on the using keyword added in TS 5.2 (tc39 Stage 3):

const transport = createRouterTransport((router) => {
  router.service(TestService, {
    async foo(req, ctx) {
      using _ = onAbort(signal, () => {
        console.log("ABORTED");
      });    
      if (req.name === "throw") {
        throw new Error("thrown!");
      }
      return new FooResponse();
    },
  });
});

function onAbort(signal: AbortSignal, onAbort: NonNullable<AbortSignal["onabort"]>): Disposable {
  signal.addEventListener("abort", onAbort)
  return {
    [Symbol.dispose]: () =>
      signal.removeEventListener("abort", onAbort)
  }
}
lourd commented 1 week ago

Thanks for the thorough response @srikrsna-buf, I appreciate it. That's a good point about the fanout case, having a completion signal that can be used to cancel in process work when one of the workers has finished responding.

Sounds like there's two different cases this one signal is supporting then - aborting and completing. I think it'd be a good idea to separate those into two separate signals. Like I've said, having the abort signal be responsible for both abortion and completion is a pitfall for any code that only expects to be called when things are actually aborted.

The fan-out case can also be enabled by the user making their own abort controller/completion signal and calling that in their own finally, and linking that to the given abort signal. I imagine that's what most people would do these days since there's no documentation about the given abort signal also being called on completion every time. I think that'd be less complicated than example code you've given about using new TS/JS features to remember to remove an abort event listener.