Azure / azure-functions-durable-js

JavaScript library for using the Durable Functions bindings
https://www.npmjs.com/package/durable-functions
MIT License
128 stars 47 forks source link

Can we improve the `callActivity()` and `callSubOrchestrator()` experiences in orchestrations? #418

Closed hossam-nasr closed 11 months ago

hossam-nasr commented 1 year ago

(see relevant discussion: https://github.com/ejizba/func-nodejs-prototype/pull/14#discussion_r1051046886)

Currently, if an orchestrator wants to call an activity function, it would have code like this:

context.df.callActivity('Hello', 'Cairo')

passing in the activity's name as a string. This provides no intellisense and no validation to confirm that the activity name is the same as the registered activity function.

Similarly, suborchestrations are started by specifying the orchestration's name as a string:

context.df.callSubOrchestrator("DeviceProvisioningOrchestration", deviceId, child_id)

This issue is to explore whether the new programming model can provide us an opportunity to improve this callActivity() and callSubOrchestrator() experience, and whether such an improvement would be worth the engineering cost.

An example of this would be to have the API to register activities (df.app.activity()) and orchestrations (df.app.orchestration()) return some kind of object that can be passed to the callActivity()/callSubOrchestration() functions, as such:

const helloActivity = df.app.activity(activityName, { handler: (input: string) => {
       return `Hello, ${input}`;
   }
});

const orchestrator: OrchestrationHandler = function* (context) {
    const outputs = [];
    outputs.push(yield context.df.callActivity(helloActivity, 'Tokyo'));
    outputs.push(yield context.df.callActivity(helloActivity, 'Seattle'));
    outputs.push(yield context.df.callActivity(helloActivity, 'Cairo'));

    return outputs;
};
df.app.orchestration('durableOrchestrator1', orchestrator);
ejizba commented 1 year ago

I'm wondering if it makes more sense to flip the order from this:

context.df.callActivity(helloActivity, 'Tokyo')

To something like this:

helloActivity.call(context, 'Tokyo')
hossam-nasr commented 1 year ago

@ejizba I would be in favor of that solution, and I don't immediately see a reason we couldn't do this

davidmrdavid commented 1 year ago

I think that's a really cool idea, @ejizba.

I would take it a step further and ask whether it would be possible to call it like this:

yield helloActivity.call('Tokyo') // let's not forget the `yield` is necessary.

Or taking it to the logical extreme and just do

yield helloActivity('Tokyo') // let's not forget the `yield` is necessary.

Overall, I think that hiding the context argument, which would be static for a given orchestrator, would be a neat win. Given that the SDK controls the orchestrator execution (and therefore it has a local copy of the context), I think it ought to be possible.

hossam-nasr commented 1 year ago

@davidmrdavid I really like your last proposal 😍 Makes writing orchestrations one step closer to feeling like writing just normal code. Once/if we also fix #5, this would basically look exactly the same as a normal async function

jviau commented 1 year ago

How would this affect code organization? If I want to access helloActivity in the above example in other files, how would I expose that? And would this also work for orchestrations? Would I be able to use:

const subOrchestrator = df.app.orchestration('subOrchestrator', /* impl */);

const orchestrator: OrchestrationHandler = function* (context) {
    const outputs = [];
    outputs.push(yield context.df.callOrchestration(subOrchestrator , 'Tokyo'));
    outputs.push(yield context.df.callOrchestration(subOrchestrator , 'Seattle'));
    outputs.push(yield context.df.callOrchestration(subOrchestrator , 'Cairo'));

    return outputs;
};
df.app.orchestration('durableOrchestrator1', orchestrator);
davidmrdavid commented 1 year ago

@jviau: let me try to answer your questions in parts.

(1)

If I want to access helloActivity in the above example in other files, how would I expose that?

I think a user would be able to export and import the helloActivity object just like they do in regular JavaScript. I wanna say it would look like this, minus a few typos:


// fileA.js

const helloActivity = df.app.activity("MyActivityName", /* impl */ );
export helloActivity;

// fileB.js
import { helloActivity } from "fileB.js"; // or something like that

So I think the code organization bit is not the difficult part, we would just be re-using standard mechanisms in the language. I think the tricky bit to get right is not requiring an explicit reference to the context object when scheduling an Activity on a given orchestrator, but I think that's solvable.

(2)

And would this also work for orchestrations?

I think I need more context here. Can you please elaborate what you're asking?

jviau commented 1 year ago

I think I need more context here. Can you please elaborate what you're asking?

My code sample shows what I am asking. Can I take the same approach with using the return value of df.app.orchestration(...) to call sub orchestrations.

davidmrdavid commented 1 year ago

Yes, I think so. I don't see why this would be specific to Activities. I think it would apply to Activities, and SubOrchestrators. I still need to think about how this would work with entities, which allow for fire-and-forget ops and I'm not sure how I want to control for that.

So you could have an orchestrator like this:

const subOrchestrator = df.app.orchestration('subOrchestrator', /* impl */);
const helloActivity = df.app.activity(activityName, { handler: (input: string) => {
       return `Hello, ${input}`;
   }
});

const orchestrator: OrchestrationHandler = function* (context) {
    const outputs = [];
    // the context is preserved by the implicit TaskOrchestrationHandler that receives each `yield`
    outputs.push(yield helloActivity ('Tokyo'));
    outputs.push(yield subOrchestrator('Seattle'));

    return outputs;
};
df.app.orchestration('durableOrchestrator1', orchestrator);
jviau commented 1 year ago

Fire & forget on sub-orchestrations too. I think the returned object subOrchestrator and helloActivity here, should take an options object as their second parameter - similar to what I did for the new dotnet SDK: https://github.com/microsoft/durabletask-dotnet/blob/main/src/Abstractions/TaskOrchestrationContext.cs#L110

const subOrchestrator = df.app.orchestration('subOrchestrator', /* impl */);
const helloActivity = df.app.activity(activityName, { handler: (input: string) => {
       return `Hello, ${input}`;
   }
});

const orchestrator: OrchestrationHandler = function* (context) {
    const outputs = [];
    // the context is preserved by the implicit TaskOrchestrationHandler that receives each `yield`
    outputs.push(yield helloActivity('Tokyo', { retry: /* add retry details */ }));
    outputs.push(yield subOrchestrator('Seattle', { retry, instanceId, fireAndForget })); // mind my rough / invalid JS here

    return outputs;
};
df.app.orchestration('durableOrchestrator1', orchestrator);
davidmrdavid commented 1 year ago

I like it, and thanks for the callout on fire-and-forget subOrchs.

davidmrdavid commented 1 year ago

@ejizba: any feedback on this most recent suggestion? Essentially, we're making the invocation of activities, entities, and sub-orchestrators be as minimal as possible by removing the context prefix and the API names themselves - it's just the Function objects.

ejizba commented 1 year ago

Makes writing orchestrations one step closer to feeling like writing just normal code.

I just want to push back on this goal because orchestrations aren't normal code, right? In which case I do think it's helpful to have some differentiating characteristics

Overall, I think that hiding the context argument, which would be static for a given orchestrator, would be a neat win. Given that the SDK controls the orchestrator execution (and therefore it has a local copy of the context), I think it ought to be possible.

I'm not entirely following this and it's my biggest confusion with the newer code samples. Why was context required before and why is it magically gone now?

jviau commented 1 year ago

I agree on hiding the context not being that beneficial. I think there is a point to keeping context there, to show that orchestrations, while they try to leverage the language to be as natural as possible, are still a work dispatching mechanism. The main benefit I would see is getting rid of the stringly-typing as much as possible. I personally think the mediator pattern matches perfectly with orchestrations. In this pattern, you have a single object that will pull double duty of both describing what the target is (which activity or orchestration) and be (or supply) the input itself to that activity or orchestration.

  1. The df.app.* methods would return a function which can be invoked to return some object
  2. The returned object would describe the request to run an orchestration/activity/entity. We can use generics here to strongly type the input and return value.
  3. context.df would then need only a single dispatch method on it, which is overloaded to accept the different objects return by the df.app.* methods.

This example puts it together in what I hope is valid (or at least semi valid) code.

interface OrchestrationRequest<TInput, TResult> { // TResult lets the context.df.dispatch be strongly typed for its return value
    Name: string;
    Input: TInput;
}

// these types are different only to satisfy different overloads of `context.df.dispatch(..)`
interface ActivityRequest<TInput, TResult> { // TResult lets the context.df.dispatch be strongly typed for its return value
    Name: string;
    Input: TInput;
}

const subOrchestrator = df.app.orchestration('subOrchestrator', { handler: (context: ???, input: MyInput) => { /* impl, no return */ }}); // subOrchestrator is `function (input: MyInput): OrchestrationRequest<MyInput, Void>`
const helloActivity = df.app.activity(activityName, { handler: (input: string) => { // helloActivity is `function (input: string): ActivityRequest<string, string>`
       return `Hello, ${input}`;
   }
});

const orchestrator: OrchestrationHandler = function* (context) {
    // the context is preserved by the implicit TaskOrchestrationHandler that receives each `yield`
    string result = yield context.df.dispatch(helloActivity("input")); // due to the `TResult` of `ActivityRequest`, we know the result is `string` here.
    yield context.df.dispatch(subOrchestrator({ /* instance of MyInput */ }), { retry, instanceId, fireAndForget }); // due to the `TResult` of `OrchestrationRequest`, we know the result is `Void` here.

    return result;
};

df.app.orchestration('durableOrchestrator1', orchestrator);

The Mediator pattern is what I intend to implement in the C# SDK eventually. Also I am not married to dispatch - could be run, send, whatever fits

hossam-nasr commented 1 year ago

Makes writing orchestrations one step closer to feeling like writing just normal code.

I just want to push back on this goal because orchestrations aren't normal code, right? In which case I do think it's helpful to have some differentiating characteristics

Personally, I find that one of the biggest selling points about durable is that it makes calling serverless functions and managing their state feel like calling regular functions and assigning regular variables. Imo, the closer orchestration code feels to normal code, the better. But I don't think this is a risk anyway since all the generator yield stuff makes it pretty obvious this isn't real code.

Overall, I think that hiding the context argument, which would be static for a given orchestrator, would be a neat win. Given that the SDK controls the orchestrator execution (and therefore it has a local copy of the context), I think it ought to be possible.

I'm not entirely following this and it's my biggest confusion with the newer code samples. Why was context required before and why is it magically gone now?

Well, I don't think context was really required before, it was just where the API lived, but context properties weren't necessary for producing the return value of callActivity() -- this work was done when the resulting Task is yielded. The asterisk here is that context properties (namely the task executor) is required for callActivityWithRetry(), but there are ways to make it also not required. I actually was able to get a very hacky prototype of the SDK to run the below orchestration:


df.app.orchestration('durableOrchestrator1', function* (context) {
    const outputs = [];
    const retryOptions = new df.RetryOptions(1000, 2);
    outputs.push(yield helloActivity({ input: 'Tokyo', retryOptions }));
    outputs.push(yield helloActivity({ input: 'Seattle', retryOptions }));
    outputs.push(yield helloActivity({ input: 'Cairo' }));

    return outputs;
});

const helloActivity = df.app.activity(activityName, {
    handler: (input) => {
        return `Hello, ${input}`;
    },
});
davidmrdavid commented 1 year ago

Good to see we have a prototype :)

As for whether DF should or shouldn't feel like regular code - that's tricky and a potentially very long running conversation, so I want to make sure we don't get lost in the meta conversation here. To keep my response short: there is definitely value in highlighting that DF isn't regular code, due to code constraints and such. However, I also want to let the Node team evolve the programming model in a way that suites them best so I don't want to constraint them to how other PLs decided to convey Activity/Entity/SubOrch dispatching.

To me, the question is whether that is already conveyed well enough through the yield keyword which already calls attention to something different taking place.

I would like more thoughts on this latest prototype. I'll ask in our internal demo chat to get a few more eyes on this thread, I'd love to see what others think.

jviau commented 1 year ago

Question on this line of code:

 outputs.push(yield helloActivity({ input: 'Tokyo', retryOptions }));

Which of the two following ways is more idiomatic in JS/TS? (Legit asking - I don't know myself).

// option 1: input is always first param, options (for configuring retry, etc) is an optional second param.
function someActivity(input: TInput, options?: TaskOptions): TResult // is this how you do optional parameters in TS?

// option 2: two overloads. 1 for just input, 1 for a complex object which combines input and options
function someActivity(input: TInput): TResult // only input, no options.

function someActivity(options: TaskOptions<TInput>): TResult // input is a part of options object.
ejizba commented 1 year ago

Option 1 is more common and yes that's how you do optional params. This is assuming a few things: that input is always required, that it's integral to the functionality of someActivity, and that it's the only argument that fits that bill.

Overloads get messy fast. Here's an example where I'm passing in the wrong type of argument (a number) to an overloaded and a non-overloaded function and you can see the error for the overloaded function is about 6 lines long and the error for the non-overloaded function is 1 line long. Screenshot 2023-01-27 at 10 33 09 AM

jviau commented 1 year ago

This is assuming a few things: that input is always required

I think all of our suggestions are moving towards a strongly-typed function generated from the df.app.* registration method. So input would only be required if your handler delegate defines an input. We could always allow for an overload of const activity = df.app.activity('myActivity', () => { /* no input */ }) which would return activity as function activity(options?: TaskOptions): TResult - so there is no required input, and options remains optional.

hossam-nasr commented 11 months ago

Closed by #535