Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License

Language Extensibility #1319

Open christopheranderson opened 7 years ago

christopheranderson commented 7 years ago

GOALS

Primary Goal

SCENARIOS

BASIC JAVA FUNCTION

A customer needs to run a Function implemented in Java.

  1. They upload a .jar and function.json into a given Function directory.
  2. The function.json contains a timer trigger and a queue output binding.
  3. The function.json does not contain a scriptFile or entryPoint setting.
  4. The Java-based Function implements an SDK provided by a built-in Java host. When the Function is triggered, the host invokes it via the built-in Java host environment. The Function produces an output for the queue, which the built-in Java host passes to the script host.
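For illustration, the function.json from steps 1 to 3 might look like the one generated below. This is a minimal sketch: the binding names (myTimer, outputQueueItem), the schedule, and the queue name are hypothetical placeholders. The bindings array with type/direction/name entries follows the existing function.json format. Python is used only to generate and check the file.

```python
import json

# Hypothetical function.json for the Java scenario: a timer trigger and a
# queue output binding, with no "scriptFile" or "entryPoint" (step 3).
function_json = {
    "bindings": [
        {
            "type": "timerTrigger",
            "direction": "in",
            "name": "myTimer",                    # hypothetical binding name
            "schedule": "0 */5 * * * *",          # NCRONTAB: every 5 minutes
        },
        {
            "type": "queue",
            "direction": "out",
            "name": "outputQueueItem",            # hypothetical binding name
            "queueName": "outqueue",              # hypothetical queue name
            "connection": "AzureWebJobsStorage",  # app setting holding the connection string
        },
    ]
}


def names_entry_point(config: dict) -> bool:
    """True if the config explicitly points the host at a file or method."""
    return "scriptFile" in config or "entryPoint" in config


print(json.dumps(function_json, indent=2))
```

Because names_entry_point(function_json) is False, function.json gives the host nothing to load. Locating the Function inside the uploaded .jar is left to the built-in Java host.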

BASIC NODE.JS FUNCTION

A customer needs to run a Function implemented in JavaScript, or one that has been transpiled to JavaScript.

  1. They upload a .js file and a function.json to a given Function directory.
  2. The function.json contains just trigger and binding data. It does NOT contain scriptFile or entryPoint.
  3. The .js file exports a single method on an object or provides a default export.

    • [ ] WIP
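The entry-point rule in step 3 can be modeled as a small resolution function. This is a sketch under assumptions, not the Node.js worker's actual logic. It treats an exports value that is itself callable, or a "default" key holding a callable, as the default export. Otherwise it requires exactly one exported method. The function name resolve_entry_point is hypothetical, and Python dicts stand in for JavaScript exports objects.

```python
from collections.abc import Callable, Mapping
from typing import Any


def resolve_entry_point(exports: Any) -> Callable:
    """Pick the Function to invoke from a module's exports (hypothetical rule).

    Assumptions: an exports value that is itself callable, or a "default"
    key holding a callable, counts as a default export; otherwise the
    exports object must contain exactly one callable.
    """
    if callable(exports):
        return exports
    if isinstance(exports, Mapping):
        default = exports.get("default")
        if callable(default):
            return default
        methods = [value for value in exports.values() if callable(value)]
        if len(methods) == 1:
            return methods[0]
        raise ValueError(f"expected exactly one exported method, found {len(methods)}")
    raise TypeError("exports must be a function or an object containing functions")
```

An object that exports two methods and no default is ambiguous, so the sketch rejects it rather than guessing.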

DEFINITIONS

  1. Runtime/Host – Azure WebJobs SDK Script Host
  2. Language worker/worker – The external process running functions in various languages/frameworks
  3. Message broker – Handles communication between the “host” and the “language worker”

REQUIREMENTS

HIGH LEVEL REQUIREMENTS

Category | Priority | Requirement | Notes
--- | --- | --- | ---
Host | 0 | Host must be able to send messages to the language worker that trigger executions. |
Host | 0 | Host must be able to manage the process hosting the language worker. |
Host | 1 | Host should use a multi-framework/language IDL/API to define message types. | Recommend using ProtoBuf
Host | 1 | Host should use a securable protocol with implementations available in a variety of languages. | Recommend using gRPC
Host | 1 | Host should be able to send data from input bindings (including the trigger) to the language worker. |
Host | 1 | Host should be able to receive data for output bindings from the language worker. |
Host | 1 | Host should be able to receive streaming logs from the language worker. |
Host | 1 | Host should provide the ability to have “built-in” language workers for various frameworks, which load a customer’s Functions from framework-specific files. |
Host | 2 | Host should provide the ability to specify a custom language worker. |
Host | 3 | Host should be able to give the language worker a graceful warning that a shutdown is coming. |
Host | 3 | Host should be able to request a timeout for a given Function running on a language worker. |
Host | 3 | Host should be able to receive an ACK that the Function has been terminated safely. |
Host | 3 | If the language worker does not ACK a safe shutdown, the host should shut down the language worker. |
Portal | 1 | Portal should handle a scriptFile that points at a file outside the given Function's directory. |
Portal | 1 | Portal should handle binary/non-human-readable content gracefully. |
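The requirements above imply a small, closed set of message types between host and worker. The sketch below makes the following assumptions. The message names and fields are hypothetical. JSON stands in for the ProtoBuf IDL the table recommends, so the example runs with only the standard library. The tagged envelope plays the role a protobuf `oneof` would play.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class InvocationRequest:      # host -> worker: trigger an execution
    invocation_id: str
    function_id: str
    input_data: dict          # input bindings, including the trigger


@dataclass
class InvocationResponse:     # worker -> host: result of an execution
    invocation_id: str
    output_data: dict         # values for output bindings
    error: Optional[str] = None


@dataclass
class LogMessage:             # worker -> host: streaming logs
    invocation_id: Optional[str]
    level: str
    message: str


MESSAGE_TYPES = {cls.__name__: cls for cls in (InvocationRequest, InvocationResponse, LogMessage)}


def encode(msg) -> str:
    """Wrap a message in an envelope tagged with its type name."""
    return json.dumps({"type": type(msg).__name__, "body": asdict(msg)})


def decode(raw: str):
    """Rebuild the typed message from its envelope."""
    envelope = json.loads(raw)
    return MESSAGE_TYPES[envelope["type"]](**envelope["body"])
```

Keeping the set of message types closed is what makes the contract narrower than general-purpose RPC. The receiver can dispatch on the type tag alone.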

MESSAGE TYPES

SUPPORTED DATA TYPES FOR BINDINGS

BINDING METADATA

INVOCATION LIFECYCLE MANAGEMENT

START

  1. The host should communicate with the language worker to invoke the Function
  2. The language worker should start a function execution when it receives a new invocation message
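The two START steps can be sketched with in-memory asyncio queues standing in for the host/worker channel. The message fields and function names are hypothetical. The worker starts each execution as its own task, so a slow Function does not delay the next invocation message.

```python
import asyncio


async def execute(message: dict, functions: dict, outbox: asyncio.Queue) -> None:
    """Run one Function and report its result back to the host."""
    result = await functions[message["function_id"]](message["input"])
    await outbox.put({"invocation_id": message["invocation_id"], "output": result})


async def worker(inbox: asyncio.Queue, outbox: asyncio.Queue, functions: dict) -> None:
    """Start an execution as soon as each invocation message arrives."""
    running = []
    while (message := await inbox.get()) is not None:  # None = no more work
        running.append(asyncio.create_task(execute(message, functions, outbox)))
    await asyncio.gather(*running)


async def host(functions: dict, invocations: list) -> dict:
    """Send invocation messages to the worker and collect the responses."""
    inbox, outbox = asyncio.Queue(), asyncio.Queue()
    worker_task = asyncio.create_task(worker(inbox, outbox, functions))
    for message in invocations:
        await inbox.put(message)
    await inbox.put(None)
    await worker_task
    results = {}
    while not outbox.empty():
        response = outbox.get_nowait()
        results[response["invocation_id"]] = response["output"]
    return results
```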

COMPLETE

ERROR

TIMEOUT

  1. A language worker should handle Function timeout/cancellation messages
  2. A language worker should ACK if the Function shut down gracefully within the configurable grace period
  3. A language worker should provide a means for a Function to be notified of a graceful shutdown
  4. The host should send the timeout/cancellation message for a given Function execution to the language worker processing it
  5. A language worker should wait for an ACK for a configurable grace period
  6. The grace period should be configurable via host.json (timeoutGracePeriod)
  7. The grace period can be overridden per Function via function.json (timeoutGracePeriod)
  8. The default grace period should be 1 second
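The timeout handshake and the grace-period precedence can be sketched as follows. This rests on several assumptions. asyncio events stand in for the cancellation and ACK messages. The worker and the Function are collapsed into one coroutine. All names are hypothetical. Treating timeoutGracePeriod as a number of seconds is also an assumption, since the list does not give a unit. The host-side wait and forced shutdown follow the Host 3 requirements in the table above.

```python
import asyncio

DEFAULT_GRACE_PERIOD_S = 1.0  # item 8


def resolve_grace_period(host_json: dict, function_json: dict) -> float:
    """function.json overrides host.json, which overrides the default (items 6-8).

    Assumption: timeoutGracePeriod is a number of seconds.
    """
    for config in (function_json, host_json):
        if "timeoutGracePeriod" in config:
            return float(config["timeoutGracePeriod"])
    return DEFAULT_GRACE_PERIOD_S


async def cooperative_function(cancel: asyncio.Event, ack: asyncio.Event) -> None:
    """Stops when notified (item 3) and ACKs the graceful stop (item 2)."""
    while not cancel.is_set():
        await asyncio.sleep(0.01)  # simulated work
    ack.set()


async def stubborn_function(cancel: asyncio.Event, ack: asyncio.Event) -> None:
    """Ignores the notification and never ACKs."""
    await asyncio.sleep(3600)


async def time_out_execution(func, grace_s: float) -> str:
    cancel, ack = asyncio.Event(), asyncio.Event()
    execution = asyncio.create_task(func(cancel, ack))
    cancel.set()  # item 4: deliver the timeout/cancellation message
    try:
        await asyncio.wait_for(ack.wait(), timeout=grace_s)
        outcome = "acked"
    except asyncio.TimeoutError:
        outcome = "forced"  # no ACK in time: the host shuts the worker down
    execution.cancel()  # stands in for terminating the worker (no-op if already done)
    await asyncio.gather(execution, return_exceptions=True)
    return outcome
```

With no overrides, the host in this sketch waits at most resolve_grace_period({}, {}) seconds, which is 1.0, before forcing a shutdown.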

FUNCTION LIFECYCLE MANAGEMENT

LOAD

UNLOAD

FUNCTION LOGGING

LANGUAGE WORKER LIFECYCLE MANAGEMENT

LANGUAGE WORKER RESOLUTION

LANGUAGE WORKER LOGGING

LOGGING PIPELINES

Categories:

  1. Message broker
  2. Stderr
  3. Crash dump to a logs directory
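The Stderr category can be sketched as the host reading the worker process's stderr line by line and forwarding each line to its own log. The other two categories are not modeled here. The message broker category would carry structured log messages over the channel, and crash dumps would be written to a directory. The function name is hypothetical.

```python
import subprocess
import sys
from typing import Callable, List


def run_worker_and_forward_stderr(argv: List[str], log: Callable[[str], None] = print) -> int:
    """Start a worker process and forward each stderr line to the host log."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True
    )
    assert proc.stderr is not None
    for line in proc.stderr:  # reads lines as they arrive from the worker
        log(f"[worker stderr] {line.rstrip()}")
    return proc.wait()  # exit code, e.g. to decide whether to look for a crash dump
```

The usage is the same for any worker executable. For example, passing [sys.executable, "-c", "..."] runs a stand-in worker written in Python.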

EVENTS

CATASTROPHIC FAILURE LOGGING

  1. Log to $env:HOME/logFiles/… /functions/worker/{name}/
    • Possibly this is also a startup argument (which would make CLI use easy)

HOST LOGGING

EVENTS

christopheranderson commented 7 years ago

Need to address stream support for this: https://github.com/Azure/azure-webjobs-sdk-script/issues/1361

jnevins-gcm commented 7 years ago

So you're going to build another version of .NET remoting that's cross-platform and cross-process? You might want to rethink this.

logiclrd commented 7 years ago

@jnevins-gcm Given that one of the requirements is that it be possible to write functions in Java or as Node.js scripts, in addition of course to the current .NET model, all with a common front-end interface & management infrastructure, I am curious to know what you would propose instead. Designing a remoting infrastructure that provides the functionality required in a manner that is easy to implement in a variety of languages seems entirely reasonable to me.

jnevins-gcm commented 7 years ago

It's reasonable, but I believe it would be literally a TON of work and would, I imagine, introduce massive breaking changes. I still think it makes sense to have a .NET-specific implementation.

jnevins-gcm commented 7 years ago

Node already handles this out of the box, in terms of the way npm supports installing packages.

I would propose the following:

  1. The ONLY assembly loaded by w3wp is the entry assembly. Have an entry class that will not force loading of any other dlls into the default load context. The only dll in the default load context should be your entry assembly.

  2. For the Azure Functions runtime's direct dependency dlls, use LoadFrom (this would not allow resolution by app-specific dlls to these dlls). These directly referenced dlls would need to be in a separate directory from the default directory.

  3. Load ALL app assemblies using LoadFile or Load from a byte array (no load context). This would cause ALL assembly resolutions to be forwarded to AssemblyResolve. You might have special exceptions that resolve to the runtime's own reference (things required for input and output bindings).

  4. Add an AssemblyResolve handler for app assemblies loaded from the app bin directory. Either:
     a. Redirect all version resolve requests to a partial name match in the bin directory (least desirable).
     b. Add a config for binding redirects, in host.json or similar, that determines the behavior of the resolve handler, so that it can handle apps that reference packages which reference a different version of a package referenced by both the app and the direct reference.
     c. Allow placing multiple versions of the same assembly using different file names (or subdirectories of the app bin directory, or a NuGet packages directory; I know something similar is done by you now), and handle this in the assembly resolver by actually reading the AssemblyName (not the file name) to determine an exact version match. This handles the scenario where there are truly incompatible breaking changes between the versions referenced by the app and those referenced by the app's dependencies.
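The proposal above targets .NET, specifically an AppDomain.AssemblyResolve handler. The Python below models only the resolution policy, under stated assumptions. A redirect maps an assembly name to a single target version, whereas real .NET binding redirects map an oldVersion range to a newVersion. The comment presents options a, b, and c as alternatives, but this sketch chains them as fallbacks, which is its own choice. All names and paths are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int, int, int]


@dataclass(frozen=True)
class AssemblyFile:
    name: str        # identity read from the file's AssemblyName, not its file name
    version: Version
    path: str


def resolve(name: str, version: Version, candidates: List[AssemblyFile],
            redirects: Dict[str, Version]) -> Optional[AssemblyFile]:
    """Model of an AssemblyResolve policy with options a/b/c chained as fallbacks."""
    # (b) a configured redirect replaces the requested version
    wanted = redirects.get(name, version)
    same_name = [c for c in candidates if c.name == name]
    # (c) exact identity match among side-by-side copies
    for candidate in same_name:
        if candidate.version == wanted:
            return candidate
    # (a) least desirable: partial (name-only) match, highest version wins
    if same_name:
        return max(same_name, key=lambda c: c.version)
    return None
```

Reading the identity from the file rather than the file name is what lets option c keep two copies of the same assembly side by side.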

This isn't SO terribly different from what you're doing now (I've gone through your source in detail)...but I think this would eliminate these problems.

Let me know if this makes sense

paulbatum commented 7 years ago

@jnevins-gcm we appreciate your feedback and thoughts on this. But please keep the assembly binding discussion in the dedicated issue for it, see: https://github.com/Azure/azure-webjobs-sdk-script/issues/992.

Keep in mind that this language extensibility work is aimed at solving a lot more than just binding redirects. If we are going to reuse a bunch of our existing code while supporting functions written in Java, Python and other languages, then we simply have to design and build a solution that involves IPC. I would avoid the .NET remoting comparison, because .NET remoting attempted to provide very generic cross-process RPC. The communication channel between the host process and the language worker process is much more constrained in terms of the contract: it needs to handle a known set of particular events (function invocations, logs, restarts, errors, etc.). Don't get me wrong, we know it's not a trivial undertaking, but it is by no means as ambitious as .NET remoting was.

jnevins-gcm commented 7 years ago

logiclrd asked me... not sure where else I would have posted my reply.

Binding redirects (really, assembly binding in general) are a fundamental mess in .NET itself, and Azure Functions exacerbates the issue. Many, many other languages do not have the core design/architecture problems that .NET has in this area, so I'd focus on getting the most broken scenarios working first.

You're right - .NET remoting is quite complex, but so is this. Maybe not as complex, but it still seems like a bad path to head down.

paulbatum commented 7 years ago

None of your post about assembly loading explains how we would support multiple programming languages with good perf. It's focused on the assembly redirect issue, which is tracked separately and which I linked earlier.



From: jnevins-gcm notifications@github.com Sent: Monday, June 5, 2017 7:17:02 PM To: Azure/azure-webjobs-sdk-script Cc: Paul Batum; Comment Subject: Re: [Azure/azure-webjobs-sdk-script] Language Extensibility (#1319)


jnevins-gcm commented 7 years ago

Good luck

ErikGrimes commented 7 years ago

Would this enable users to add languages currently not supported by Azure functions (e.g. Go, Ruby, Dart)?

paulbatum commented 7 years ago

Yes, as long as the given language has a gRPC and Protocol Buffers implementation.

vjrantal commented 7 years ago

@christopheranderson @paulbatum I would like to ask a question related to the recommendation of gRPC for the IPC. In addition to local IPC, do you foresee Functions benefiting in the future from an RPC pattern that goes outside a single machine, for example, with the host on one box routing messages to worker(s) running on a separate machine?

The background for the question is that I was wondering whether, for local IPC, something like local sockets or named pipes would do the trick. I believe one limitation of the current implementation of language extensibility is how large amounts of data are passed from one language to another (for example, a big blob from the C# host to a Node.js worker). I think with gRPC you could achieve that by defining an interface where the big blob is sent in small chunks, but I'm not sure how much overhead there would be if the use case is just local IPC.

At https://github.com/vjrantal/named-pipes-test I experimented with named pipes between C# and Node.js in a .NET Core app. Streaming data between the languages worked smoothly, and the nice thing about named pipes was that the target worker can affect how much the host writes into the pipe. To be more specific, the C# side doesn't write to the pipe unless Node.js has drained it. The end result is that large blobs would not overwhelm the available memory, even if the Node.js worker were reading more slowly than the C# side can write.
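The chunking idea and the pipe flow control described above can be modeled together. This sketch assumes a bounded asyncio.Queue stands in for the named pipe or gRPC stream. Its put() waits while the queue is full, so the writer can never get more than max_in_flight chunks ahead of a slow reader. The chunk size, queue depth, and function names are hypothetical.

```python
import asyncio

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size


def chunks(blob: bytes, size: int = CHUNK_SIZE):
    """Split a large payload into fixed-size chunks, as a streaming RPC would send them."""
    for start in range(0, len(blob), size):
        yield blob[start:start + size]


async def send(blob: bytes, channel: asyncio.Queue, size: int) -> int:
    """Host side: put() waits while the channel is full, so the writer waits for the reader."""
    peak = 0
    for chunk in chunks(blob, size):
        await channel.put(chunk)
        peak = max(peak, channel.qsize())
    await channel.put(None)  # end-of-stream marker
    return peak


async def receive(channel: asyncio.Queue) -> bytes:
    """Worker side: a deliberately slow reader."""
    parts = []
    while (chunk := await channel.get()) is not None:
        parts.append(chunk)
        await asyncio.sleep(0.001)  # simulate slower processing
    return b"".join(parts)


async def transfer(blob: bytes, size: int = CHUNK_SIZE, max_in_flight: int = 4):
    channel = asyncio.Queue(maxsize=max_in_flight)
    peak, data = await asyncio.gather(send(blob, channel, size), receive(channel))
    return peak, data
```

Whatever the blob size, the data buffered between the two sides stays at about max_in_flight * size bytes. That bound is the property the named-pipe experiment relies on.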

paulbatum commented 7 years ago

@vjrantal indeed, we've discussed the possibility of the host and the worker being on different machines and this is one of our motivations for choosing gRPC.

logiclrd commented 7 years ago

Was ZeroMQ considered and decided against? My understanding is that it may be able to perform better since it isn't wrapping the message payloads in the HTTP protocol. I also think there might be alternatives ("nanomsg"?) that can use shared memory to get data between processes on the same box with no streaming/copying. I guess the flip side is that some architectures/languages you want to support might not have bindings for such a framework, whereas pretty much everything can do HTTP requests, even if you have to write the message formatter yourself...

paulbatum commented 7 years ago

@logiclrd yes, in fact the first prototype we built using this approach was ZeroMQ-based. Keep in mind that gRPC is based on HTTP/2, which has very different performance characteristics from HTTP/1.1.

Workshop2 commented 7 years ago

Any idea when this will be addressed?

paulbatum commented 7 years ago

@Workshop2 If you're referring to language extensibility using the out-of-proc model discussed above, then the work is well under way. We just performed a major merge of this work into the core branch (see https://github.com/Azure/azure-webjobs-sdk-script/commit/1b6887fb4dc85b9135105621857852087d797860).

Right now we're focused on getting JavaScript working well using this model (see https://github.com/Azure/azure-functions-nodejs-worker).

vtrusevich-incomm commented 3 years ago

Hello, any updates on adding streaming support?