Azure / azure-functions-python-worker

Python worker for Azure Functions.
http://aka.ms/azurefunctions
MIT License

Durable Functions Support for Python #227

Closed asavaritayal closed 4 years ago

asavaritayal commented 5 years ago

New Feature - Looking for votes and/or user input to gauge traction.

priyaananthasankar commented 5 years ago

+1 Absolutely needed. Working on a small sample with asyncio to simulate something similar.

gled4er commented 5 years ago

+1

yokawasa commented 5 years ago

+1 Yes, needed! Python has a rich choice of data science libraries, and it definitely has long-running & stateful scenarios!

tjhgit commented 5 years ago

I am experimenting with triggering an Azure Batch job from an Azure Function. So this is completely asynchronous, and you can also scale up the compute resources easily in Azure Batch.

mataralhawiti commented 5 years ago

+1 Absolutely

stereobutter commented 5 years ago

+1 for long running analytics/ML jobs

crgarcia12 commented 5 years ago

+1

Sarah-Aly commented 5 years ago

This is absolutely a needed feature. I do have a customer who uses Python in Azure Functions for manipulating CSVs & creating time series with Pandas and they are an ideal case for function chaining in DF.

t-eckert commented 5 years ago

+1 Happy to contribute

asavaritayal commented 5 years ago

/cc @cgillum @kashimiz as FYI

nroypf commented 5 years ago

+1 Need it asap :)

dajor commented 5 years ago

+1

FlippieCoetser commented 5 years ago

currently running large data transformations with Python. Execution time is around 30 min. Having durable functions will help a lot!

ericdrobinson commented 5 years ago

I'm saddened to see that Python support for Durable Functions isn't available yet. I've chronicled my efforts to make my usage of the HTTPTrigger more effective/efficient for my workflow in the comments of issue #236. The documentation, however, keeps suggesting that I take the Durable Functions route for a more reasonable experience.

Unfortunately, it looks as though I'll need to re-implement Durable Functions manually if I want to gain the suggested benefits of that approach :(

+1, indeed.

explora21 commented 5 years ago

+1, of course

SpicySyntax commented 5 years ago

+1, This would be very useful for long running ML Workloads

reynoldsa commented 5 years ago

Absolutely +1.

bigdatamoore commented 5 years ago

+1 agreed. We could use this now.

jaryder commented 5 years ago

+1 - Is there a timeline for when this will be available?

svartkanin commented 5 years ago

+1 - Definitely needed

whataride commented 5 years ago

+1 - Absolutely needed!

rodbutters commented 5 years ago

+1 - this is a MUST HAVE!

In the meantime - @asavaritayal, is it possible to use JavaScript Durable Functions for a Python function app?

deepbass commented 5 years ago

This could pretty much replace Spark for me and make the dev cycle much faster, because I personally find Spark much harder to write and debug than normal Python (Java stack traces inside Python ones).

It's also good for the kind of stuff I would do in PowerShell, but I much prefer Python as a scripting language.

shanjin14 commented 5 years ago

+1

Godase commented 5 years ago

+1 - Absolutely needed!

nk-gears commented 5 years ago

Happy to contribute and looking for an update on this. Not sure whether this repo https://github.com/kashimiz/azure-functions-durable-python is the official one for Durable Functions.

rcarmo commented 5 years ago

Oh please, please, PLEASE. I've been trying to do some things in Node and there are no libraries to handle what I need.

rodbutters commented 5 years ago

This is truly needed for production deployment - if only to catch container errors. Equivalent functionality is available as step functions elsewhere. If there is an alternative way to catch container errors (memory limit, timeout) that would solve the basic problem.

Zieg commented 4 years ago

+1 100% Absolutely necessary! Especially for long running ETL and data processing.

Looking forward to contributing to the functionality whenever possible.

anujku commented 4 years ago

I guess this is not going anywhere even after a year! :(

dduransseau commented 4 years ago

Like @Sarah-Aly, I have some use cases where pandas is used to manipulate multiple DataFrames and where the workflow could be simplified with Durable Functions.

KonoMaxi commented 4 years ago

+3 (voting on behalf of my lazy colleagues)

We use Python Functions for small ETL processes with pandas. Durable functions would help to very elegantly structure the individual steps.

I wrote a small Python package that handles chaining for me... It transparently sends messages to queues, which the preceding jobs listen to, and monitors the execution status in a storage-account table.

All in all it works, but it feels real dirty :-(
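The idea behind that package can be sketched as a minimal in-process simulation (a plain `queue.Queue` and a dict standing in for the Azure Storage queue and the status table; all names here are hypothetical, not the actual package's API):

```python
import queue

# Job status table, standing in for the storage-account table.
status = {}

def make_job(name, work, next_queue=None):
    """Wrap a work function so it records its status and forwards its
    result to the next job's queue, chaining the steps together."""
    def job(msg):
        status[name] = "running"
        result = work(msg)
        status[name] = "done"
        if next_queue is not None:
            next_queue.put(result)
        return result
    return job

# Two chained steps: "extract" feeds "transform" via a queue.
q_transform = queue.Queue()
extract = make_job("extract", lambda msg: msg.upper(), q_transform)
transform = make_job("transform", lambda msg: msg + "!")

extract("raw data")                   # runs step 1, enqueues for step 2
final = transform(q_transform.get())  # step 2 picks up the message
print(final, status)                  # RAW DATA! {'extract': 'done', 'transform': 'done'}
```

In the real setup each `job` would be a queue-triggered function and `status` would be a Table storage entity, which is exactly the bookkeeping the Durable Functions runtime would otherwise do for you.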

Zieg commented 4 years ago

I wrote a small Python package that handles chaining for me... It transparently sends messages to queues, which the preceding jobs listen to, and monitors the execution status in a storage-account table.

All in all it works, but it feels real dirty :-(

Can you share it? I'm really interested in how this might be achieved, and quick-and-dirty code is nothing to be ashamed of =)

ubikusss commented 4 years ago

+1

rkaderli commented 4 years ago

++ Yes, please!

KonoMaxi commented 4 years ago

Can you share it? I'm really interested in how this might be achieved, and quick-and-dirty code is nothing to be ashamed of =)

See gist. Source for jobmanager + short readme + usage examples.

https://gist.github.com/KonoMaxi/b63f184bad7ffccbdcc4d818da7b6ee9

sasukeh commented 4 years ago

+1

anirudhgarg commented 4 years ago

Thank you very much for your interest. I can confirm that we have started working on this, and based on current estimates it looks like we should be able to get a beta out sometime early next calendar year. Please stay tuned for more updates - we would love for people on this thread to be able to give us feedback.

Zieg commented 4 years ago

See gist. Source for jobmanager + short readme + usage examples.

https://gist.github.com/KonoMaxi/b63f184bad7ffccbdcc4d818da7b6ee9

Awesome! Thanks a lot! (Also - your code is not bad or dirty at all!!!)

fervalverde commented 4 years ago

+1! Happy to contribute if needed. Actually, I was thinking of using JS for Durable Functions and triggering my Python non-durable functions from it with HTTP requests. Has anyone used this or a better approach? Thanks in advance!

anthonychu commented 4 years ago

As @anirudhgarg mentioned, we've started work on supporting Durable Functions. We would love to hear more details about what patterns you plan on using and what you'll be doing with it.

shanjin14 commented 4 years ago

As @anirudhgarg mentioned, we've started work on supporting Durable Functions. We would love to hear more details about what patterns you plan on using and what you'll be doing with it.

Hi Anthony, for my use case I am calling an Azure Function from ADF for some data-processing-related tasks. So the two patterns I plan on using are Async API and function chaining, mainly to avoid the 2.5-minute API timeout when the file is too big or the job takes a long time to finish.
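The async-API pattern mentioned here can be sketched with plain asyncio (a toy simulation under stated assumptions: `jobs` stands in for durable orchestration state, the endpoint names and the `statusQueryGetUri` field are hypothetical):

```python
import asyncio
import uuid

jobs = {}  # job_id -> {"status": ..., "result": ...}

async def long_task(job_id):
    """Stands in for a long data-processing job (e.g. a big file)."""
    await asyncio.sleep(0.01)
    jobs[job_id] = {"status": "done", "result": 42}

def start(pending):
    """'Start' endpoint: register the job, schedule the work, and
    return a 202-style payload immediately instead of blocking."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "result": None}
    pending.append(asyncio.create_task(long_task(job_id)))
    return {"id": job_id, "statusQueryGetUri": f"/status/{job_id}"}

async def main():
    pending = []
    accepted = start(pending)       # returns right away with a poll URL
    assert jobs[accepted["id"]]["status"] == "running"
    await asyncio.gather(*pending)  # a real client would poll /status instead
    return jobs[accepted["id"]]

result = asyncio.run(main())
print(result)  # {'status': 'done', 'result': 42}
```

The HTTP caller (ADF in this case) gets an immediate 202 with a status URL and polls it, so the long job never runs inside the HTTP request's timeout window.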

deepbass commented 4 years ago

As @anirudhgarg mentioned, we've started work on supporting Durable Functions. We would love to hear more details about what patterns you plan on using and what you'll be doing with it.

I'd be looking at the usual patterns that I'd use with Durable Functions in C#/JS, but taking advantage of Python's much larger ecosystem of ML/AI libraries - e.g. loading documents and then running NLP models over them to extract information before sending the results on to another service/human. For me it would also move a few workloads off Databricks PySpark - generally filling the gap where the demands are beyond an individual function but really don't need the firepower & faff of a full Spark cluster - bringing improved reliability, more straightforward & dependable programming, and serverless operation.

It'd be interesting to see how parallel training of an ML model could be achieved using something like durable entities combined with overriding the threading behaviour and instead spinning up activity functions. No idea if that would work or if the latency cost would be too high, but it would be cool to investigate.
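The entity idea can be sketched in plain asyncio (this is only an analogy for how a durable entity serializes operations on its state, not the Durable Functions API; all names are hypothetical):

```python
import asyncio

class CounterEntity:
    """A toy durable-entity analogue: operations arrive through a queue
    and are applied one at a time, so concurrent activities never race
    on the shared state."""
    def __init__(self):
        self.total = 0
        self._ops = asyncio.Queue()

    async def run(self):
        while True:
            delta = await self._ops.get()
            if delta is None:      # shutdown signal
                return
            self.total += delta

    async def signal(self, delta):
        await self._ops.put(delta)

async def activity(entity, partial_result):
    """Stands in for an activity function training on one data shard."""
    await asyncio.sleep(0)
    await entity.signal(partial_result)

async def main():
    entity = CounterEntity()
    runner = asyncio.create_task(entity.run())
    # Fan out four parallel "activities" that report into the entity.
    await asyncio.gather(*(activity(entity, n) for n in [1, 2, 3, 4]))
    await entity.signal(None)      # stop the entity loop
    await runner
    return entity.total

total = asyncio.run(main())
print(total)  # 10
```

The open question from the comment - whether the per-signal latency of a real entity would swamp the gains from parallel activities - is exactly what this kind of skeleton wouldn't capture.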

bigdatamoore commented 4 years ago

We could also use it for the same patterns as Daniel mentioned. There’s a ML pattern called the Rendezvous pattern that this would be helpful for.


Zieg commented 4 years ago

Chaining, Async API, and long-running functions are the most needed in our project. Thanks!

P.S. Is there a way to be involved in the project early on? I'd really like to participate and contribute.

fervalverde commented 4 years ago

In my project, we would use the fan-out/fan-in pattern to orchestrate the execution of different Functions which extract data from different sources; the data is then evaluated to check for security settings and finally loaded into a database. We're currently doing this orchestration with Logic Apps, but we would prefer to move to ADF to take advantage of it being stateful and easier to maintain than Logic Apps.
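That fan-out/fan-in flow can be sketched with asyncio (a minimal simulation: `extract`, `security_check`, and the source names are all hypothetical stand-ins for the real activities):

```python
import asyncio

async def extract(source):
    """Activity: pull records from one source (simulated)."""
    await asyncio.sleep(0)
    return [f"{source}:{i}" for i in range(2)]

def security_check(record):
    """Toy filter standing in for the real security evaluation."""
    return not record.endswith(":1")

async def orchestrate(sources):
    # Fan out: one extraction task per source, run concurrently.
    batches = await asyncio.gather(*(extract(s) for s in sources))
    # Fan in: flatten, filter, and hand the survivors to the "load" step.
    return [r for batch in batches for r in batch if security_check(r)]

loaded = asyncio.run(orchestrate(["crm", "erp"]))
print(loaded)  # ['crm:0', 'erp:0']
```

In a Durable Functions orchestrator the `gather` step would become scheduling one activity per source and waiting on all of them, with the runtime checkpointing progress between steps.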

rcarmo commented 4 years ago

I would love to do a similar pattern with Python on Linux Functions and add some statistical processing with scipy. Eventually, even evaluating ensembles of Tensorflow models...


mattc-eostar commented 4 years ago

Potential workaround: create a Python function app with a custom Linux image, or host it on a Premium plan (I think this switches billing to cost per second vs. cost per execution). This should remove the timeout, right?

Then within that function app use async requests and manually call the functions you need for your distributed computing. The custom image on premium will run as long as needed and will be able to orchestrate the calls.

Would this not work?

mattc-eostar commented 4 years ago

@tjhgit

I am experimenting with triggering an Azure Batch job from an Azure Function. So this is completely asynchronous, and you can also scale up the compute resources easily in Azure Batch.

@priyaananthasankar

+1 Absolutely needed. Working on a small sample with asyncio to simulate something similar.

Any luck in these adventures?

Zieg commented 4 years ago

@mattc-eostar not really. It depends on how you trigger your function. If you use an HTTP trigger, then you are limited to a 4-minute timeout: the function must execute and return its output within that limit, otherwise it fails. And if you want to trigger your function from Data Factory (for example), you are pretty much limited to HTTP triggers.