Closed tamathew closed 3 years ago
Hi @cgillum - Here is the log for af450c5387364f1d8aee5325e31a4522 orchestratorLog_af450c5387364f1d8aee5325e31a4522.xlsx
Can you check this?
Those orchestrator logs are interesting because it looks like the same activity function was scheduled to run twice.
Timestamp | Message
---|---
2/15/2021, 6:37:37.393 PM | af450c5387364f1d8aee5325e31a4522: Function 'ADFMetaExtractorActivity (Activity)' started. IsReplay: False. Input: (636 bytes). State: Started. SlotName: Production. ExtensionVersion: 2.3.1. SequenceNumber: 5. TaskEventId: 0
2/15/2021, 6:32:36.702 PM | af450c5387364f1d8aee5325e31a4522: Function 'ADFMetaExtractorActivity (Activity)' started. IsReplay: False. Input: (636 bytes). State: Started. SlotName: Production. ExtensionVersion: 2.3.1. SequenceNumber: 12. TaskEventId: 0
The difference between these two log statements is 5 minutes, which is also the visibility timeout on the queues. I also see that these two log statements were generated by two different VMs, so I wonder if the first VM picked it up, got stuck or killed, and then a second VM picked up the message 5 minutes later and executed the activity function immediately. I'll need to spend more time digging into this to see what happened exactly, but it so far appears to be a different issue from what we've been investigating up until now.
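As a quick check of the arithmetic, here is a minimal sketch that parses the two timestamps from the log entries above and compares the gap with the 5-minute visibility timeout. The helper `looks_like_redelivery` and its 5-second tolerance are hypothetical, chosen only for illustration; they are not part of any Azure SDK.

```python
from datetime import datetime, timedelta

# Timestamps copied from the two "started" log entries above.
LOG_FORMAT = "%m/%d/%Y, %I:%M:%S.%f %p"
first_start = datetime.strptime("2/15/2021, 6:32:36.702 PM", LOG_FORMAT)
second_start = datetime.strptime("2/15/2021, 6:37:37.393 PM", LOG_FORMAT)

# Queue visibility timeout of 5 minutes, as stated in the comment above.
VISIBILITY_TIMEOUT = timedelta(minutes=5)


def looks_like_redelivery(gap, timeout, tolerance=timedelta(seconds=5)):
    """Hypothetical heuristic: the second start falls just after the timeout expires."""
    return timedelta(0) <= gap - timeout <= tolerance


gap = second_start - first_start
print(gap)  # 0:05:00.691000
print(looks_like_redelivery(gap, VISIBILITY_TIMEOUT))  # True
```

A gap this close to the timeout is consistent with a message being picked up again after the first attempt never completed, but it does not prove it.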
Hi @cgillum - What is the fundamental difference between the way a .NET/C# Durable Function is executed and the way a Python Durable Function is executed? Can you explain in plain English? lol :)
@cgillum - Do you have any update on this issue?
> What is the fundamental difference between the way a .NET/C# Durable Function is executed and the way a Python Durable Function is executed?
Hey @tmathewlulu the fundamental difference is with the underlying runtime itself. In .NET/C#, an app can use multiple threads to process concurrent requests. .NET will also automatically add or remove threads as needed. Python, however, was designed to just use one thread at a time. If you need anything to run concurrently in Python, then you need to manually configure it to have more threads.
The way most triggers work in Azure Functions is that they try to execute multiple requests concurrently. This works great for C#/.NET because .NET will happily create multiple threads if needed to handle all the concurrent requests. Sadly, Python does not do this, so Function invocations get blocked waiting for a free thread to start executing your code. The workaround for this is to try and configure the Azure Functions trigger so that it doesn't try to take on more work than the Python worker can handle at any given time.
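To make the "blocked waiting for a free thread" effect concrete, here is a minimal sketch of a toy queueing model in plain Python. It is not Azure Functions code: `simulate_start_delays` is a hypothetical helper that assumes all invocations arrive at time 0 and each one starts as soon as any thread is free.

```python
import heapq


def simulate_start_delays(durations, worker_threads):
    """Return how long each task waits before it starts (toy model).

    Assumes every task arrives at time 0 and tasks start in list order
    as soon as one of the `worker_threads` threads becomes free.
    """
    if worker_threads < 1:
        raise ValueError("worker_threads must be at least 1")
    free_at = [0.0] * worker_threads  # time at which each thread is next free
    heapq.heapify(free_at)
    delays = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available thread
        delays.append(start)
        heapq.heappush(free_at, start + duration)
    return delays


tasks = [60.0] * 4  # four invocations that each run for 60 seconds
print(simulate_start_delays(tasks, worker_threads=1))  # [0.0, 60.0, 120.0, 180.0]
print(simulate_start_delays(tasks, worker_threads=4))  # [0.0, 0.0, 0.0, 0.0]
```

With a single thread, the fourth invocation waits three minutes before it even starts, which is the kind of start delay described above.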
I hope that makes sense. Let me know if I can help clarify further.
> Do you have any update on this issue?
So I think we're now talking about two different things. One is the concurrency behavior of Python functions. I think we've sufficiently covered this topic as far as I can tell and that there aren't any known issues requiring further investigation.
The other is instance af450c5387364f1d8aee5325e31a4522, where I saw a duplicate execution. It looks like the container your app was running on was terminated mid-execution at 2021-02-15 18:32:57.0817379. This can happen if, for example, the platform is going through an upgrade or if a scale-in operation was scheduled. I'll need to follow up with the Azure Functions Consumption Linux team to understand what the exact cause was. It's my understanding that you've opened a support request already, so I'll pass this information along so that they can do a root cause analysis.
chrisFunction_d7743cf04f854dd3a65d6ae2a95493a7.xlsx
@cgillum - Python concurrency issue: this issue persists. The code you shared is not working. In the attached log, the orchestrator started at 2/22/2021, 9:49:27.026 PM and the activity started at 2/22/2021, 9:56:35.935 PM. That is a 7-minute delay. Why?
@tmathewlulu what were your values for `PYTHON_THREADPOOL_THREAD_COUNT`, `FUNCTIONS_WORKER_PROCESS_COUNT`, `maxConcurrentActivityFunctions`, and `maxConcurrentOrchestratorFunctions`?
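One rough way to reason about how these four settings interact is sketched below. This is only a rule of thumb under stated assumptions, not an official formula: it assumes the Python side can run about `FUNCTIONS_WORKER_PROCESS_COUNT * PYTHON_THREADPOOL_THREAD_COUNT` synchronous invocations at once, and that the host may dispatch up to the sum of the two `maxConcurrent...` limits. The helper names and the example values are hypothetical.

```python
def python_capacity(settings):
    """Assumed model: one thread pool per worker process."""
    return (settings["FUNCTIONS_WORKER_PROCESS_COUNT"]
            * settings["PYTHON_THREADPOOL_THREAD_COUNT"])


def host_demand(settings):
    """Upper bound on invocations the Durable extension may dispatch at once."""
    return (settings["maxConcurrentActivityFunctions"]
            + settings["maxConcurrentOrchestratorFunctions"])


def is_oversubscribed(settings):
    """True when the host may hand out more work than the workers can start."""
    return host_demand(settings) > python_capacity(settings)


# Illustrative values only, not taken from any real app.
single_threaded = {
    "FUNCTIONS_WORKER_PROCESS_COUNT": 1,
    "PYTHON_THREADPOOL_THREAD_COUNT": 1,
    "maxConcurrentActivityFunctions": 10,
    "maxConcurrentOrchestratorFunctions": 10,
}
balanced = {
    "FUNCTIONS_WORKER_PROCESS_COUNT": 1,
    "PYTHON_THREADPOOL_THREAD_COUNT": 10,
    "maxConcurrentActivityFunctions": 5,
    "maxConcurrentOrchestratorFunctions": 5,
}

print(is_oversubscribed(single_threaded))  # True
print(is_oversubscribed(balanced))  # False
```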
@cgillum - These are the settings I used. I also tried without them, and I also changed the activity to async with asyncio.sleep: `"FUNCTIONS_WORKER_PROCESS_COUNT": 1, "PYTHON_THREADPOOL_THREAD_COUNT": 10`
What about the other two concurrency settings in host.json?
`"FUNCTIONS_WORKER_PROCESS_COUNT": 1, "PYTHON_THREADPOOL_THREAD_COUNT": 10, "extensions": { "durableTask": { "maxConcurrentActivityFunctions": 5, "maxConcurrentOrchestratorFunctions": 5 } }`
I also used the async qualifier for the activity function and asyncio.sleep for the sleep, and ran 15 parallel iterations with various sleep intervals. To my surprise, all jobs completed as expected.
I then got greedy and ran my original business requirement, Snowflake SQL sleep commands, and it is not behaving as expected. It still has the long-running issue.
I will have to do more investigation by tweaking settings.
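One possible difference between the asyncio.sleep test and a real query is that a synchronous call inside an `async` function blocks the event loop, while `asyncio.sleep` yields to it. The sketch below illustrates that general Python behavior only; whether it explains the Snowflake case here is an assumption, not something confirmed in this thread. `blocking_query` is a hypothetical stand-in that uses `time.sleep`, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


def blocking_query(seconds):
    # Hypothetical stand-in for a synchronous driver call (e.g. a SQL wait).
    time.sleep(seconds)
    return f"waited {seconds} s"


async def run_blocking_inline(seconds):
    # Blocks the event loop: no other coroutine runs until this returns.
    return blocking_query(seconds)


async def run_in_thread(seconds):
    # Hands the blocking call to a worker thread so the loop stays free.
    return await asyncio.to_thread(blocking_query, seconds)


async def timed(make_coro, count, seconds):
    """Run `count` copies concurrently and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(make_coro(seconds) for _ in range(count)))
    return time.perf_counter() - start


inline_elapsed = asyncio.run(timed(run_blocking_inline, 3, 0.2))
threaded_elapsed = asyncio.run(timed(run_in_thread, 3, 0.2))
print(f"inline: {inline_elapsed:.2f}s, threaded: {threaded_elapsed:.2f}s")
```

The inline variant runs the three calls one after another, while the threaded variant overlaps them, so its elapsed time should be close to a single call.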
Hi @tamathew!
It appears this issue hasn't gotten activity in a bit so I was wondering if it was resolved for now. Just asking as part of our regular GitHub maintenance and clean-up work. Thanks!
Hi @davidmrdavid - I did not get time to do my final round of testing. I will update this thread once I'm done.
Hi @cgillum @davidmrdavid - I'm intermittently seeing issues with the Azure Python Durable Function. However, I rewrote my code in C# to execute the Snowflake SQL queries and it worked well, so we decided to go with the C# runtime for now.
Hi @tamathew - Sounds good. Glad to know you got unblocked at least. As Durable Python continues to mature, I'm sure we'll see more use cases with ADF and Snowflake SQL, so do check back in the future. I'll be closing this issue for now. Thanks!
I'm running Snowflake SQL statements via ADF -> Azure Function activity, which calls a Python Durable Azure Function. When I tested with a long-running SQL statement, `call system$wait(9, 'MINUTES')`, it ran beyond 9 minutes, and I aborted the job at the 35th minute. The status from statusQueryGetUri is below.
Output -
The Azure Monitor output log for DurableActivity, however, shows that the SQL did produce a result. For some reason that result was not reflected in the webhook (statusQueryGetUri).
This issue is also intermittent. Please let me know if you need more info.
Timestamp | Message | Level
---|---|---
2021-01-26 19:57:30.261 | query: [call system$wait(9, 'MINUTES')] | Information
2021-01-26 20:06:30.355 | query execution done | Information
2021-01-26 20:06:30.357 | Output of the SQL execution : waited 9 minutes | Information
DurableOrchestrator code -