Azure / azure-functions-python-worker

Python worker for Azure Functions.
http://aka.ms/azurefunctions
MIT License
331 stars, 100 forks

Aiohttp requests timing out in coroutines #1311

Closed kmtechsupport closed 6 months ago

kmtechsupport commented 10 months ago

Hi team,

I am using asyncio.gather to await roughly 100 coroutines. Each coroutine makes a simple API call and then returns a data frame. My code runs perfectly fine in my local environment, but in my Azure function about 10% of the coroutines return timeout errors (I am using the default ClientSession() timeout of 5 minutes). These coroutines are quick and only take about 10 seconds each.

The code for my function performing the actual https request is something like the following (this function is used in the coroutines)

import json

async def GetData(input, session):
    # Serialize the payload and POST it; ENDPOINT is defined elsewhere.
    requestPayload = {
        'name': input
    }

    requestData = json.dumps(requestPayload)
    async with session.post(ENDPOINT, data=requestData) as resp:
        return await resp.json()

Any advice or links to useful resources would be much appreciated.
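
For readers hitting the same symptom: with a 5-minute default timeout, a stalled call takes a long time to surface. The following is a minimal sketch, not a confirmed fix, of putting a shorter per-call limit around each coroutine with asyncio.wait_for so that stalls show up quickly. The names call_with_timeout and fake_request are hypothetical; fake_request stands in for the real session.post(...) call. aiohttp also accepts a timeout=ClientTimeout(...) argument on ClientSession, which may be the more natural place to set this.

```python
import asyncio


async def call_with_timeout(coro, seconds: float):
    """Await `coro`, raising asyncio.TimeoutError if it takes longer than `seconds`."""
    return await asyncio.wait_for(coro, timeout=seconds)


async def fake_request(delay: float) -> dict:
    # Hypothetical stand-in for the real session.post(...) call.
    await asyncio.sleep(delay)
    return {"delay": delay}


async def demo() -> list:
    # return_exceptions=True returns the timeout as a value instead of raising it.
    return await asyncio.gather(
        call_with_timeout(fake_request(0.01), seconds=0.5),
        call_with_timeout(fake_request(1.0), seconds=0.05),
        return_exceptions=True,
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The second call is cancelled once its limit expires, so the whole batch finishes in roughly the length of the shortest timeout rather than waiting for the slow call.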

bhagyshricompany commented 10 months ago

Thanks for reporting. Please share all the repro steps and your requirements.txt file.

YunchuWang commented 10 months ago

@kmtechsupport Hi, please share your app name and invocation id, sku type, etc. This could be worker memory overloaded or network congestion, etc. With more details, we can take a closer look.

kmtechsupport commented 10 months ago

App name: https://getsentinelthreats.azurewebsites.net (this is the sandbox app being used for testing)
Invocation ID of one of the failed instances: 4446776e217755dfcb12b5943c9bd841

Let me know if you need more info

kmtechsupport commented 10 months ago

@bhagyshricompany Requirements:

azure-functions
azure-functions-durable
pandas
requests
numpy
aiohttp
asyncio
xmltodict

The steps to reproduce are something like the following:

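import asyncio
import json
import logging

import azure.functions as func
from aiohttp import ClientSession

# ENDPOINT and items are defined elsewhere.

async def GetData(input, session):

    requestPayload = {
        'name': input
    }

    requestData = json.dumps(requestPayload)
    async with session.post(ENDPOINT, data=requestData) as resp:
        return await resp.json()

async def main(mytimer: func.TimerRequest, starter: str) -> None:
    async with ClientSession() as session:
        try:
            # return_exceptions=True returns failures as values in backupData
            # instead of raising them.
            backupData = await asyncio.gather(
                *[
                    GetData(item, session)
                    for item in items
                ],
                return_exceptions=True
            )
        except Exception as e:
            logging.info(repr(e))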
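
Because the repro passes return_exceptions=True, timeouts come back as items in the result list rather than being raised, so the except branch would not normally see them. A minimal sketch of separating the two kinds of result follows; split_results and fake_get_data are hypothetical names, and fake_get_data only simulates a failing call.

```python
import asyncio
from typing import List, Tuple


def split_results(results: list) -> Tuple[List, List]:
    """Separate gather(..., return_exceptions=True) output into values and errors."""
    successes = [r for r in results if not isinstance(r, BaseException)]
    failures = [r for r in results if isinstance(r, BaseException)]
    return successes, failures


async def fake_get_data(item: str) -> dict:
    # Hypothetical stand-in for GetData; fails for one specific input.
    await asyncio.sleep(0)
    if item == "bad":
        raise asyncio.TimeoutError()
    return {"name": item}


async def demo() -> Tuple[List, List]:
    results = await asyncio.gather(
        *[fake_get_data(item) for item in ["a", "bad", "b"]],
        return_exceptions=True,
    )
    return split_results(results)


if __name__ == "__main__":
    ok, failed = asyncio.run(demo())
    print(f"{len(ok)} succeeded, {len(failed)} failed: {failed!r}")
```

Logging the failures with repr() keeps the exception type visible, which helps tell a client-side timeout apart from a connection error.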

kmtechsupport commented 10 months ago

@YunchuWang Not sure if it's worth mentioning, but the issue appears pretty much as soon as the number of coroutines exceeds 5. It is inconsistent, though: sometimes I'll get 50 successes back, sometimes 10.

Another thing that may be worth mentioning: I have also tried changing this to a durable function, with the orchestrator spawning these coroutines as activity functions, and it exhibits the exact same behavior. At a certain point the activity functions simply start timing out.

I have also tried chunking the input: rather than spawning all 100 coroutines at once, I process the array in chunks of 5. I have tried this in both the single-script and the durable function versions, and neither fixes the issue.
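
The chunked approach described above can be sketched roughly as follows. This is only an illustration of the technique, assuming a hypothetical helper named gather_in_chunks and a simulated request; the thread does not show the reporter's actual chunking code, and the reporter states that chunking did not resolve the timeouts.

```python
import asyncio


async def gather_in_chunks(items, worker, chunk_size: int = 5) -> list:
    """Run worker(item) for every item, awaiting at most chunk_size at a time."""
    results = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # Each chunk must finish completely before the next one starts.
        results.extend(
            await asyncio.gather(
                *(worker(item) for item in chunk),
                return_exceptions=True,
            )
        )
    return results


async def demo(total: int = 12, chunk_size: int = 5):
    in_flight = 0
    peak = 0

    async def fake_request(item: int) -> int:
        # Hypothetical stand-in for the HTTP call; tracks concurrent calls.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return item * 2

    results = await gather_in_chunks(list(range(total)), fake_request, chunk_size)
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(f"{len(results)} results, peak concurrency {peak}")
```

Results come back in input order, and one slow call holds up its whole chunk, which is worth keeping in mind when reading per-chunk timings.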

kmtechsupport commented 10 months ago

@bhagyshricompany any luck with this one?

bhagyshricompany commented 9 months ago

@gavin-aguiar pls comment

kmtechsupport commented 9 months ago

@bhagyshricompany will this be looked into or do I need to explore other solutions?

bhagyshricompany commented 8 months ago

@vrdmr pls comment.

YunchuWang commented 6 months ago

Sorry for the late reply. Taking a look now.

YunchuWang commented 6 months ago

@kmtechsupport I am unable to repro it with 20 concurrent coroutines (each sleeping for 5 seconds) in a Linux Consumption app.

import azure.functions as func
import logging
from aiohttp import ClientSession
import asyncio

app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)

async def sleep_and_print(session: ClientSession, i: int) -> None:
    await asyncio.sleep(5)
    logging.info(f"Hello from {i}")

@app.route(route="http_trigger")
async def http_trigger(req: func.HttpRequest) -> func.HttpResponse:
    async with ClientSession() as session:
        await asyncio.gather(
            *[
                sleep_and_print(session, i)
                for i in range(20)
            ],
            return_exceptions=True
        )

        return func.HttpResponse("Hello,This HTTP triggered function executed successfully.")

From the worker side, I don't observe any blocking issues that would cause concurrent tasks to time out. Could the API endpoints be suffering from an availability issue? Can you try the code above? (Sorry, I can't find any logs internally for the app https://getsentinelthreats.azurewebsites.net/ anymore, as it has been too long.)
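
One way to gather evidence on whether the endpoint or the worker is the slow part is to record how long each call takes. The sketch below assumes a hypothetical wrapper named timed and a simulated request; in a real function the wrapped coroutine would be the actual HTTP call, and the durations would go to logging rather than print.

```python
import asyncio
import time


async def timed(label, coro):
    """Await `coro`; return (label, elapsed_seconds, result_or_exception)."""
    start = time.perf_counter()
    try:
        outcome = await coro
    except Exception as exc:
        # Keep the failure so it can be reported alongside its duration.
        outcome = exc
    return label, time.perf_counter() - start, outcome


async def fake_request(delay: float, fail: bool = False) -> str:
    # Hypothetical stand-in for an HTTP call to the real endpoint.
    await asyncio.sleep(delay)
    if fail:
        raise asyncio.TimeoutError()
    return "ok"


async def demo() -> list:
    return await asyncio.gather(
        timed("fast", fake_request(0.01)),
        timed("slow", fake_request(0.05)),
        timed("failed", fake_request(0.01, fail=True)),
    )


if __name__ == "__main__":
    for label, elapsed, outcome in asyncio.run(demo()):
        print(f"{label}: {elapsed:.3f}s -> {outcome!r}")
```

If the failing calls all sit at the configured timeout while the successful ones finish in seconds, that pattern would point toward stalled connections rather than slow processing, though it would not by itself identify the cause.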

microsoft-github-policy-service[bot] commented 6 months ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.