Azure / azure-functions-python-worker

Python worker for Azure Functions.
http://aka.ms/azurefunctions
MIT License

[BUG] Transient "ModuleNotFoundError: no module named 'function_app'" under function load #1398

Closed: ajstewart closed this issue 3 months ago.

ajstewart commented 5 months ago

I'm reporting this issue because something seems wrong with how the function actually starts. If the error were true all the time, the function would never run, whereas it does complete successfully when the load is light.

I've also recently updated the function to Python 3.11, so I'm not sure whether the problem is specific to that version or affects others too.

Investigative information

Repro steps

Unfortunately I can only reproduce by putting my particular function under load.

The function (an experimental one) makes a call that can suffer from long response times (sometimes ~3 minutes) and 502 Bad Gateway errors under load. I'm not sure whether this is masking or causing the error I'm reporting here, though given what I'm seeing I suspect they are linked.

Expected behavior

I expect the function to load normally, as it does under light load, or otherwise to fail with an HTTP error response or a timeout.

Actual behavior

Under a load of about 1000 queue messages to churn through (a typical invocation should take around 20-30 seconds), I am transiently getting the error below. It seems pretty fundamental and perhaps points to a new instance failing to start correctly? Most of the messages do go through OK, but I have to re-send some of them from the poison queue before all of them are processed.

When the load is light I do not encounter any errors.

The function is not optimised or ideal, but I still expect the load to be processed, given that each task should only fail with bad gateways or, at a stretch, timeouts. Maybe those failures are masking the real problem and this error is a red herring, but I thought I would report it in any case.

Exception: ModuleNotFoundError: No module named 'function_app'. Troubleshooting Guide: https://aka.ms/functions-modulenotfound
Stack:   File "/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/dispatcher.py", line 384, in _handle__function_load_request
    _ = self.index_functions(function_path)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/dispatcher.py", line 617, in index_functions
    indexed_functions = loader.index_function_app(function_path)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/utils/wrappers.py", line 48, in call
    raise extend_exception_message(e, message)
  File "/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/utils/wrappers.py", line 44, in call
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/loader.py", line 214, in index_function_app
    imported_module = importlib.import_module(module_name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Known workarounds

Don't subject the function to high loads.

Contents of the requirements.txt file:

I'd rather not share it, but the invocation ID is above.

gavin-aguiar commented 5 months ago

@ajstewart I see that you have enabled the PYTHON_ISOLATE_WORKER_DEPENDENCIES flag. Do you need this flag enabled? If not, could you disable it and try again?
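Because Azure Functions exposes app settings to the Python process as environment variables, one quick way to confirm whether the isolation setting is actually active in a deployed app is to read it at startup and log it. The helper below is a hypothetical sketch; the set of values it treats as "enabled" is my assumption and may not match exactly how the worker parses the setting.

```python
import os
from typing import Mapping, Optional

SETTING = "PYTHON_ISOLATE_WORKER_DEPENDENCIES"

# Assumption: these spellings count as "on"; the worker's own parsing may differ.
TRUTHY = {"1", "true", "t", "yes", "y"}


def isolation_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the isolation app setting looks switched on.

    App settings reach the worker process as environment variables, so by
    default this reads os.environ; pass a dict to check other values.
    """
    source = os.environ if env is None else env
    return source.get(SETTING, "").strip().lower() in TRUTHY


print(isolation_enabled({SETTING: "1"}))  # True
print(isolation_enabled({SETTING: "0"}))  # False
print(isolation_enabled({}))              # False
```

In a real app you might call isolation_enabled() once at import time in function_app.py and write the result to the log, so each invocation's logs show which mode the instance started in.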

ajstewart commented 5 months ago

Thanks for the reply. Ah yes, from my other issue (#1339) I know that setting can spell trouble.

I'll turn it off and get back to you; I'll need to build up some events to process first.

ajstewart commented 5 months ago

OK, I repeated the test without PYTHON_ISOLATE_WORKER_DEPENDENCIES and it has indeed stopped these particular errors. I'm now only seeing the function timeouts and the 502s from my other service.

Should this setting come with a bit more of a warning? As with the other issue I linked to, I seem to run into a lot of problems because of it. The documentation originally made it sound like it should avoid such conflicts, but it just seems to create more.

I'll make sure to avoid this setting in the future.
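The conflicts this setting is meant to help with arise when the app and the worker each ship their own copy of the same package; in Python, whichever copy's folder appears first on sys.path is the one that gets imported. The sketch below only illustrates that ordering rule: shared_lib and the two temporary folders are hypothetical stand-ins, and it does not reproduce what the worker does internally when the setting is enabled.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE = "shared_lib"  # hypothetical package shipped by both the app and the worker


def first_on_path_wins(first: str, second: str) -> str:
    """Import MODULE with `first` ahead of `second` on sys.path and report its VERSION."""
    sys.modules.pop(MODULE, None)  # force a fresh lookup instead of the cached module
    sys.path[:0] = [first, second]
    importlib.invalidate_caches()
    try:
        return importlib.import_module(MODULE).VERSION
    finally:
        del sys.path[:2]  # restore the original sys.path
        sys.modules.pop(MODULE, None)


with tempfile.TemporaryDirectory() as app_deps, tempfile.TemporaryDirectory() as worker_deps:
    # Two copies of the same module, standing in for app vs. worker dependencies.
    Path(app_deps, f"{MODULE}.py").write_text("VERSION = 'app copy'\n")
    Path(worker_deps, f"{MODULE}.py").write_text("VERSION = 'worker copy'\n")

    app_first = first_on_path_wins(app_deps, worker_deps)
    worker_first = first_on_path_wins(worker_deps, app_deps)

print(app_first)     # app copy
print(worker_first)  # worker copy
```

Note that the helper pops the module from sys.modules before each import; without that, Python would return the already-cached copy regardless of the new path order.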