Open ZirconCode opened 1 year ago
pls share the function name, app name, instance id, timestamp, region etc.
function name: all, since the deploy doesn't work
app name: as in the logs above, ae-API-compute, but also debugApp123
instance id: azure function instance id? (i.e. ExecutionContext.InvocationId?), not relevant since it is a deploy error
timestamp: for some examples see logs above
region: west europe
etc.: I also wish I could provide the relevant information to isolate the error, however the error messages have not allowed me to do so.
Setting PYTHON_ISOLATE_WORKER_DEPENDENCIES:1 also does not resolve the issue.
pls create the support request on azure portal https://learn.microsoft.com/en-us/azure/azure-portal/supportability/how-to-create-azure-support-request
As mentioned previously, I am already in contact with azure support.
Tried:
Ran into two new bugs while isolating:
11:35:41 AM ae-api-compute-secondary: Starting deployment...
11:35:41 AM ae-api-compute-secondary: Creating zip package...
11:46:39 AM: Error: socket hang up
I've been using .funcignore and disabling functions to try to isolate the problem within the scope of my larger project, since it was not possible to do so from a clean project upwards. Both these things seem to invite a host of new issues.
The combination of all these issues makes it impossible to work. It is very disappointing.
I have isolated and reproduced reliably one of the vague errors listed above:
4:00:49 PM: Error: The operation was aborted.
This specific reproducible isolated case was fixable with:
PYTHON_ISOLATE_WORKER_DEPENDENCIES:1 (couldn't find documentation to link to)
v1 or v2 programming model folder structure and probably recent undocumented changes to behavior in this direction.
Also a note that all my logfiles are still non-existent on failed deployment. This shouldn't be the case.
So, the above error
2:39:54 PM ae-api-compute-secondary: Writing the artifacts to a Zip file
2:40:18 PM: Error: The operation was aborted.
came back when including further pieces of my project.
I isolated it to the line import tempfile. For some reason this causes the error. It worked previously.
This also causes the abort when I have it in a default httptrigger template function by itself.
I have discovered a new non-reproducible randomly appearing bug:
2:57:46 PM ae-api-compute-secondary: Starting deployment...
2:57:46 PM ae-api-compute-secondary: Creating zip package...
2:58:02 PM ae-api-compute-secondary: Zip package size: 180 MB
2:58:04 PM ae-api-compute-secondary: Fetching changes.
2:58:06 PM ae-api-compute-secondary: Cleaning up temp folders from previous zip deployments and extracting pushed zip file /tmp/zipdeploy/1455b335-1904-468e-8ed3-3384bf99dbe4.zip (0.00 MB) to /tmp/zipdeploy/extracted
2:58:06 PM ae-api-compute-secondary: Central Directory corrupt.
2:58:13 PM ae-api-compute-secondary: Deployment failed.
I'm not even going to try to figure that one out.
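For anyone who hits the same "Central Directory corrupt" message: a minimal sketch of a local sanity check on a zip package before it is uploaded, using only Python's standard zipfile module. The helper name zip_is_readable is hypothetical. This only rules out a package that is already broken on disk; the log above shows the server extracting a 0.00 MB file, which may equally point to a failed upload that a local check cannot detect.

```python
import zipfile


def zip_is_readable(path: str) -> bool:
    """Return True if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        # Raised when the file is empty, truncated, or has no readable
        # central directory.
        return False
```

Calling zip_is_readable on the generated package (the path depends on where your tooling writes it) before deploying at least tells you whether the archive itself is intact.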
I isolated the next reason for abort to including openai in requirements.txt (no importing). During the pip install there are no errors and deploy seems to be satisfied, however it aborts at the end.
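The isolation I did by hand can be sketched as a bisection over the lines of requirements.txt. This is only a sketch under strong assumptions: exactly one package causes the failure, and the failure is deterministic (which was not true for several of the random errors above). deploy_succeeds is a hypothetical callable you would have to supply yourself, for example one that writes a trimmed requirements.txt and runs a deployment.

```python
from typing import Callable, List, Optional


def find_failing_requirement(
    requirements: List[str],
    deploy_succeeds: Callable[[List[str]], bool],
) -> Optional[str]:
    """Bisect the requirement list to find the one entry that breaks deployment.

    Returns None if the full list deploys fine. Assumes a single culprit
    that fails on its own, regardless of which other packages are present.
    """
    if deploy_succeeds(requirements):
        return None
    candidates = list(requirements)
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # Keep whichever half still reproduces the failure.
        candidates = first if not deploy_succeeds(first) else second
    return candidates[0]
```

With n requirements this needs roughly log2(n) deployments instead of n, which matters when a single attempt takes ten minutes.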
So, I have solved the deployment issue as a final step, at least for me, by using a specific AUR and deploying from the terminal instead of the azure extension:
az login
func azure functionapp publish appName --slot slotName
Interestingly, the deployment zip is around 200 MB smaller, though both do a remote build.
I will keep this issue open because I think it highlights the need for better/existent logging and error feedback in many cases. The above combination of steps got my project deployable again, however I will likely never know what was broken, and why it happened without my agency, and good luck to anyone with similar issues.
In case it helps others: Check the environment variable names you are using. They might conflict with Azure specific variable names and thereby cause errors. In my case no http triggers were found (failing silently), because of the environment variable CONTAINER_NAME.
Very annoying and impossible to debug. Please add verbose error messages to the deployment output!
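A minimal sketch of how one might check settings for this kind of name clash before deploying. The only name taken from this thread is CONTAINER_NAME; the set of suspect names is illustrative and not an official list, so extend it from the Azure documentation as needed. The function names are hypothetical. The loader assumes the usual local.settings.json layout with a top-level "Values" object.

```python
import json
from typing import Dict, Iterable, List

# Illustrative, not exhaustive: CONTAINER_NAME is the one name reported
# in this thread as clashing with a platform variable.
SUSPECT_NAMES = frozenset({"CONTAINER_NAME"})


def load_local_settings(path: str) -> Dict[str, str]:
    """Read the "Values" section of a local.settings.json file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle).get("Values", {})


def find_conflicting_settings(
    settings: Dict[str, str],
    suspect_names: Iterable[str] = SUSPECT_NAMES,
) -> List[str]:
    """Return the setting names that match a suspect name (case-insensitive)."""
    suspects = {name.upper() for name in suspect_names}
    return sorted(name for name in settings if name.upper() in suspects)
```

For example, find_conflicting_settings(load_local_settings("local.settings.json")) lists the names worth renaming, e.g. to something prefixed with your own app name.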
For almost a year now, Azure Functions for Python has had a bug that makes it impractical to use: deployment from VS Code ends with the message "No HTTP triggers found", despite the fact that the function code works correctly.
At this link you will find the desperate attempt of developers to report the anomaly in an issue in the GitHub repository of Oryx: https://github.com/microsoft/Oryx/issues/1774
Rightly, Paul Dorsh responds after some reporting on this issue that the problem is not with Oryx (used to build the code), but the problem is in the deployment part to the Azure function: https://github.com/microsoft/Oryx/issues/1774#issuecomment-1509093908
Paul points to the vscode-azurefunction repo, where someone takes the report and fixes the bug, but only for Node.js, not Python: https://github.com/microsoft/vscode-azurefunctions/issues/3805
So much so that Simon from Zirconcode was prompted to open this issue in the azure-functions-python-worker repo: https://github.com/Azure/azure-functions-python-worker/issues/1306. It was opened in mid-August, and as of today (mid-December) it has still not been fixed, despite the severity of the bug.
One of the latest reports, again in the first issue thread on Oryx, says the following:
This bug forced me to remake the entire app for AWS Lambda. Nothing in this thread worked sadly. I prefer Azure but what can we do when something is just completely broken with no concrete solution.
(from https://github.com/microsoft/Oryx/issues/1774#issuecomment-1835920134)
I myself had to implement a feature for a major customer using an Azure function with Python, and I sweated my nuts off trying to figure out what was wrong (from 1 day of implementation, it took me 5!).
Is it possible that this serious bug cannot be addressed in a reasonable time?
Unsure how to title this. It's been a week of debugging and isolating with no results. Is this issue in the correct repo? Also unsure. Help me out here. I'll walk you through my journey.
I have a very large code base, and it runs flawlessly locally. I've deployed it just fine for a long time until recently. I made a change, including google-cloud-texttospeech in requirements.txt. It stopped deploying after this (maybe relevant, maybe not). Removing this change, to the exact same code base as before, still fails to deploy.
Some errors I get at random, and I've tried incredibly hard to isolate. Both development environments can deploy other things successfully, and push to a new app fine as well. I have not made any local environment changes at all when it began to fail.
Details: Azure Functions runtime version 4.24.4.4, Linux premium plan with Elastic Premium EP3. I deploy directly from visual studio code azure function extension. My python code follows the folder structure of the v1 coding model (no decorators, lots of folders), however I am on v2 (host.json etc.), this has always worked, and runs locally of course. Deploying to python 3.9.7.
First development environment: Manjaro linux, vscode 1.81.1, azure extension 1.12.3
Errors I've encountered seemingly at random, when trying to deploy:
Except it was not successful and functions are empty / don't run (the zip file in webjobstorage contains them though).
And a very exciting shiny rare one:
Second development environment: Windows, visual studio code, azure extension v1.12.4. I've also used this environment before, and it has worked, same as the above one.
Errors:
One successful deploy, and randomly most of the above (with no changes), as well as a new one:
Some other random things I've tried:
What now? The errors are not helpful. Logs are missing, I can drill down into events, insights, many different logs, they are all over the place and they are all useless or empty. I've mentioned that I've had multiple 2hr+ calls with the technical support team of various ever-increasing escalations, and they are just as stumped as me (and I am grateful for their efforts).
Any thoughts?
What do I try next? Any information I can provide?
See also (maybe relevant, I don't know at this point):
https://github.com/microsoft/vscode-azurefunctions/issues/3805
https://github.com/microsoft/vscode-azurefunctions/issues/2529
https://github.com/microsoft/Oryx/issues/1774
https://stackoverflow.com/questions/76478668/adding-python-module-google-cloud-storage-is-causing-a-working-azure-function-ap
https://stackoverflow.com/questions/72441758/typeerror-descriptors-cannot-not-be-created-directly
https://github.com/projectkudu/kudu/issues/3348
https://github.com/microsoft/azure-pipelines-tasks/issues/14201