Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License

v3 Azure Functions running old code after successful Bitbucket CI/CD deployment #5663

Open dasbdavis opened 4 years ago

dasbdavis commented 4 years ago

We've got Bitbucket continuous deployment set up for a couple of our v3 Azure Functions (Azure Function App -> Platform Features -> Container settings -> CI/CD (Bitbucket)). No build provider selected, as the Bitbucket option doesn't seem to allow for it. Trigger branch is master. Function app is running from package. All are HTTP trigger functions. Nothing fancy.

Whenever we commit to master, it does indeed trigger a deployment-- which completes successfully. I can see the commit that triggered the deployment and all of that. The problem is, the function is still (usually) running old code after this deployment. I've tried restarting the function app, no success. I've tried completely stopping the app, waiting for a bit, and restarting, no success. Sometimes even manual deployment from Visual Studio doesn't work.

Once I remove the CI/CD pipe from Bitbucket, however, things go back to normal as far as manual deployments from VS go.

I've been able to reproduce this effect several times. Please let me know if you need any additional information.

ankitkumarr commented 4 years ago

@dasbdavis, do you have an app setting WEBSITE_RUN_FROM_PACKAGE or similar (Run From Package) in your function app? If so, would you mind removing that and then trying the CI/CD pipeline?

I think what's possible is that your function app is typically deployed using Run From Package, which means that the site assumes that the content is deployed at /home/data/SitePackages. But I don't think the CI/CD webhook deployment puts the artifact there, so your site may end up using a stale deployment.
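To make the distinction above concrete, here is a minimal Python sketch that classifies a WEBSITE_RUN_FROM_PACKAGE value. The function name and the `settings` dictionary are hypothetical, and the mapping of "1", a URL, and everything else to a mode is an assumption based on how Run From Package is commonly documented, not the host's actual logic.

```python
from typing import Mapping


def classify_run_from_package(settings: Mapping[str, str]) -> str:
    """Classify how an app's content is expected to be loaded.

    `settings` is a hypothetical dict of app settings, e.g. copied
    by hand from the portal's Configuration blade.
    """
    value = settings.get("WEBSITE_RUN_FROM_PACKAGE", "").strip()
    if value == "1":
        # Assumption: content is expected as a package under data/SitePackages.
        return "local-package"
    if value.lower().startswith(("http://", "https://")):
        # Assumption: content is expected at an external package URL.
        return "remote-package"
    # Empty, "0", or anything else: files are synced into wwwroot in place.
    return "in-place"
```

If this returns "local-package" while the deployment mechanism writes files somewhere else, the app could keep serving the previous package, which would match the symptom described in this issue.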

If the above doesn't work, would you mind sharing your function app name, and I can look to see if anything seems fishy.

dasbdavis commented 4 years ago

Sorry for the delay-- I didn't see that you'd responded. I'll try this as soon as I can and let you know.

mattmelton commented 4 years ago

I've had a similar issue with Python dynamically loading old gRPC protobuf files that are no longer compatible with new code.

With WEBSITE_RUN_FROM_PACKAGE=0, the deployment performs an "in-place sync". Unfortunately this has a tendency to leave old files around, especially locked files or run-time generated files (e.g. *.pyc). In our case the old files are slurped at runtime, causing version mismatch errors on method dispatch.

WEBSITE_RUN_FROM_PACKAGE=1 resolves the issue but means we can't set or rotate the host/functions keys programmatically.

I believe an A/B deployment into a separate directory, rather than in-place sync, would solve this issue.
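As a rough way to check for leftovers from an in-place sync, the following Python sketch lists files that exist in a deployed directory but not in the package that was deployed. The function name and both paths are hypothetical; it only compares file names and does not detect files whose contents are outdated.

```python
import os
import zipfile
from typing import List


def find_stale_files(package_path: str, deployed_dir: str) -> List[str]:
    """Return files under deployed_dir that are absent from the zip package."""
    with zipfile.ZipFile(package_path) as package:
        expected = {
            os.path.normpath(info.filename)
            for info in package.infolist()
            if not info.is_dir()
        }
    stale = []
    for root, _dirs, files in os.walk(deployed_dir):
        for name in files:
            relative = os.path.relpath(os.path.join(root, name), deployed_dir)
            if os.path.normpath(relative) not in expected:
                # Present on disk but not shipped in this release,
                # e.g. a *.pyc generated at run time by an older version.
                stale.append(relative)
    return sorted(stale)
```

Run against a local copy of wwwroot and the deployed zip, this would surface files such as old *.pyc entries that a sync left behind.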

lopezbertoni commented 4 years ago

@ankitkumarr Any insight if this was ever fixed? We're having the same issues running under Consumption Plan and Premium Plan. Our CI/CD is pushing the artifact to Azure (WEBSITE_RUN_FROM_PACKAGE set to 1) and after that we're doing an Azure Functions Restart as suggested by MSFT support but that didn't help either. Would deploying to a slot and doing a hot swap help? Please advise. If you need more data I'll be happy to provide it.

ankitkumarr commented 4 years ago

@lopezbertoni, would you mind elaborating on your scenario? How are you pushing the artifact to Azure? What's the publishing process, and what issue are you seeing exactly?

lopezbertoni commented 4 years ago

@ankitkumarr

  1. Push to an Azure Function using Azure DevOps. Steps in the build pipeline are:

    • dotnet restore
    • dotnet build
    • dotnet publish
    • Publish Artifact
  2. Release the artifact with the following steps in Azure DevOps

The issue is that we deploy the Azure Function, check the logs in Application Insights, and see that log statements that were completely removed from the code are still being executed.

We then stopped/started the Azure Function from the portal and this issue persisted. Eventually we stopped the Azure Function for around 5 mins and then started it again and the deployed code started executing fine.

This was deployed to a Premium Service Plan.

Please advise on how to fix this or if there's a workaround other than manually stopping/starting each processor every time we deploy.

ankitkumarr commented 4 years ago

@lopezbertoni, thanks for all the info. A couple more questions that'd help me narrow down the cause --

  1. What OS is your function app on? (Windows / Linux)
  2. Can you share your function app name, and a time period when you did this deployment for me to look at the logs? If you prefer to share the name privately, you can follow these steps.

lopezbertoni commented 4 years ago

@ankitkumarr

  1. Windows function app
  2. assessment-events-processor-qa deployed to Central US. This was deployed on August 5th. Here's some build information if it helps.
    {
    "version": "1.0.0.1349",
    "commitHash": "f9b666fea571120eb9c09732519acebe9e9b0deb",
    "versionDate": "2020.8.5.1",
    "branchName": "staging"
    }

ghost commented 4 years ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

arek-avanade commented 4 years ago

@ankitkumarr Any update on this? I experienced the same problem today. Deploying from DevOps, WEBSITE_RUN_FROM_PACKAGE=1, the function seems to be running old code after a successful deployment. There was no change in our deployment scripts recently and everything seemed to be working fine until today, although maybe the problem was there before, just unnoticed.

mattmelton commented 4 years ago

I believe my issues were due to this Kudu bug: https://github.com/projectkudu/kudu/issues/2972.

We've worked around the issues by moving to container functions. Previously I saw code that triggered "impossible" exceptions, i.e. exceptions on lines that didn't exist in that release.

SeppeDev commented 4 years ago

Today we encountered the same problem. No changes came through after redeployments. I restarted the function app and such, but it had no effect (I didn't wait very long, as suggested by @lopezbertoni). We use Azure DevOps, and in the release task there, we had our "Deployment method" set to "Auto-detect". It worked perfectly fine before, but now that I changed it explicitly to "Zip Deploy", our code change came through. I'm not entirely sure whether this is a fix for the problem or just a coincidence, but I thought I'd share.

lopezbertoni commented 4 years ago

@SeppeDev Just to follow up / help. When we did a quick restart it didn't work. When we did a quick stop/start it didn't work. It picked up the new code once we stopped, waited for about 5 mins and started again.

SeppeDev commented 4 years ago

@lopezbertoni, ok thanks. We didn't wait for 5 minutes after stopping it, just a quick restart and a quick stop and start, so what fixed it for us is probably the change in the release in DevOps. Thanks.

lopezbertoni commented 4 years ago

@ankitkumarr This happened again with our Production deployments from last night. Yay for no deploy Fridays 😀. We deployed 3 times and it didn't update. Eventually they did after several restarts. All of these processors were deployed to a premium plan.

lopezbertoni commented 4 years ago

@ankitkumarr Any update on this? Any workaround at least? We're running production with 10+ Functions and every deploy we need to stop, wait for 5 mins and start the functions to ensure the latest code is running until we know this is reliable. Would slot deployment help?

ankitkumarr commented 4 years ago

@lopezbertoni, yes apologies for the delay. I will take a look at this as soon as I can.

chuanqisun commented 4 years ago

We are seeing the same issue. Deployed from Azure DevOps with the built-in pipeline task. The function was stale after a successful deployment.

We use an EP1 service plan.

As a workaround, a manual stop > start busts the old version.

We experience this issue inconsistently. The deployment is immediately effective most of the time, but there is always a chance the old version will persist. Please keep us updated on the progress. Thank you!

p.s. this is our deployment log

2020-09-16T18:45:21.1920203Z ##[section]Starting: Deploy to UAT slot
2020-09-16T18:45:21.2070301Z ==============================================================================
2020-09-16T18:45:21.2070986Z Task         : Azure Functions
2020-09-16T18:45:21.2071377Z Description  : Update a function app with .NET, Python, JavaScript, PowerShell, Java based web applications
2020-09-16T18:45:21.2071789Z Version      : 1.163.7
2020-09-16T18:45:21.2072038Z Author       : Microsoft Corporation
2020-09-16T18:45:21.2072484Z Help         : https://aka.ms/azurefunctiontroubleshooting
2020-09-16T18:45:21.2072859Z ==============================================================================
2020-09-16T18:45:23.4939394Z Got service connection details for Azure App Service:'*******************'
2020-09-16T18:45:50.5476983Z Trying to update App Service Application settings. Data: {"WEBSITE_RUN_FROM_PACKAGE":"1"}
2020-09-16T18:45:50.5480950Z Deleting App Service Application settings. Data: ["WEBSITE_RUN_FROM_ZIP"]
2020-09-16T18:45:50.7841948Z App Service Application settings are already present.
2020-09-16T18:45:55.8987426Z Package deployment using ZIP Deploy initiated.
2020-09-16T18:46:07.4602934Z Successfully deployed web package to App Service.
2020-09-16T18:46:07.4608623Z NOTE: Run From Package makes wwwroot read-only, so you will receive an error when writing files to this directory.
2020-09-16T18:46:11.5460159Z Successfully added release annotation to the Application Insight : ************
2020-09-16T18:46:11.7860628Z App Service Application URL: http://**********************.azurewebsites.net
2020-09-16T18:46:11.8404113Z ##[section]Finishing: Deploy to UAT slot

ankitkumarr commented 4 years ago

I apologize! This has been slipping down my priority list. I am taking a look at this now, and will have an update this week. I know @lopezbertoni shared a function app and the rough deployment time, and I am sorry for not making time to look at it earlier.

Would @chuanqisun or @lopezbertoni be able to share a recent time window in which the deployment failed, and the name of the function app that was deployed to? I will make sure to look at it right away. In the meantime, I will also try to reproduce this error and scan through old logs to see if I can find what went wrong.

ankitkumarr commented 4 years ago

Adding this to Sprint 85 (current sprint) to track and investigate the issue.

lopezbertoni commented 4 years ago

@ankitkumarr Thanks for looking into this. We've been just systematically/manually stopping and starting all azure functions. One of them is assessment-events-processor-qa deployed to a premium plan.

ankitkumarr commented 4 years ago

@lopezbertoni, can you share a recent time when you deployed? I will check the logs in case it didn't auto-update and if I find some symptoms of any issue.

lopezbertoni commented 4 years ago

@ankitkumarr Latest QA release is from today (9/16).

Some processor names: assessment-events-processor-qa, person-events-processor-qa, notification-events-processor-qa

ankitkumarr commented 4 years ago

We (@thaishankar and I) took some time to investigate this issue. We looked at @chuanqisun's app as it was in the weird state mentioned in the issue. I wasn't able to look at @lopezbertoni's app as the mitigation is already in place there so it's difficult to tell if the issue still occurs. It seems that there may be a platform issue such that when files are changed, a notification is not generated for the Functions host to restart. There will be a fix going out in the platform to ensure such issues are avoided, but those deployments take time and current ETA would be by end of the year.

This issue should be transient, but if you are seeing this very consistently, please do reach out, as it's likely caused by something else. In the meantime, please mitigate by restarting the app after the deployment. Please let me know if there are concerns, and if someone else is facing this issue, do post your app name and the time period when you see it. I can then verify if it's the same issue.
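Since several reports in this thread say a restart did not always take effect, one way to confirm the mitigation worked is to poll until the app reports the expected build. The Python sketch below assumes the app exposes build information with a `commitHash` field, like the JSON shared earlier in this thread; `fetch_build_info` is a hypothetical caller-supplied function (for example, an HTTP call to your own version endpoint), not something the Functions host provides.

```python
import time
from typing import Callable, Mapping


def wait_for_commit(
    fetch_build_info: Callable[[], Mapping[str, str]],
    expected_commit: str,
    timeout_seconds: float = 600.0,
    interval_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the running app reports expected_commit, or time out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        info = fetch_build_info()
        if info.get("commitHash") == expected_commit:
            return True
        if time.monotonic() >= deadline:
            # Still serving a different build after the allowed time.
            return False
        sleep(interval_seconds)
```

A release pipeline could call this after the restart step and fail the release when it returns False, instead of relying on a fixed five-minute wait.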

Thank you all for your patience!

For our reference -- internally tracked to be fixed by @thaishankar in ANT91.

ankitkumarr commented 4 years ago

I am moving it out from the Sprint, but I will leave this open and assigned to me for updates.

bjorkstromm commented 4 years ago

This has happened to us several times in the past months. If I remember correctly, all affected Functions are running on an EP1 plan. We are also deploying using the Azure App Service deploy task in Azure DevOps.

May I suggest removing Bitbucket from the title of the issue.

jimanttila commented 4 years ago

This just happened to me as well. I deployed my function app using Azure CLI. On the first attempt, syncing triggers seems to have failed. I retried a couple of hours later and the deployment was successful, but the app is still running old code. Running on a premium plan.

thaishankar commented 4 years ago

@jimanttila Can you please share the app name and issue time?

jimanttila commented 4 years ago

> @jimanttila Can you please share the app name and issue time?

App name: omnisynk-engine-prod-01

First failed attempt @ Tue Oct 20 2020 05:50:04 GMT+0200
Retried successfully @ Tue Oct 20 2020 07:31:44 GMT+0200

dustensalinas commented 3 years ago

Just had the same. App Name: ffinesse-services-aimee-dev

Had to remove Bitbucket and then perform a manual publish.

Premium Plan as well.

thaishankar commented 3 years ago

@dustensalinas and All,

We are in the process of rolling out a fix for this which should prevent this issue from occurring. The fix should be deployed to all our scale units by early to mid December.

sebb3 commented 3 years ago

> @dustensalinas and All,
>
> We are in the process of rolling out a fix for this which should prevent this issue from occurring. The fix should be deployed to all our scale units by early to mid December.

How is the roll-out progressing? We're currently not experiencing issues, but since it used to be quite random, a status update would be great. We're hosting the apps in the North Europe region.

thaishankar commented 3 years ago

@sebb3, the fix should be in North Europe already, and it should be deployed globally by the end of this week.

thaishankar commented 3 years ago

@gmlion Is the issue you are reporting for the deployment at 2021-02-10 15:00 UTC? This was the only deployment that I could see for the app KrevNotificationServer2 in the last 3 days.

From the logs, it looks like we did pick up the new zip after deployment and the function app was restarted with the new zip at 2021-02-10 15:00:54 UTC.

It is possible that the issue you are reporting is different from the one that caused the problem earlier. Our earlier fix should still be good to prevent apps from executing old code.

Would you please open a new issue with the details?

gmlion commented 3 years ago

> @gmlion Is the issue you are reporting for the deployment at 2021-02-10 15:00 UTC? This was the only deployment that I could see for the app KrevNotificationServer2 in the last 3 days.
>
> From the logs, it looks like we did pick up the new zip after deployment and the function app was restarted with the new zip at 2021-02-10 15:00:54 UTC.
>
> It is possible that the issue you are reporting is different from the one that caused the problem earlier. Our earlier fix should still be good to prevent apps from executing old code.
>
> Would you please open a new issue with the details?

It was an error on my side with an old deployment slot that was off my radar. Sorry for the noise.

RDavis3000 commented 3 years ago

This is happening for us. App name: lxrpextranetcollaboration. We've tried the 'stop, wait 5 minutes, spin around 3 times, restart app' but it didn't help.

chuanqisun commented 3 years ago

@RDavis3000 This happened to our app again, about a week ago. We use slots. My previous workaround of stop and restart didn't work. I even tried deleting and recreating the slot, and that didn't work either. I feel there is some magic that recycles a previously deployed slot, so deleting and recreating the slot won't purge the stale instance.

Eventually, I found this workaround: create a new slot with a different name, deploy whatever to it, and then delete the new slot, and create the slot under the old name. This seems to purge the function app from it completely.

stormoz commented 3 years ago

This happened to us as well - app with durable functions, premium tier. Lost almost half x two developers getting to the bottom of it... Restart did not help. Only re-deployment of the same artifacts helped.

Details:

Surprised that such an important issue is not getting prioritised by Azure.

wyong95 commented 3 years ago

Having the same issue today. Any update?

marc-perreaut commented 3 years ago

I am experiencing the same issue (old code version still running despite successful deployment) since June 4th and feel stuck:

The issue is random, but the last (ongoing) occurrence is tough.

I would be happy to get a working workaround and any update on this issue.

akakaule commented 3 years ago

I'm also experiencing the same issue. I have a Service Bus triggered function running .NET Core code where some of the invocations have executed old code. It seems that only a few of the executions used the old code. I would really like to have a status update on this issue. The old code being executed is from a deployment earlier than May 3rd, so from a really old deployment. When I download the assembly from the bin folder, everything looks good.

The runtime is v3 and runs on the consumption plan (Windows). The Azure Function App task from an Azure DevOps YAML pipeline is used for deployment with deploymentMethod not specified (auto).

We have WEBSITE_RUN_FROM_PACKAGE = 1.

tippesi commented 3 years ago

We have the same problem mentioned by @akakaule. After a deployment last Friday, it seems that sometimes old code is executed and sometimes the newly deployed code. Today we deployed additional log statements to find the error on our side, but the logs are only visible in some runs. In the other runs the function behaves the same way it did before the deployment. We can also add that it doesn't seem to be related to an old instance which is still running. Additionally, it seems to affect only the updated functions in our function app. New functions we added always work with the new code. The old/updated ones sometimes seem to run the old code. Also, only our production environment has the issue; our other environments work with the new code as intended. We also have slots for deployment in place.

tippesi commented 3 years ago

After making sure that all our slots ran the same code (by redeploying), it now works again. We assume the traffic is somehow split between both slots, even though only the production slot should have been used. The image below shows our current setup, which doesn't seem to work correctly. Of course this defeats the purpose of using a slot swap.

[Image: MicrosoftTeams-image]

eric-winkler commented 3 years ago

Hi Folks,

Over the past couple of weeks, I've been seeing a Queue trigger being executed intermittently by a version of my function app that was deployed sometime prior to July 02, despite there being dozens of new versions deployed to the function app since that time.

Trawling through the logs, I've identified that it is a specific host/instance that is running the old code:

HostInstanceId: 49db15dc-0012-41ed-b558-4fe7aced0fdf
Cloud_RoleInstance: 5AF3CA72-637562946012875197

It appears every invocation from this instance (and only this instance) is running an implementation from an old deployment.
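The kind of log trawl described above can be sketched in Python as a simple grouping step. This assumes the invocation logs have been exported as records carrying a `Cloud_RoleInstance` value and a `version` value; the `version` field is hypothetical and would have to be logged by the function itself (for example, from its assembly or build metadata).

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def instances_running_old_code(
    records: Iterable[Mapping[str, str]],
    current_version: str,
) -> Dict[str, Set[str]]:
    """Map each Cloud_RoleInstance to the non-current versions it logged."""
    outdated: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        version = record["version"]  # hypothetical field, logged by the app
        if version != current_version:
            outdated[record["Cloud_RoleInstance"]].add(version)
    return dict(outdated)
```

An empty result suggests every instance in the sample ran the current build; a single key, as in the report above, points at one rogue instance rather than a failed deployment.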

So far, the following approaches have been unsuccessful in killing this old rogue instance;

I'm using;

tomhundley commented 3 years ago

I'm having the same issue. Function app on Linux. Premium plan. Deploy from Azure DevOps. Restart. Old code runs for about 5 more minutes. New code magically starts.

Function app name: medchron-carnotaurus-eus-dev-v1-0

This is a problem. Please advise. Thanks.

simon-tarr commented 3 years ago

Also experiencing the same issue. Function app on Linux, deployed via GitHub Actions. The initial deploy of the code worked fine. After subsequent pushes with updates to the function, the function app continues to run the old code.

I've poked around the files at https://.scm.azurewebsites.net/DebugConsole and can see the new code, so our deployment from GitHub has clearly worked successfully (also verified by no deployment errors). Yet in the Azure portal the old code is still visible in Functions -> Function Name -> Code + Test and our web app isn't executing the new functionality which was in the most recent deployment.

This is very bizarre behaviour and needs a fix ASAP :)

mattchatterley commented 3 years ago

Also seeing this frequently (anecdotally more so when we deploy out of hours, though that might just be perception). Repeatedly deploying seems to eventually solve it, but it is very frustrating.

SGirousse commented 3 years ago

Hello, in our project we are also facing this really frequently lately. (We discovered the issue because of an incompatibility with our database updates, but maybe it was already there before and we simply never noticed it.) Is there any roadmap for fixing this issue?