Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License

[Linux Consumption] Successful slot swaps automatically reverted after a few minutes #7336

Open andrewconnell opened 3 years ago

andrewconnell commented 3 years ago

I know I've listed n/a a few times in this issue, but that's because this has nothing to do with the code or a specific invocation of the function. This happens with the default Node.js function app project created in VS Code... looks more like a runtime/host issue than a code issue.

Investigative information

Please provide the following:

Repro steps

Provide the steps required to reproduce the problem:

  1. Create a new Azure Functions app on the Linux Consumption plan

  2. Create a second deployment slot named staging

  3. Deploy app to staging slot

    test function... this works

  4. Swap slots & wait for the process to complete

  5. Observe the functions are now listed under the production slot:

    image

  6. Wait a few minutes (~5-10... usually I see this around 9m after the successful slot deployment)

  7. Retry the function on the production slot: it now fails

  8. Observe the functions are no longer listed on the production slot:

    image

  9. Go back and list the functions on the staging slot... observe that a phantom swap has occurred
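To pin down when the revert in steps 6-8 happens, a small polling loop can help. This is only a sketch, not part of the original report: the helper names `probe_status` and `watch_for_revert` are hypothetical, and it assumes the test function is reachable over HTTP and returns 200 while the swap is still in effect.

```shell
# Hypothetical helper: print the HTTP status code for a URL
# (curl prints 000 when the connection itself fails).
probe_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1" || true
}

# Poll a URL until it stops returning 200 or max_attempts is reached.
# Prints the attempt number and elapsed polling time at the first
# non-200 response and returns 1; returns 0 if every attempt saw 200.
watch_for_revert() {
  local url="$1"
  local interval="${2:-30}"
  local max_attempts="${3:-40}"
  local attempt=1
  local status
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$(probe_status "$url")
    if [ "$status" != "200" ]; then
      echo "attempt $attempt: HTTP $status after $(( (attempt - 1) * interval ))s"
      return 1
    fi
    sleep "$interval"
    attempt=$(( attempt + 1 ))
  done
  echo "still HTTP 200 after $max_attempts attempts"
  return 0
}

# Example (the URL is a placeholder, not a real endpoint):
#   watch_for_revert "https://<app-name>.azurewebsites.net/api/<function-name>" 30 40
```

Starting it right after the swap completes and polling every 30 seconds for 20 minutes would cover the ~5-10 minute window described above.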

Expected behavior

Expect that after a successful slot swap, the swap isn't reverted.

Actual behavior

See repro steps.

Known workarounds

Don't use the Linux Consumption plan... if I create a Windows Consumption plan, it works as expected.

Related information

For the Q&A on this issue that led me here to post a bug, please see the following. I've included all the details in this issue, but I'm including the link to show that others have had the same experience. https://docs.microsoft.com/en-us/answers/questions/382139/az-function-consumption-plan-reverts-production-sl.html

balag0 commented 3 years ago

@andrewconnell I tried this but couldn't repro it myself. Could you provide more details on how you are deploying to the staging slot in step 3?

andrewconnell commented 3 years ago

@balag0 I'm deploying to staging via a GitHub action:

      ######################################################################
      # login to Azure CLI via service principal
      ######################################################################
      - name: Login to Azure
        run: az login --service-principal --tenant $BOT_TENANT_ID --username $BOT_CLIENT_ID --password $BOT_CLIENT_SECRET
        env:
          BOT_TENANT_ID: ${{ secrets.TOTALVIEW_BOT_AZURE_TENANTID }}
          BOT_CLIENT_ID: ${{ secrets.TOTALVIEW_BOT_AZURE_CLIENTID }}
          BOT_CLIENT_SECRET: ${{ secrets.TOTALVIEW_BOT_AZURE_CLIENTSECRET }}

      ######################################################################
      # acquire publish profile for Azure Functions App
      ######################################################################
      - name: Download Azure Function app publishing profile
        id: az_funcapp_publishing_profile
        run: |
          CMD_PUB_PROFILE=$(az functionapp deployment list-publishing-profiles --subscription $AZURE_SUBSCRIPTION_ID --resource-group $FUNCTION_APP_RESOURCE_GROUP --name $FUNCTION_APP_NAME --slot $FUNCTION_APP_DEPLOYMENT_SLOT --xml)
          echo "::set-output name=slot_pub_profile::${CMD_PUB_PROFILE}"
        env:
          AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_FUNCTIONAPP_SUBSCRIPTIONID }}
          FUNCTION_APP_RESOURCE_GROUP: ${{ secrets.AZURE_FUNCTIONAPP_RESOURCEGROUP }}
          FUNCTION_APP_NAME: ${{ secrets.AZURE_FUNCTIONAPP_NAME }}
          FUNCTION_APP_DEPLOYMENT_SLOT: ${{ env.AZURE_FUNCTION_APP_DEPLOYMENT_SLOT }}

      ######################################################################
      # deploy function app
      ######################################################################
      - name: Deploy Azure Functions app
        uses: Azure/functions-action@v1
        with:
          app-name: ${{ secrets.AZURE_FUNCTIONAPP_NAME }}
          package: '.'
          publish-profile: ${{ steps.az_funcapp_publishing_profile.outputs.slot_pub_profile }}
          respect-funcignore: true

      ######################################################################
      # update azure function app setting to commit hash
      ######################################################################
      - name: Set Azure Function app's app setting "APP_VERSION" & "COMMIT_HASH"
        run: |
          az functionapp config appsettings set --resource-group $FUNCTION_APP_RESOURCE_GROUP --name $FUNCTION_APP_NAME --slot $FUNCTION_APP_DEPLOYMENT_SLOT --slot-settings "APP_VERSION=$SLOT_SETTING__APP_VERSION" "COMMIT_HASH=$SLOT_SETTING__COMMIT_HASH"
        env:
          FUNCTION_APP_RESOURCE_GROUP: ${{ secrets.AZURE_FUNCTIONAPP_RESOURCEGROUP }}
          FUNCTION_APP_NAME: ${{ secrets.AZURE_FUNCTIONAPP_NAME }}
          FUNCTION_APP_DEPLOYMENT_SLOT: ${{ env.AZURE_FUNCTION_APP_DEPLOYMENT_SLOT }}
          SLOT_SETTING__APP_VERSION: staging
          SLOT_SETTING__COMMIT_HASH: ${{ github.sha }}

Then, when I want to swap the slots, in another workflow I do the following:

      ######################################################################
      # login to Azure CLI via service principal
      ######################################################################
      - name: Login to Azure
        run: az login --service-principal --tenant $BOT_TENANT_ID --username $BOT_CLIENT_ID --password $BOT_CLIENT_SECRET
        env:
          BOT_TENANT_ID: ${{ secrets.TOTALVIEW_BOT_AZURE_TENANTID }}
          BOT_CLIENT_ID: ${{ secrets.TOTALVIEW_BOT_AZURE_CLIENTID }}
          BOT_CLIENT_SECRET: ${{ secrets.TOTALVIEW_BOT_AZURE_CLIENTSECRET }}

      ######################################################################
      # swap deployment slots
      ######################################################################
      - name: Swap staging & production deployment slot
        run: |
          az functionapp deployment slot swap --resource-group $FUNCTION_APP_RESOURCE_GROUP --name $FUNCTION_APP_NAME --slot $FUNCTION_APP_STAGING_SLOT --target-slot $FUNCTION_APP_PRODUCTION_SLOT
        env:
          FUNCTION_APP_RESOURCE_GROUP: ${{ secrets.AZURE_FUNCTIONAPP_RESOURCEGROUP }}
          FUNCTION_APP_NAME: ${{ secrets.AZURE_FUNCTIONAPP_NAME }}
          FUNCTION_APP_STAGING_SLOT: ${{ env.AZURE_FUNCTION_APP_STAGING_DEPLOYMENT_SLOT }}
          FUNCTION_APP_PRODUCTION_SLOT: production

BTW, these exact same workflows work when I target a Windows-based Consumption Azure Functions app. I used the same repo in my test and only changed the name of the function app I'm targeting via the secret ${{ secrets.AZURE_FUNCTIONAPP_NAME }}.

astegmaier commented 3 years ago

FWIW, I can repro this issue as well: Here's what I'm doing:

When I try out the test function for the production slot, it works at first. But after a period of time (which varies from almost immediately to as much as 10-20 minutes later) I get 404 errors, and my test function disappears from the production slot in the Azure portal. The function is still in the "staging" slot as if the swap never occurred. I don't see anything in Deployment Slots > Logs that mentions an error. @balag0 - if there is some way I could query for more detailed logs to help with debugging, I'm happy to help - just let me know what you'd like me to do.

When I try the same thing with a Windows Consumption plan, everything works as expected.

balag0 commented 3 years ago

Thanks for the repro steps and the additional details, everyone. We understand the bug and are working on a fix now. Will update this thread with an ETA soon.

In the meantime, I think this bug should only impact apps using remote build (which is what GitHub Actions is using), so publishing with Azure Functions Core Tools could be a workaround.
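A minimal sketch of that workaround, assuming Azure Functions Core Tools (`func`) is installed and you are already logged in with `az login`. The wrapper name `publish_to_slot` and the `DRY_RUN` switch are hypothetical conveniences. Whether this actually avoids a remote build depends on the project language and the `--build` option, so treat it as something to verify rather than a confirmed fix.

```shell
# Hypothetical wrapper around the Core Tools publish command.
# Run it from the function project's root folder.
# With DRY_RUN set to a non-empty value it prints the command instead of running it.
publish_to_slot() {
  local app_name="$1"
  local slot="$2"
  ${DRY_RUN:+echo} func azure functionapp publish "$app_name" --slot "$slot"
}

# Example (app and slot names are placeholders):
#   DRY_RUN=1 publish_to_slot my-function-app staging
#   publish_to_slot my-function-app staging
```

After publishing this way, the slot swap itself can still be done with the same `az functionapp deployment slot swap` step shown earlier in the thread.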

balag0 commented 3 years ago

Update: The fix will roll out with the next release. We expect it to be available in all regions in ~45 days. Thanks

v-bbalaiagar commented 3 years ago

Hi @balag0, can you please advise whether the label needs to be changed?

phawxby commented 3 years ago

@balag0 could whatever has been fixed also apply to Azure App Service? We've had a few issues recently where slot settings appear to be randomly getting confused between the RC and Production slots. RC settings end up in prod, or prod settings end up in RC, even though all the slot setting values are set in both cases. You can see the discussion about the problem on support ticket 2105140050002362.

balag0 commented 3 years ago

The issue (and fix) was specific to Consumption plans only, so what you're seeing must be a different issue.