Azure / static-web-apps


SWA - consistent response payload #1281

Closed: michaelbinks closed this issue 1 year ago

michaelbinks commented 1 year ago

Describe the bug

We host a Vue.js application on SWA. We recently had an issue where a third-party script rendered the application unusable: it caused a JS error and a resulting white screen. Once we found the issue, we deployed a new version of the app to SWA with the offending script tag removed. This was at ~21:30 BST on 15/09/2023. However, after 23:00 on the same date we were still seeing some requests to index.html on the SWA returning the script tag as part of the response payload. Sadly, this meant that whilst we had fixed the issue, the SWA was still serving a version that broke the site for the customer.

Can we get any clarification on the reasons for this type of behaviour?

Expected behavior

Following a deployment of a static resource, the expected payload is returned consistently. I understand this is a global service, so I am looking for an understanding of how long it could take to achieve this. The MS docs are not helping.

Device info (if applicable): all/any

akselikap commented 1 year ago

We are facing a similar issue, which was first reported to us at 09:00 UTC today. I have opened a support ticket with Azure about this. Our issue is as follows: SWA returns index.html. Index.html starts with the script tags for our React JavaScript bundles. We have 2 bundle files, so in the network tab it looks like:

  1. index.html
  2. bundle1-hash.js
  3. bundle2-hash.js

Starting today at 09:00 UTC, bundle2-hash.js randomly returns a 404 error. E.g. if I spam the reload button of a browser on our site 10 times within 10 seconds, it could return five 404 errors and five 200 responses. Every 404 error leads to our React application crashing and displaying a blank white screen.

In addition to this it's also clear that the SWA could randomly return a different index.html file which most likely belonged to a previous deployment of the SWA. This can also be observed from the network tab because the hashes of the included script tags change randomly when you spam the reload button of the browser.
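The reload test described above can also be scripted instead of done by hand. The following is only a minimal sketch under stated assumptions, not a tool provided by Azure: it assumes Python 3.9+, that each deployment's index.html references differently hashed script files, and that you pass your own app's URL. The names `script_sources` and `count_served_versions` are made up for illustration.

```python
import re
from collections import Counter
from typing import Callable
from urllib.request import urlopen

# Matches the src attribute of <script> tags (good enough for bundler output,
# not a full HTML parser).
SCRIPT_SRC = re.compile(r'<script[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)


def script_sources(html: str) -> tuple[str, ...]:
    """Return the script URLs referenced by an HTML page, in document order."""
    return tuple(SCRIPT_SRC.findall(html))


def fetch_text(url: str) -> str:
    """Download a page body as text (raises on 4xx/5xx responses)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def count_served_versions(
    url: str,
    attempts: int = 20,
    fetch: Callable[[str], str] = fetch_text,
) -> Counter:
    """Fetch `url` repeatedly and count each distinct set of script URLs seen.

    More than one key in the result means different index.html payloads
    were served during the run.
    """
    seen: Counter = Counter()
    for _ in range(attempts):
        seen[script_sources(fetch(url))] += 1
    return seen
```

Calling `count_served_versions(...)` with the app's default hostname and printing the result should show one entry per distinct index.html that was served. A single entry does not prove the deployment is consistent everywhere, only that this client saw one version during the run.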

We have 4 SWAs which are built from the same code and deployed using the same pipeline. 3 of them are experiencing the issue. The only difference I can think of is that the 3 affected apps were created earlier than the one that is still working.

akselikap commented 1 year ago

I got confirmation from Azure that the fault is indeed on their end. Haven't gotten a confirmation that it has been fixed but it seems like it's fixed now.

jgarplind commented 1 year ago

Haven't gotten a confirmation that it has been fixed but it seems like it's fixed now.

Personally I am still seeing the issue intermittently, so I don't think there has been a fully rolled out fix yet.

serhatkepez commented 1 year ago

I got confirmation from Azure that the fault is indeed on their end. Haven't gotten a confirmation that it has been fixed but it seems like it's fixed now.

Unfortunately still having this issue

lud-hu commented 1 year ago

We have also been experiencing these issues since yesterday, and they are making our site completely unusable. I hope they will come up with a fix for this quickly...

ghost commented 1 year ago

Same problem here. We tried changing the engine, the bundle size, and staticwebapp.config.json; nothing works.

vegaasen commented 1 year ago

Same here at our end, seems to be totally random in what is being served - both in terms of "wrong files"/old revisions, and 404s.

MarkTallentire commented 1 year ago

We've been facing issues with atomic deployments (or lack of!) for the last 3 days.

Refreshing our site serves different versions randomly and sometimes produces 404s.

This is causing issues with our production environments.

bengreenow commented 1 year ago

Same issue here, seems almost round robin-y

Reload: Works!
Reload: Works!
Reload: 404 hell...
Reload: Works!

LaurenceNairne commented 1 year ago

I got confirmation from Azure that the fault is indeed on their end. Haven't gotten a confirmation that it has been fixed but it seems like it's fixed now.

@akselikap, any chance you have more to share on their response to this? As there seems to be no public acknowledgement of an issue. This is still impacting us and is blocking our deployments (so we don't screw production) at the moment.

akselikap commented 1 year ago

Just got a response from Azure:

The investigation is still ongoing.

The Product group team provided the following mitigation steps:

Perform a redeployment without any user interruption (like cancel, redeploy, multi deploy ...).

After the redeploy, the issue should be mitigated. Once you’re able to redeploy, let us know, and we’ll monitor to check if any 404 will still be appearing.

I haven't tested these instructions myself because our system appears to be working already.

MarkTallentire commented 1 year ago

Just got a response from Azure:

The investigation is still ongoing. The Product group team provided the following mitigation steps: Perform a redeployment without any user interruption (like cancel, redeploy, multi deploy ...). After the redeploy, the issue should be mitigated. Once you’re able to redeploy, let us know, and we’ll monitor to check if any 404 will still be appearing.

I haven't tested these instructions myself because our system appears to be working already.

This didn't resolve anything for us. Deployed via Azure DevOps Pipelines. Still random 404s on refresh.

akselikap commented 1 year ago

Is there any way to actually see if the error is still occurring? Does something like access log exist for Static Web Apps? I couldn't come up with one myself.

MarkTallentire commented 1 year ago

Is there any way to actually see if the error is still occurring? Does something like access log exist for Static Web Apps? I couldn't come up with one myself.

No, seems to all be hidden away. We can replicate pretty easily by just hitting F5 a few times now.

akselikap commented 1 year ago

No, seems to all be hidden away. We can replicate pretty easily by just hitting F5 a few times now.

I'm unable to replicate it that way so maybe it's fixed or then it doesn't occur as frequently anymore. Kind of annoying that there's no way for me to monitor my own application. At least you should be able to just connect it to Application Insights and have the request logs there or some metric about HTTP codes etc.
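In the absence of access logs, one rough way to put a number on this from the outside is to request a single asset repeatedly and tally the status codes. This is a minimal sketch, assuming Python 3 and a URL you supply yourself; `http_status`, `tally_statuses` and `failure_rate` are hypothetical helper names, and the result only reflects what one client saw, not what the service logged.

```python
from collections import Counter
from typing import Callable
from urllib.error import HTTPError
from urllib.request import urlopen


def http_status(url: str) -> int:
    """Return the HTTP status code of a GET request, including 4xx/5xx."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        # urlopen raises for error responses; the code is what we want to count.
        return err.code


def tally_statuses(
    url: str,
    attempts: int = 50,
    probe: Callable[[str], int] = http_status,
) -> Counter:
    """Request `url` `attempts` times and count how often each status occurs."""
    return Counter(probe(url) for _ in range(attempts))


def failure_rate(tally: Counter) -> float:
    """Fraction of requests that returned a 4xx or 5xx status."""
    total = sum(tally.values())
    failed = sum(count for code, count in tally.items() if code >= 400)
    return failed / total if total else 0.0
```

Unlike hitting F5, this bypasses the browser cache, so the numbers may differ from what a user sees in the network tab.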

LaurenceNairne commented 1 year ago

Just got a response from Azure:

The investigation is still ongoing. The Product group team provided the following mitigation steps: Perform a redeployment without any user interruption (like cancel, redeploy, multi deploy ...). After the redeploy, the issue should be mitigated. Once you’re able to redeploy, let us know, and we’ll monitor to check if any 404 will still be appearing.

I haven't tested these instructions myself because our system appears to be working already.

This didn't resolve anything for us. Deployed via Azure DevOps Pipelines. Still random 404s on refresh.

Same for us

MarkTallentire commented 1 year ago

No, seems to all be hidden away. We can replicate pretty easily by just hitting F5 a few times now.

I'm unable to replicate it that way so maybe it's fixed or then it doesn't occur as frequently anymore. Kind of annoying that there's no way for me to monitor my own application. At least you should be able to just connect it to Application Insights and have the request logs there or some metric about HTTP codes etc.

We're now seeing it considerably less often, I'm redeploying to our test systems to see if it re-occurs after a deployment and raising with our CSP. Will keep you posted.

nicktolhurst commented 1 year ago

Kind of annoying that there's no way for me to monitor my own application.

Agreed. There is a workaround to provide deeper analytics for static web apps. You would have to create a standalone Application Insights resource and then hook it up with this snippet: https://github.com/Microsoft/ApplicationInsights-JS#snippet-setup-ignore-if-using-npm-setup. (Not very useful for capturing 404s though.)

However, I am still seeing this issue often. Around 1 in every 10 requests returns 404.

MarkTallentire commented 1 year ago

I am still seeing this issue fairly often. Around 1 in every 10 requests returns 404.

Following a re-deployment to our test system the issue became much more apparent again. I'm assuming it's some kind of propagation issue to various edge endpoints and the 404 occurs when you hit one that hasn't started running the new container yet.

nikoraes commented 1 year ago

I am still seeing this issue fairly often. Around 1 in every 10 requests returns 404.

Following a re-deployment to our test system the issue became much more apparent again. I'm assuming it's some kind of propagation issue to various edge endpoints and the 404 occurs when you hit one that hasn't started running the new container yet.

I agree. We have been having the same issue since last week. After deployment we get 404s on random files, but after about 6 hours everything seems to work again. So redeploying definitely doesn't solve anything, as it then seems to take longer for everything to work again.
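Given that the errors fade over hours rather than at a known moment, a pipeline could hold off on declaring a deployment healthy until a run of consecutive probes succeeds. The following is a minimal sketch of that idea in Python, not something the SWA tooling provides: `wait_until_consistent` is a hypothetical helper, `probe` is any zero-argument check you supply (for example, "the asset returns 200 and index.html references the new hash"), and the default thresholds are arbitrary.

```python
import time
from typing import Callable


def wait_until_consistent(
    probe: Callable[[], bool],
    required_streak: int = 20,
    max_attempts: int = 500,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds `required_streak` times in a row.

    Any failed probe resets the streak. Returns False if `max_attempts`
    probes are used up before a long enough streak is observed.
    """
    streak = 0
    for _ in range(max_attempts):
        if probe():
            streak += 1
            if streak >= required_streak:
                return True
        else:
            streak = 0  # a single stale or 404 response starts the count over
        sleep(delay_seconds)
    return False
```

A long streak only makes stale responses less likely; with roughly 1 in 10 requests failing, a short streak could still pass by chance, so the threshold is a trade-off between waiting time and confidence.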

MarkTallentire commented 1 year ago

Just to confirm we've raised with our CSP who have raised with MS. Will update if anything new comes back

Duske commented 1 year ago

Experienced the same issue since 15-09-2023 together with other users: https://learn.microsoft.com/en-us/answers/questions/1369563/404-errors-in-azure-static-web-app-js?comment=question#newest-question-comment

Happens in both tiers, Free and with SLA

AjayKumar-MSFT commented 1 year ago

Duske, Thanks for adding the Q&A thread.

I understand that this is causing inconvenience. I am currently collaborating internally with our product engineering team, and I will provide an update as soon as more info becomes available. We appreciate your patience!

thomasgauvin commented 1 year ago

We're mitigating and investigating the cause, will provide updates when we can!

annikel commented 1 year ago

If you've been affected by this issue, we kindly request that you provide us with additional details about your static web apps, such as the default hostname. Sharing this information will assist and expedite our investigation. If you prefer not to share this information here, please open a support ticket.

jgarplind commented 1 year ago

If you've been affected by this issue, we kindly request that you provide us with additional details about your static web apps, such as the default hostname. Sharing this information will assist and expedite our investigation. If you prefer not to share this information here, please open a support ticket.

There is a support ticket with the tracking ID 2309190050003017 for your reference

Details about webapp:

Duske commented 1 year ago

If you've been affected by this issue, we kindly request that you provide us with additional details about your static web apps, such as the default hostname. Sharing this information will assist and expedite our investigation. If you prefer not to share this information here, please open a support ticket.

Support ticket for your reference: 2309200050000814 Details about webapp:


Edited: Problems started at 15.09.2023 around 16:00 CEST (that's when we deployed)

bengreenow commented 1 year ago

If you've been affected by this issue, we kindly request that you provide us with additional details about your static web apps, such as the default hostname. Sharing this information will assist and expedite our investigation. If you prefer not to share this information here, please open a support ticket.

Support ticket for your reference: 2309200050000814 Details about webapp:

  • built with vue.js and bundled with Vite
  • SLA tier and free tier affected
  • the longer a deployment is active, the fewer 404s you seem to get (CDN propagation issue?)
  • no API, functions, database involved. A very plain SWA
  • deployed via github action Azure/static-web-apps-deploy@v1

Our situation is identical, however we deploy from Azure Pipelines via DevOps

vegaasen commented 1 year ago

Lots of problems here too. Web-app structure:

Needless to say, this is quite urgent. Luckily we have measurements in place for our staging/QA environment, and the bug was caught there before going live to all customers. Right now, this basically means that we cannot deploy new features or bug fixes. If this persists, we will need to take action and move away from SWA permanently, as this isn't viable 💯

MarkTallentire commented 1 year ago

Lots of problems here too. Web-app structure:

* Simple/Plain React/Typescript-application packaged using Vite - plain SWA

* No APIs

* No functions

* Built through Azure DevOps

* Deployed through Azure DevOps using the following configuration:

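  ```yaml
  - task: AzureStaticWebApp@0
    displayName: Upload to Azure Static Webapp
    inputs:
      app_location: '$(Build.ArtifactsDirectory)/build'
      output_location: ''
      # The app is built earlier in the pipeline; upload the artifact as-is.
      skip_app_build: true
      # Deployment token, masked in the original post.
      azure_static_web_apps_api_token: $(<masked>)
  ```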

* **SLA**-tier and **Free**-tier affected

* `404`s experienced when refreshing the SWA domain. It looks to be referring to old versions of the app, as the cache-busting identifiers we have in place reference old elements. As mentioned above, it looks to be related to the CDN or something similar. We've noticed:

  * Seen two-three old releases
  * Deployed eight times, and seen at least seven of those eight whilst refreshing the site 😅

* We're not quite sure, but the 15th of September seems to be the date when the problems started for us too

This is identical to our setup. We have a ticket raised through our CSP already (unfortunately they haven't provided us a reference number)

akselikap commented 1 year ago

I received a confirmation from Azure that this issue should be mitigated now. I also got a very brief description about the root cause.

The 404 logs originate from the Frontend (see "Inside the Azure App Service Architecture" on Microsoft Learn), a layer seven load balancer in the App Service Architecture.

MarkTallentire commented 1 year ago

I received a confirmation from Azure that this issue should be mitigated now. I also got a very brief description about the root cause.

The 404 logs originate from the Frontend (see "Inside the Azure App Service Architecture" on Microsoft Learn), a layer seven load balancer in the App Service Architecture.

Thanks for the update. Everything appears to be working and stable on our side as well. Disappointed in the response by Microsoft (zero updates on official channels, the issue hasn't even been noted on Service Health, and I'm not sure why it takes a member of the community to advise everyone it's fixed instead of them), but that's a different story.

jgarplind commented 1 year ago

Also unable to reproduce the error straight away, but being intermittent in nature, it is hard to say if something is fully resolved.

dagskage commented 1 year ago

@thomasgauvin / @AjayKumar-MSFT - can you confirm that the issue is fully resolved?

annikel commented 1 year ago

I am confirming that the issue has been successfully resolved as of yesterday. Thank you for bringing this matter to our attention, and we appreciate your patience throughout the resolution process.

thomasgauvin commented 1 year ago

As confirmed by @annikel, this issue is now resolved. Thanks everyone for the feedback and the details that helped us identify the issue and fix it

ghost commented 1 year ago

Hi @thomasgauvin

It seems not completely resolved.

ghost commented 1 year ago

Never mind, seems linked to #1230