MicrosoftDocs / msix-docs

repository for msix content
Creative Commons Attribution 4.0 International
60 stars 115 forks source link

App Installer emitting a "Unknown error (0x80D05011)" on some machines #188

Open nbevans opened 4 years ago

nbevans commented 4 years ago

Similar to https://github.com/MicrosoftDocs/msix-docs/issues/186 - but a different error code. Also, with this error code - the workaround to use Add-AppxPackage via PowerShell does not work. So the .appinstaller cannot be installed at all on machines that exhibit this issue - we are unsure currently what to advise to the customer short of "try reinstalling your machine from scratch".

Here is a Feedback Hub link to captured diagnostic data when attempting to install via App Installer: https://aka.ms/AAa0oga

nbevans commented 4 years ago

So an update on this.

We had the customer reboot the machine. We then tried to install the app again but this time the App Installer was producing error code of 0x8000FFFF (which is the same as https://github.com/MicrosoftDocs/msix-docs/issues/186) however it had a different error text which did not say "Catastrophic error". We then tried the Add-AppxPackage workaround and hey-presto it worked, the app was successfully installed.

There are a lot of weird bugs in the App Installer architecture...

nbevans commented 4 years ago

I have now witnessed this error on my own machine and/or VM's, several times over the last week or so.

Simply rebooting the machine and trying again resolves it every single time.

image

FreddyDgh commented 3 years ago

I am seeing this error frequently as well, and have found it difficult to diagnose.

wjrivera commented 3 years ago

Any word on possible solutions for this?

samiranshah-zz commented 3 years ago

Hi @nbevans, are you still seeing this issue? Is your scenario same as #186?

samiranshah-zz commented 3 years ago

Closing this. Please reopen if you are still seeing the issue.

nbevans commented 3 years ago

@samiranshah Sorry I missed this. Yes I saw this issue ("Unknown error") again just a couple weeks ago. A reboot of the machine fixed it.

wbokkers commented 3 years ago

We saw this issue today. A reboot did not fix it, unfortunately.

yowl commented 3 years ago

@samiranshah Please reopen as we are also seeing this after changing the version number from 1.0.xx.yyy to 1.1.0.0

A reboot did fix it here.

mikehearn commented 3 years ago

I've spent a few hours debugging this. Here's what's going wrong and how to work around it.

This looks like a (very serious!!) bug in the MSIX / Delivery Optimization subsystem of Windows. It appears Microsoft already know about it and have fixed it back in May, but haven't yet released the fix, at least not to Windows 10.

@wcheng-msft @samiranshah You guys need to document the bug and publish an official workaround if you aren't going to push out the fix ASAP because this is a nightmare to figure out, yet you clearly already know what the bug is! That's really not a satisfying discovery.

The Delivery Optimization subsystem that is used by App Installer appears to be incorrectly caching the size of the file given a URL, in memory. If an .appinstaller file changes its size once published, then any machine that already downloaded it will try to download the exact same number of bytes as previously retrieved until the in memory cache is cleared with a reboot. This will result in one of two things happening:

  1. Windows requests more bytes than actually exist in the file, and then thinks - wrongly - the server doesn't support Range queries when it doesn't get enough bytes. It reports error 0x80D05011 which is "DO_E_INSUFFICIENT_RANGE_SUPPORT" as a consequence.
  2. Windows requests fewer bytes than actually exist, resulting in an attempt to parse truncated XML. This can show up in a variety of ways.

@yowl This is why you see the failure once you changed the version number. The problem is not actually the change of version number but rather that the new version quad is a different size textually to the previous.

The situation may be complicated by HTTP redirects. I saw this problem myself before ever changing anything about the XML, and I think Windows cached the wrong size from the body of a 302 redirect from HTTP to HTTPS. So watch out for that if your server generates a lot of HTML when issuing redirects.

You can see that this bug is happening by dumping the Delivery Optimization subsystem logs: Get-DeliveryOptimizationLog | Set-Content c:\dosvc.log then look at the end. If you study it very carefully you will notice that the size of the request that is being made is different to the actual size of the file on your web server.

Here's a stupid hack that seems to work:

  1. Always pad the appinstaller.xml file with white space to a fixed size large enough that you'll never exceed it. If your XML grows, it'll eat into the white space which the XML parser will ignore. If it shrinks, add more whitespace to re-pad it to the old length. I'm generating the AppInstaller .XML myself rather than using any Microsoft tool so I can do this easily - if you aren't then you may need a custom script.
  2. Always embed the version number into the MSIX file name and never re-use a version after the package was uploaded. This will ensure Windows never sees one package URL with different sizes.

(edited for clarity)

yowl commented 3 years ago

Thanks @mikehearn . Here's some powershell that I'm putting into my pipeline to do as you suggest

$size=(Get-Item "$($args[0])").length
Write-Host $size
$padSize=2000 - $size
Write-Host padding with $padSize
$padString = "".PadRight($padSize)
Add-Content "$($args[0])" $padString -NoNewline
jtorjo commented 3 years ago

@samiranshah Please reopen this. this has been happening for over 1.5 years. It's a shame the way MS has been dealing with this.

mvelayati commented 3 years ago

@mikehearn @yowl Thanks for the workaround but unfortunately it didn't work for me! I went ahead and added the powershell script to my pipeline but the issue still remains.

yowl commented 3 years ago

@mvelayati The first update may still fail as the cached xml could be, indeed would likely be, a different size, but after that you are saying it still fails?

mvelayati commented 3 years ago

@yowl Yeah I uninstalled whatever had installed on my PC, installed the app again with the new fixed size appintaller. After that I ran the pipeline again to upgrade the version number and still no luck!

yowl commented 3 years ago

It does seem to be holding up here. You've checked the size of the deployed file?

mvelayati commented 3 years ago

yeah I set the file size to be exactly 2000 bytes using the powershell script you posted above and then I FTP it over to my host.

nbevans commented 3 years ago

Thanks for this fix idea. I'm going to put it in place a few weeks in advance of our next UWP rollout. That way it should give plenty of time for machine reboots and the cache to get cleared out. So by the time our update goes out - there shouldn't be any machines holding onto the unpadded cached copy.

yowl commented 3 years ago

@mvelayati Not sure what else to suggest. Have you got redirects as @mikehearn mentioned? If you ftp the file back locally and check it, it is 2000 bytes longs?

If it starts failing here, I'll report back that the script doesn't work.

mvelayati commented 3 years ago

@yowl Sorry I'm not familiar with http redirects and whether I've got that or not but I checked the DeliveryOptimization logs and noticed that the size of the request is indeed 2000 bytes. I also checked the file size on the host which was 2000 bytes as well.

mikehearn commented 3 years ago

If the DO logs show a request for the exact same size as on the server, and it still fails, you might be facing a different issue then. What is the exact error you get?

nbevans commented 3 years ago

I just added 20kb of whitespace tail padding to our .appinstaller files. Expecting this to reliably fix these errors we were getting a lot:

The XML in the .appinstaller file is not valid ... (0xc00cee01) The XML in the .appinstaller file is not valid ... (0xc00cee23)

It's genuinely crazy that such a critical piece of Windows infrastructure contains such a catastrophic caching bug and that it has gone unfixed for so long. I imagine this bug must affect hundreds of other components aside from just App Installer.

mikehearn commented 3 years ago

I suspect App Installer is probably the only thing seriously affected. If it affected Windows Update it'd have been caught and fixed long ago, but it's obvious that Microsoft isn't actually using AppInstaller internally.

Hypothesis: Delivery Optimization originally made the assumption that all static assets it'd be asked to download were immutable. When AI integration was added this assumption was violated but nobody noticed. The symptoms probably don't appear if you test by simply incrementing a version a few times, because until you do > 9 releases the version number won't change size, and the errors it generates can vary, and it's a caching bug so it will seem to come and go more or less at random. Overall hard to track down. However from another GitHub thread Microsoft mentioned that they already fixed it back in May and were looking at maybe backporting the fix ... clearly that never happened, which is what's really crazy here. The bug is guaranteed to eventually hit everyone using AppInstaller, yet, MS don't seem to care.

jtorjo commented 3 years ago

but it's obvious that Microsoft isn't actually using AppInstaller internally.

@mikehearn Priceless, and probably true 😁

mvelayati commented 3 years ago

If the DO logs show a request for the exact same size as on the server, and it still fails, you might be facing a different issue then. What is the exact error you get?

This is the error I get when I click on the download link to install from web. image

If I launch the app from the start menu, the appinstaller checks for updates but does not find any and goes on to open the app.

wbokkers commented 3 years ago

@samiranshah Can you please reopen this issue? We still see this issue regularly.

mikehearn commented 3 years ago

@mvelayati Well, that error implies maybe you changed the size of AEther_Beta.msix? You can't do that - remember, the bug is that Delivery Optimization doesn't understand that files can change size. You should put the version number in the file name and never change an MSIX after it's been uploaded.

mvelayati commented 3 years ago

@mikehearn Yeah that was the problem. I changed the release pipeline so it appends the version number to the msix file name for each run and it works now! Thanks for the help!

dianmsft commented 3 years ago

Hi all, we are looking into this issue.

Looks like the issue has resolved itself with the following update: https://support.microsoft.com/en-us/topic/september-30-2021-kb5005611-os-builds-19041-1266-19042-1266-and-19043-1266-preview-a37f5409-f320-4175-9a66-c2682fc11c07

Can you all verify that this is the case? If not, please let us know so we can investigate.

yowl commented 3 years ago

Don't believe this is fixed as I get

Appinstaller operation failed with error code 0x80070490. Detail: Element not found.

Edition Windows 10 Pro Version 21H1 Installed on ‎20/‎10/‎2020 OS build 19043.1288 Experience Windows Feature Experience Pack 120.2212.3920.0

dianmsft commented 3 years ago

@yowl - Thanks for reporting the issue. We will need more logs to diagnose correctly. Could you please log more data by using one of the following via Feedback App if the machine is connected to the internet and uploads feedback. Please select the following Category: "Developer Platform" -> "App Deployment"

yowl commented 3 years ago

@dianmsft thanks for reviewing. I've submitted feedback through the app.

nbevans commented 3 years ago

Original thread was here: https://github.com/MicrosoftDocs/msix-docs/issues/191#issuecomment-954833601

Perhaps @ranm-msft could comment?

dianmsft commented 3 years ago

@yowl - we couldn't find your feedback. Can you please let me know the title that you used? I want to double check a couple of places. Also to make sure that it is under: "Developer Platform" -> "App Deployment"

yowl commented 3 years ago

@dianmsft The title was "App Installer does not automatically update when appinstaller changes size" When I share a link it comes with https://aka.ms/AAesqha

LevYas commented 2 years ago

I have a similar problem, submitted it via feedback hub https://aka.ms/AAetpms Hope it helps you @dianmsft

dianmsft commented 2 years ago

Thank you all for the help! I was able to find the feedback that @yowl submitted and we are looking at the logs now. I will provide more updates as they come.

yowl commented 2 years ago

It does seem to be holding up here. You've checked the size of the deployed file?

Unfortunately this doesn't seem to be the solution as we have this failing again even with padding the .appinstaller Same error it seems in the event log:

Appinstaller operation failed with error code 0x80070490. Detail: Element not found.

and

AppX Deployment operation failed for package b90d235c-ba69-42d6-8f61-7ec79035b0f2_1.1.56.0_x86__g5440y1v6mj70 with error 0x80070490. The specific error text for this failure is: Appinstaller operation failed with error code 0x80070490. Detail: Element not found.

There's no updates available, this is the result of checking, status from package.CheckUpdateAvailabilityAsync is Unknown. And this is the same when an update is made available.

image

I do have WIndows Updates pending, so will install those and report back

image

yowl commented 2 years ago

Installing the updates and rebooting made no difference: same error. Downloading the appinstaller file locally and running Add-AppxPackage -appinstallerfile .... even though there was no update fixed it and the next update did install automatically, after that however back to "Element not found". No change in version number size, and padding anyway to 2000 characters.

yowl commented 2 years ago

So maybe there is another workaround, Detect the failure in code using package.CheckUpdateAvailabilityAsync, when it errors, download the appinstaller manually and process exec powershell with Add-AppxPackage.

mikehearn commented 2 years ago

@yowl sorry to hear that. Can you attach your appinstaller.xml file in a form that preserves it bytewise (i.e. as a raw file or inside a zip)? I'd like to take a look if you don't mind, maybe there's a way to spot what's going wrong. Also your DO logs would be helpful.

yowl commented 2 years ago

@mikehearn Sorry, but I can't post the appinstaller publically. I don't see anything at all in the DO logs when starting the application or requesting an update, i.e. nothing with a recent timestamp at all using get-deliveryoptimizationlog. The last thing is from 10 minutes before starting the app:

TimeCreated : 17/11/2021 12:32:00
ProcessId   : 4792
ThreadId    : 16788
Level       : 4
LevelName   : Info
Message     : Swarm: 95D2EE60-C9D3-45E4-876D-BAE16D758A87_16_0_14527_20276_STREAM_X64_X_NONE, announce complete, event: 3, hr: 0, server: [20.54.24.169]:443 (https://array613.prod.do.dsp.mp.microsoft.com/join/), nPeers: 0, response: {"FailureReason":null,"NextJoi
              nTimeIntervalInMs":1006171,"Complete":0,"Incomplete":0,"Rediscover":false,"KVVersion":"99DCDFC4E43CE828743C620D845168FC8534879E6DCDACD58CDD25D9BBCFA430","GeoVersion":"A1B4408488D89015552A70A33860DA34F8FB14BE561530B7ADAC2F49C95A7A8B","Peers":[],"Leav
              e":false}
Function    : CAnnounceSequencer::OnComplete
LineNumber  : 187
ErrorCode   : 

Appx-DeploymentServer continues to report:

Appinstaller operation failed with error code 0x80070490. Detail: Element not found.

Clicking the appinstaller link from the web page, detects the update successfully and installs successfully. I don't suppose there's any pdb files for this code?

mikehearn commented 2 years ago

OK, no problem. The "Element not found" error is subtly different to the error codes discussed earlier in the thread. I'm wondering if the client is receiving any XML at all. The server in this case is nothing fancy, just an Apache or IIS or something? The web server logs definitely show a normal HTTP 200 response to the client?

yowl commented 2 years ago

@mikehearn I think you are right, there is no activity in NetMon when checking for an update as though the previous update has corrupted its ability to check for the next update. add-appxpackage -appinstaller https:..... is enough to allow package.CheckUpdateAvailabilityAsync to succeed again.

nbevans commented 2 years ago

image

Just had this with App Installer. Another "Catastrophic failure (0x8000ffff)"... tried everything, several reboots. In the end Add-AppxPackage was the only way to get the app to install.

Honestly thought these issues were starting to be behind us now with the newer builds of App Installer.

Nipheris commented 2 years ago

@mikehearn We had also discovered the very same bug several month ago, and it seems the participants of this github discussion are the only ones who tries to adopt App Installer in production 🤣

The situation becomes even more comic in the light of resentment of Microsoft Store by the average Windows customer: it seems we have no proven and bug-free technology to distribute and autoupdate MSIX packages...

jtorjo commented 2 years ago

@Nipheris Welcome to the wonderful world of dealing with M$. Because of this insanely stupid bug, I ended up abandoning the appinstaller, and having my users do manual updates. It's really sad how much MS really cares about this.

nbevans commented 2 years ago

participants of this github discussion are the only ones who tries to adopt App Installer in production 🤣

That's not very nice. There is no other choice than to use it.

mikehearn commented 2 years ago

From a quick look around I suspect E_UNEXPECTED ("Catastrophic Failure") is returned when an exception is thrown internally, i.e. an assertion failure or a crash of some sort. The only way to figure out what that might be is to find logs that contain more useful error messages. I don't know what logs might contain that if the DO logs don't.

mikehearn commented 2 years ago

The fact that these problems are resolvable by using PowerShell and/or specifying the AppInstaller URI to the command directly, means that the errors are not in the files themselves but rather the codepaths used for downloading them. Perhaps, the Add-AppxPackage cmdlet downloads the AppInstaller file without using Delivery Optimization, in that case.

DO is a very complicated thing. If I were the AppInstaller team I would be getting way more defensive about this subsystem, for instance by failing over to the WinHTTP stack and doing a normal download via the more mature codepaths if anything goes wrong the update checking/downloads codepath. At any rate I hope MS is paying attention here. Just constantly asking for Feedback Hub logs doesn't cut it; once you reach that point it's just way too hard to diagnose problems especially given the patchy logging that seems to exist (c.f. "catastrophic failure"). If you look at a mature and battle tested update system like Google Omaha then it has a variety of fallback codepaths that are used if anything goes wrong, because losing the ability to keep users up to date is a major problem.

One robustness move that might be worth exploring is to take update checking responsibility into the app itself. At startup it would download the .appinstaller file and parse it. If the user has fallen too far behind then it can invoke the cmdlet via PowerShell directly. If that doesn't work, it can try downloading the MSIX and passing the local file to AppInstaller instead.