Open gregpakes opened 4 years ago
Log files attached.
The hosting bundle installs both runtimes, but from the logs it appears that ANCM is failing to install.
It seems that a current version of IIS is installed: 12.2.19282.0, and it's properly detecting this and at least trying to get ready for the upgrade.
Any IIS processes running?
Based on the logs, it appears that it cached the ANCM installer here: C:\ProgramData\Package Cache\{CFEA068A-DB7E-4FC4-866C-A65713072C00}v13.1.20142.0\aspnetcoremodule_x64_en_v2.msi
If it's still available on disk, you might want to try running it manually using:
msiexec /i "C:\ProgramData\Package Cache\{CFEA068A-DB7E-4FC4-866C-A65713072C00}v13.1.20142.0\aspnetcoremodule_x64_en_v2.msi" /l*v ancm.log
The last parameter should create a verbose log that may help to troubleshoot this further.
On the impacted servers there are numerous IIS worker processes running.
Unfortunately the cached installer is no longer on disk. Running the installer again, it doesn't look like the cache folder is getting created, but this may be happening so fast that we're unable to see it happening.
Are we able to download just the aspnetcoremodule installer to obtain the logs?
Unfortunately, it's not shipped separately, but you can extract it.
dark.exe -x C:\somefolder dotnet-hosting-3.1.5-win.exe
This will decompile the bundle, and you'll be able to grab the MSI for ANCM (it should get dropped in a subfolder called AttachedContainer).
@jkotalik FYI
It looks like the InstallValidate action is failing.
Action ended 14:36:53: InstallValidate. Return value 3.
Action ended 14:36:53: INSTALL. Return value 3.
I also see this:
MSI (c) (D8:88) [14:36:45:870]: PROPERTY CHANGE: Adding MsiSystemRebootPending property. Its value is '1'.
which should normally trigger a files-in-use prompt. @AdamRiddick is it possible to try rebooting and see whether that resolves the issue?
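For anyone comparing several of these logs, a minimal sketch of pulling out the lines discussed above: actions that ended with return value 3, plus the MsiSystemRebootPending and VersionDatabase properties. The function name and the returned keys are made up for illustration, and the regular expressions assume the log lines look exactly like the excerpts quoted in this thread.

```python
import re

# Matches lines such as "Action ended 14:36:53: InstallValidate. Return value 3."
ACTION_RE = re.compile(
    r"Action ended (\d{1,2}:\d{2}:\d{2}): (\w+)\. Return value (\d+)\."
)
# Matches "PROPERTY CHANGE: Adding <Name> property. Its value is '<value>'."
PROPERTY_RE = re.compile(
    r"PROPERTY CHANGE: Adding (\w+) property\. Its value is '([^']*)'"
)


def summarize_msi_log(text):
    """Summarize failed actions and a few properties from verbose MSI log text.

    `summarize_msi_log` is a hypothetical helper, not part of any installer tooling.
    """
    # Return value 3 is what the failing actions in this thread report.
    failed = [
        (name, time)
        for time, name, code in ACTION_RE.findall(text)
        if code == "3"
    ]
    props = dict(PROPERTY_RE.findall(text))
    return {
        "failed_actions": failed,
        "reboot_pending": props.get("MsiSystemRebootPending") == "1",
        "version_database": props.get("VersionDatabase"),
    }
```

The function takes the log text rather than a path, because the encoding of the log file may vary (verbose MSI logs are often UTF-16); decode the file however suits your environment and pass the resulting string in.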
Is there a way we can proceed with the installation without handling a pending reboot?
The reason I ask is organizing reboots for 10% of our servers isn't the simplest of tasks and this will hold up the automated roll-out of updates.
We've scheduled a restart on a staging server that is experiencing this issue and I'll grab the ANCM log again once done, but this won't be until tomorrow. We have tried restarting previously and were still unable to install.
Not easily. Once installers detect pending reboots, control is pretty much under the Windows Restart Manager which we can't control.
Unfortunately this failed after a restart, I've attached the new ANCM log file.
@AdamRiddick, after rebooting, is there any service that starts up that might be relying on ANCM?
On the machines where it does succeed, do you have any logs? If so, can you check what the logs say for the VersionDatabase property? It would look something like this:
MSI (c) (9C:A0) [08:44:39:657]: PROPERTY CHANGE: Adding VersionDatabase property. Its value is '400'.
Are there any errors logged to the Windows Event Viewer?
I did some more digging and it seems that there are cases where the FilesInUse dialog isn't displayed:
Other than IIS, there's nothing we know of that may be relying on ANCM.
On a machine where it succeeds, the VersionDatabase entry is:
MSI (c) (3C:40) [11:11:44:794]: PROPERTY CHANGE: Adding VersionDatabase property. Its value is '400'.
No errors are logged, but I am seeing some warnings from the RestartManager, though nothing we don't also see on a server where the installation succeeds:
Level | Source | Message |
---|---|---|
Information | MsiInstaller | Beginning a Windows Installer transaction: C:\ancm\aspnetcoremodule_x64_en_v2.msi. Client Process Id: 28900. |
Information | RestartManager | Starting session 0 - 2020-09-08T10:16:53.247992500Z. |
Warning * | RestartManager | Application 'C:\Windows\SysWOW64\inetsrv\w3wp.exe' (pid 49120) cannot be restarted - Application SID does not match Conductor SID.. |
Information | MsiInstaller | Ending a Windows Installer transaction: C:\DNX Logs\aspnetcoremodule_x64_en_v2.msi. Client Process Id: 28900. |
Information | RestartManager | Ending session 0 started 2020-09-08T10:16:53.247992500Z. |
Information | MsiInstaller | Product: Microsoft ASP.NET Core Module V2 -- Installation failed. |
Information | MsiInstaller | Windows Installer installed the product. Product Name: Microsoft ASP.NET Core Module V2. Product Version: 13.1.20142.0. Product Language: 1033. Manufacturer: Microsoft Corporation. Installation success or error status: 1603. |
* The warning is repeated per IIS Process.
@joeloff Are there any further diagnostics we can do to identify the problem here?
@AdamRiddick no other thoughts, I'm adding someone from the ANCM side. Is it safe to assume that the set of machines where it's failing is similarly configured?
You can temporarily skip updating ANCM by running the installer from the commandline using: <path_to_exe> OPT_NO_ANCM=1
This should at least get the runtime updated if anything is missing.
@jkotalik any ideas what might cause ANCM to fail in isolated cases?
This is the first time I've seen an issue where it is failing validation. Is there anything we can do to improve the logs from the ANCM installer to help identify the issue?
Nope. @AdamRiddick already ran the MSI manually with full verbosity. InstallValidate only checks 2 things: disk costing results and whether there are any in-use files. The latter can fail silently if processes are not associated with a window (I posted the list earlier in the thread), in which case it won't get reported properly.
@AdamRiddick if you run
REG QUERY "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager" /v PendingFileRenameOperations /s
is there anything that seems related to IIS/ANCM?
I'm wondering if there's a failed rename that didn't get removed and is causing the problem.
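If the PendingFileRenameOperations list is long, a rough sketch like the one below could help narrow it down. It assumes the value has already been read into a flat list of strings in (source, destination) order, where an empty destination means the source is scheduled for deletion. The keyword list and both function names are assumptions for illustration only, so adjust them to whatever paths matter on your servers.

```python
# Hypothetical keywords that might indicate an IIS or ANCM related path.
KEYWORDS = ("inetsrv", "aspnetcore", "iis")


def pair_rename_operations(entries):
    """Group a flat list of strings into (source, destination) pairs."""
    pairs = []
    for i in range(0, len(entries), 2):
        source = entries[i]
        # A trailing entry without a partner is treated as a delete.
        destination = entries[i + 1] if i + 1 < len(entries) else ""
        pairs.append((source, destination))
    return pairs


def related_operations(entries, keywords=KEYWORDS):
    """Return the pairs whose source or destination mentions any keyword."""
    hits = []
    for source, destination in pair_rename_operations(entries):
        haystack = (source + " " + destination).lower()
        if any(keyword in haystack for keyword in keywords):
            hits.append((source, destination))
    return hits
```

An empty result only means none of the keywords matched; it does not prove that no pending operation is blocking the install.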
@joeloff, Nothing seems to be IIS or ANCM related, there is only one operation pending. I've attached the output as you will know better than I if this is related.
@joeloff @jkotalik What route can we take with this? This is preventing us from getting updated to 3.1
@joeloff @jkotalik Sorry to chase - but this is a significant issue for us and really holds us back from using .Net Core.
Do you have any suggestions?
One idea would be to fully stop WAS before installing the hosting bundle:
net stop was
execute hosting bundle
net start w3svc
Besides that, I'm personally not sure.
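For an automated roll-out, the stop/install/start sequence above could be wrapped roughly as follows. This is only a sketch: `build_plan` and `run_plan` are hypothetical names, the `/y` switch on `net stop` is there to confirm stopping dependent services, and the `/quiet` and `/log` switches are assumed to be accepted by the hosting bundle alongside `/norestart`, so check them against your bundle before relying on this.

```python
import subprocess


def build_plan(bundle_path, log_path):
    """Return the three commands: stop WAS, run the bundle, start W3SVC."""
    return [
        ["net", "stop", "was", "/y"],
        # Assumed bundle switches; verify them for your installer version.
        [bundle_path, "/quiet", "/norestart", "/log", log_path],
        ["net", "start", "w3svc"],
    ]


def run_plan(plan, runner=subprocess.run):
    """Run the plan and return the installer's exit code.

    W3SVC is started again even if the installer fails, so a failed
    install does not leave the sites offline.
    """
    stop, install, start = plan
    runner(stop, check=True)
    try:
        return runner(install, check=False).returncode
    finally:
        runner(start, check=True)
```

The `runner` parameter exists so the sequence can be dry-run or tested without touching any services. A non-zero return value (for example 1603) should be treated as a failed install by the calling automation.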
@jkotalik That is something that we really want to avoid, it creates significant issues with the automated upgrade for us.
Would this not be considered a bug that we can raise somewhere?
@joeloff @jkotalik We've come to a position where we have had to take the approach of fully stopping WAS before installing the bundle on the server that we were using to try and resolve this issue.
This did work, and 3.1.5 is now installed, however this is a significant issue for us as we use an automated upgrade process and we can't rely on that stopping and starting the WAS without issue on so many servers.
Surely this is a bug as it is not always the case that the WAS needs stopping, this affects ~10% of our servers.
It's likely that we'll hit this same issue every time we update a patch version. We haven't yet got this rolled out everywhere and the version we are rolling out has already fallen behind the latest patch.
@AdamRiddick, I'm not very familiar with WAS, but is it possible there are lingering processes that are maybe not recycled? From the logs it's extremely likely processes are triggering potential locks/renames. If these processes are parented under WAS it would seem better to shut those down, rather than the full service. (From what I've read, processes can be orphaned, e.g. to aid debugging.)
I noticed in some of the events that there was a warning around the SIDs being different. I wonder if it's maybe an underlying permission issue, so when the MSI executes it, it has insufficient privileges to enable a restart. The conductor SID in this case should be the account under which the installer is running. If the event logged any detail on the actual SIDs, it might be good to see what the permission differences are.
@jkotalik There might be a way to get additional information in the MSI, but it will require additional custom action code. It won't solve the issue, but might make diagnosing it simpler. MSIs register their RM session keys at the beginning and if you execute an action before InstallValidate you can join that session and query it. However, I suspect that the RM/MSI interaction looks at the items in the File table and adds them to the query list (judging from the event entries Adam provided previously). That information is absent in the MSI log itself, but seems to be captured in the event logs. Also, adding a custom action introduces risk since it's another potential point of failure for the install.
Hi @joeloff, thanks.
We noticed those warnings in the logs, however we also saw them occurring on servers where the .NET Core hosting bundle did install without any issues, though I suppose this would imply a race condition: even though the user executing the setup still does not have the required permission, it doesn't matter in that situation as no locks are being held.
We may be able to find if there is a difference in permissions - as you say it would be helpful to see that in the log detail, but our app pool identities and the user we use to perform the automatic component install are unchanging - I'll let you know if we manage to obtain this.
Thanks @AdamRiddick, it's a long shot given that it's not causing issues on the other machine. I think I forgot to ask previously, but have you tried deploying the hosting bundle with the /norestart option?
@jkotalik can you provide more details on what the CAs do when ANCM is being installed? I was looking at the InstallExecuteSequence table and noticed the following:
RestartManager executes during the InstallValidate action (sequence is 1400). All the CAs appear to run after InstallValidate. Perhaps changing the CA scheduling could improve things, but I don't know enough about the ANCM scenarios.
From digging through more RM documentation, it appears that services listed as critical will always force a reboot and cause RM to fail shutting them down. The list of critical services appears to be fixed.
Hi @joeloff, we did try with /norestart but to no avail.
Ping @joeloff
@joeloff @jkotalik Any update?
There is one other option that we can maybe explore. I'll need to get a repro for this issue first. Essentially it would change the upgrade behavior of ANCM. MSIs can control when the old version of the product is removed when doing an upgrade. Removing the old MSI first and then installing the new version provides the greatest flexibility because it supports the widest set of changes between two versions of an installer. It's also slower, because it removes all the files, even ones that haven't changed and then installs them again.
The other option is to install the new version first, then remove the old copy. I'm hoping this would get around the problem. BUT, at some point you would still need to restart your applications because they'll continue to run on the existing runtime, which technically is going to get removed while they are up and that would likely cause a problem I suspect.
I have the same issue. 80070643 is not the real error. Maybe it means a file was in use.
Action ended 14:36:53: InstallValidate. Return value 3.
Action ended 14:36:53: INSTALL. Return value 3.
Stop IIS and uninstall the old version, then install the new version. That fixed it.
@willshao the 1603 is just a generic error. As you pointed out, the failure is in the InstallValidate action and this is exactly when locked files are evaluated.
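To make the link between the two numbers explicit: 0x80070643 is an HRESULT whose low 16 bits carry the Win32 error code, and those bits are 0x643, which is 1603 in decimal. A small sketch of the decoding follows (the function name is made up for illustration; the bit layout follows the standard HRESULT format, with facility 7 being Win32).

```python
def decode_hresult(hresult):
    """Split an HRESULT into its failure bit, facility, and error code."""
    return {
        "failure": bool(hresult & 0x80000000),   # top bit set means failure
        "facility": (hresult >> 16) & 0x7FF,     # 11-bit facility field
        "code": hresult & 0xFFFF,                # low 16 bits: error code
    }


if __name__ == "__main__":
    print(decode_hresult(0x80070643))
```

So the bundle's 80070643 and the MSI's 1603 are the same generic "fatal error during installation", and neither says which file or process was the actual cause.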
The install does not complete because an update has installed a dark theme. There is a square in the middle of the install dialog; click on it (it hides the check box) and the Install button becomes active. Press the button and the problem is solved.
Is there any solution? Have the same problem installing .Net 8 on application server. Same error, unable to install ANCM.
This is a x-post from here as we aren't getting any traction. https://github.com/dotnet/sdk/issues/12920
We've had issues upgrading the Web Hosting Bundle before with versions 2.2.4 and 2.2.8 (see: dotnet/runtime#1803) reporting a general failure (error 1603).
Each time we update the hosting bundle we're seeing the same set of issues, which is preventing us from being able to update patch versions more frequently. We are currently trying to patch to 3.1.5.20271.
We are managing approximately 300 servers and we see this failure on about 10% of them.
Steps to reproduce
Install the .NET Core Hosting Bundle. We've been unable to reliably reproduce this, but have seen it across a handful of servers, typically those running multiple IIS websites.
We attempt the install via PowerShell in the first instance, then try to resolve it manually. Neither works on the affected servers, but both do on 95% of our servers.
Expected behaviour
The .NET Core hosting bundle installs.
Actual behaviour
The .NET Core hosting bundle does not install.
It's worth noting that the runtime and shared framework are installed, but Windows Server Hosting fails.