chocolatey / choco

Chocolatey - the package manager for Windows
https://chocolatey.org
Other
10.36k stars 903 forks source link

".registry.bad" files are created for actually valid registry snapshots #1581

Closed mwallner closed 5 years ago

mwallner commented 6 years ago

I got the warning that this registry snapshot is broken:

<?xml version="1.0" encoding="utf-8"?>
<registrySnapshot xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <user>S-1-5-21-2087680783-807360899-2087665911-3127</user>
  <keys>
    <key installerType="Unknown" displayName="GPL Ghostscript" displayVersion="9.19">
      <RegistryView>Registry32</RegistryView>
      <KeyPath>HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\GPL Ghostscript 9.19</KeyPath>
      <DefaultValue />
      <InstallLocation><![CDATA[]]></InstallLocation>
      <UninstallString><![CDATA["C:\Program Files (x86)\gs\gs9.19\uninstgs.exe"]]></UninstallString>
      <HasQuietUninstall>false</HasQuietUninstall>
      <Publisher><![CDATA[Artifex Software Inc.]]></Publisher>
      <InstallDate />
      <InstallSource><![CDATA[]]></InstallSource>
      <Language />
      <Version />
      <VersionMajor />
      <VersionMinor />
      <SystemComponent>false</SystemComponent>
      <WindowsInstaller>false</WindowsInstaller>
      <NoRemove>false</NoRemove>
      <NoModify>true</NoModify>
      <NoRepair>true</NoRepair>
      <ReleaseType />
      <ParentKeyName />
    </key>
  </keys>
</registrySnapshot>

After renaming the file from .registry.bad to .registry and re-ran choco info the emmitted warning disappeared. (which would suggest the file is actually ok)

Related to #1505 and #1540

mwallner commented 6 years ago

thinking of it this might actually be related to this: #1264 (edit: to be more exact this: GH-1258)

mwallner commented 6 years ago

extends GH-1580

mwallner commented 6 years ago

until this is fixed, following snipped may help fixing those invalid created files:

gci C:\ProgramData\chocolatey\.chocolatey\ -Recurse ".registry.bad" | % { Move-Item $_.FullName $(Join-Path $_.Directory.FullName ".registry") }

choco info
bcurran3 commented 6 years ago

`I've been having problems with corrupted chocolatey.config files in the past, because choco.exe is run quite often (and hence sometimes simultaneously) in my environment

This is one of the reasons why I created chocolatey-preinstaller-checks.extension which is currently on forced hold and being needlessly debated.

bcurran3 commented 6 years ago

Just another comment, I brought up this issue and started experimenting/triaging it as well, unfortunately the conversation got no results.

https://groups.google.com/forum/#!topic/chocolatey/IaQ7_vLX-3s

The follow snippet was the weirdest of the behaviors:

I had this problem on 2 or 3 of my servers for the last month or two and finally decided to try to clean it up. Uninstalling the package would make the error go away, but reinstalling it fresh brought the error back. So I went and manually deleted the related directory and then re-installed the packages and all is well.

mwallner commented 6 years ago

@bcurran3 - I'm really thinking that there could be a synchronization issue going on here, the thing is .. I can't really get it to reproduce by 'forcing it' - with the corrupt chocolatey.config it's been easy - spin up 1000x "choco list -o" simultaneously (via PowerShell Jobs) - BOOM broken config file.

But this (forcing it into a wrong behavior) doesn't seem to be possible with the .registry.bad files...

bcurran3 commented 6 years ago

@mwallner

Luckily this problem has gone away for me. I'm not sure why. I'd like to say https://chocolatey.org/packages/chocolatey-preinstaller-checks.extension (BTW: there's a new configurable 0.0.2 up on my GitHub repo) fixed the problem for me, but I have no proof. I do know my Chocolatey Pro license expired 05/25/2018. So that's an obvious change in my environment. I could hypothesize that it's related, but have no way to prove/test/etc without buying a new Pro license. So it's just a theory.

Out of curiosity are you running a paid version of Chocolatey? I see the Customer tag on this issue, I'm assuming you are. We might be narrowing the problem down...

EDIT: v0.0.2.1 is current: https://chocolatey.org/packages/chocolatey-preinstaller-checks.extension

mwallner commented 6 years ago

Hey @bcurran3 - I'm still experiencing this issue, but I'm currently "evading" by running the PowerShell 1-liner that I've suggested above.

Yes, I'm using Chocolatey 4 Business, so there may be a connection to that. @gep13 - I think that's actually a good point we haven't been talking about yet..

Mikaela commented 6 years ago

I am sometimes seeing this issue and getting it fixed with the oneliner (https://github.com/chocolatey/choco/issues/1581#issuecomment-392538866) and I am using the open source edition judging by the pricing page and not having paid anything.

mwallner commented 6 years ago

uh, thanks @Mikaela - than this is clearly not caused by utilizing licensed choco features.

Mikaela commented 6 years ago

You're welcome. I should probably add some additional information in case it turns out to be relevant.

I am using Windows 10 Insider Slow ring, but have seen this on family PC running stable branch.

I also have https://chocolatey.org/packages/choco-upgrade-all-at-startup installed as I don't trust my family to run Chocolatey or GUI and I guess it's possible that it is attempting to run simultaneously when I run Choco by hand as I have funny habit of checking updates with package managers as soon as I boot machines.

bcurran3 commented 6 years ago

If you're running a choco command while a choco upgrade all is going on....you're bound to have problems. It's not supported and well documented that you'll have problems. It's on the Chocolatey 1.0 milestone list to be fixed.

Make sure you read my summary of known issues at: https://chocolatey.org/packages/chocolatey-preinstaller-checks.extension/0.0.1

The first link in the thread above will default to the -pre release. The 0.0.1 release (still being ignored and in moderation purgatory) has a MUCH more detailed description of the problems and links to the GitHub choco issues.

I'd recommend you try the 0.0.2 pre-release (same but configurable) from my GitHub rep: https://github.com/bcurran3/ChocolateyPackages/blob/master/chocolatey-preinstaller-checks.extension/chocolatey-preinstaller-checks.extension.0.0.2-pre01.nupkg At the very least you can set it to WARN you when choco.exe is already running.

FYI: You'll have to manually install it. (Download .nupkg, cinst name of .nupkg)

EDIT: v0.0.2.1 is current: https://chocolatey.org/packages/chocolatey-preinstaller-checks.extension

ferventcoder commented 5 years ago

So I think I know what may be going on here, but I'm not sure and want to add some logging for 0.10.12 release to see if it brings the issue to light.

ferventcoder commented 5 years ago

So here is my theory,

There is a try that decides if the xml file should be renamed to .bad done at https://github.com/chocolatey/choco/blob/4db2e0439176c9b733ed894879e03b2f741119e2/src/chocolatey/infrastructure.app/services/ChocolateyPackageInformationService.cs#L98

Once it fails attempting to read the file, it renames it to .bad. This is at https://github.com/chocolatey/choco/blob/4db2e0439176c9b733ed894879e03b2f741119e2/src/chocolatey/infrastructure.app/services/ChocolateyPackageInformationService.cs#L101-L116

But as you dig into the code, you realize there are several factors that could cause an error with a perfectly valid .registry file. For one, it only attempts to deserialize the file one time. https://github.com/chocolatey/choco/blob/4db2e0439176c9b733ed894879e03b2f741119e2/src/chocolatey/infrastructure.app/services/RegistryService.cs#L390-L398

https://github.com/chocolatey/choco/blob/4db2e0439176c9b733ed894879e03b2f741119e2/src/chocolatey/infrastructure/services/XmlService.cs#L51

What kinds of errors can we see from that code?

So if there is a lock on the file that doesn't let go in time, boom, renamed to bad registry file.

To start with, you can search your logs for when you get the warning about the registry and see if right before that the mutex threw a timeout. Something like: "Maximum tries of 1 reached. Throwing error" "Error deserializing response of type Registry: Timeout waiting for exclusive access to value." "A corrupt .registry file exists at ..."

I'm going to add some more logging around exactly when we decide to move the file and log the error so it's more explicit into why this is done.

Probably the right answer here is to look at using GlobalMutex with a guid per file we are accessing, so things can move faster and synchronize across faster. This is trickier as we need to be able to uniquely set a GUID per file AND the same GUID per file each time.

ferventcoder commented 5 years ago

So we've added more logging and tolerance around this for 0.10.12.

bcurran3 commented 5 years ago

If not already done, please check to see if the .registry file is already open during processing.

ferventcoder commented 5 years ago

@bcurran3 that's already done.

ferventcoder commented 5 years ago

@mwallner has this happened since we put in the fix for 0.10.12?

ferventcoder commented 5 years ago

I haven't heard any grumbles of this occurring anywhere else, so the fault tolerance change up for multiple tries probably fixed the issue. Marking as resolved for 0.10.14, but will note in release notes this actually went in at 0.10.12.

mwallner commented 4 years ago

@mwallner has this happened since we put in the fix for 0.10.12?

not seeing this issue anymore (yet we're already on 0.10.15 ...)