Closed mwallner closed 5 years ago
thinking of it this might actually be related to this: #1264 (edit: to be more exact this: GH-1258)
chocolatey.config
files in the past, because choco.exe is run quite often (and hence sometimes simultaneously) in my environmentextends GH-1580
until this is fixed, following snipped may help fixing those invalid created files:
gci C:\ProgramData\chocolatey\.chocolatey\ -Recurse ".registry.bad" | % { Move-Item $_.FullName $(Join-Path $_.Directory.FullName ".registry") }
choco info
`I've been having problems with corrupted chocolatey.config files in the past, because choco.exe is run quite often (and hence sometimes simultaneously) in my environment
This is one of the reasons why I created chocolatey-preinstaller-checks.extension which is currently on forced hold and being needlessly debated.
Just another comment, I brought up this issue and started experimenting/triaging it as well, unfortunately the conversation got no results.
https://groups.google.com/forum/#!topic/chocolatey/IaQ7_vLX-3s
The follow snippet was the weirdest of the behaviors:
I had this problem on 2 or 3 of my servers for the last month or two and finally decided to try to clean it up. Uninstalling the package would make the error go away, but reinstalling it fresh brought the error back. So I went and manually deleted the related directory and then re-installed the packages and all is well.
@bcurran3 - I'm really thinking that there could be a synchronization issue going on here, the thing is .. I can't really get it to reproduce by 'forcing it' - with the corrupt chocolatey.config it's been easy - spin up 1000x "choco list -o" simultaneously (via PowerShell Jobs) - BOOM broken config file.
But this (forcing it into a wrong behavior) doesn't seem to be possible with the .registry.bad files...
@mwallner
Luckily this problem has gone away for me. I'm not sure why. I'd like to say https://chocolatey.org/packages/chocolatey-preinstaller-checks.extension (BTW: there's a new configurable 0.0.2 up on my GitHub repo) fixed the problem for me, but I have no proof. I do know my Chocolatey Pro license expired 05/25/2018. So that's an obvious change in my environment. I could hypothesize that it's related, but have no way to prove/test/etc without buying a new Pro license. So it's just a theory.
Out of curiosity are you running a paid version of Chocolatey? I see the Customer tag on this issue, I'm assuming you are. We might be narrowing the problem down...
EDIT: v0.0.2.1 is current: https://chocolatey.org/packages/chocolatey-preinstaller-checks.extension
Hey @bcurran3 - I'm still experiencing this issue, but I'm currently "evading" by running the PowerShell 1-liner that I've suggested above.
Yes, I'm using Chocolatey 4 Business, so there may be a connection to that. @gep13 - I think that's actually a good point we haven't been talking about yet..
I am sometimes seeing this issue and getting it fixed with the oneliner (https://github.com/chocolatey/choco/issues/1581#issuecomment-392538866) and I am using the open source edition judging by the pricing page and not having paid anything.
uh, thanks @Mikaela - than this is clearly not caused by utilizing licensed choco features.
You're welcome. I should probably add some additional information in case it turns out to be relevant.
I am using Windows 10 Insider Slow ring, but have seen this on family PC running stable branch.
I also have https://chocolatey.org/packages/choco-upgrade-all-at-startup installed as I don't trust my family to run Chocolatey or GUI and I guess it's possible that it is attempting to run simultaneously when I run Choco by hand as I have funny habit of checking updates with package managers as soon as I boot machines.
If you're running a choco command while a choco upgrade all is going on....you're bound to have problems. It's not supported and well documented that you'll have problems. It's on the Chocolatey 1.0 milestone list to be fixed.
Make sure you read my summary of known issues at: https://chocolatey.org/packages/chocolatey-preinstaller-checks.extension/0.0.1
The first link in the thread above will default to the -pre release. The 0.0.1 release (still being ignored and in moderation purgatory) has a MUCH more detailed description of the problems and links to the GitHub choco issues.
I'd recommend you try the 0.0.2 pre-release (same but configurable) from my GitHub rep: https://github.com/bcurran3/ChocolateyPackages/blob/master/chocolatey-preinstaller-checks.extension/chocolatey-preinstaller-checks.extension.0.0.2-pre01.nupkg At the very least you can set it to WARN you when choco.exe is already running.
FYI: You'll have to manually install it. (Download .nupkg, cinst name of .nupkg)
EDIT: v0.0.2.1 is current: https://chocolatey.org/packages/chocolatey-preinstaller-checks.extension
So I think I know what may be going on here, but I'm not sure and want to add some logging for 0.10.12 release to see if it brings the issue to light.
So here is my theory,
There is a try that decides if the xml file should be renamed to .bad done at https://github.com/chocolatey/choco/blob/4db2e0439176c9b733ed894879e03b2f741119e2/src/chocolatey/infrastructure.app/services/ChocolateyPackageInformationService.cs#L98
Once it fails attempting to read the file, it renames it to .bad. This is at https://github.com/chocolatey/choco/blob/4db2e0439176c9b733ed894879e03b2f741119e2/src/chocolatey/infrastructure.app/services/ChocolateyPackageInformationService.cs#L101-L116
But as you dig into the code, you realize there are several factors that could cause an error with a perfectly valid .registry file. For one, it only attempts to deserialize the file one time. https://github.com/chocolatey/choco/blob/4db2e0439176c9b733ed894879e03b2f741119e2/src/chocolatey/infrastructure.app/services/RegistryService.cs#L390-L398
What kinds of errors can we see from that code?
So if there is a lock on the file that doesn't let go in time, boom, renamed to bad registry file.
To start with, you can search your logs for when you get the warning about the registry and see if right before that the mutex threw a timeout. Something like: "Maximum tries of 1 reached. Throwing error" "Error deserializing response of type Registry: Timeout waiting for exclusive access to value." "A corrupt .registry file exists at ..."
I'm going to add some more logging around exactly when we decide to move the file and log the error so it's more explicit into why this is done.
Probably the right answer here is to look at using GlobalMutex with a guid per file we are accessing, so things can move faster and synchronize across faster. This is trickier as we need to be able to uniquely set a GUID per file AND the same GUID per file each time.
So we've added more logging and tolerance around this for 0.10.12.
If not already done, please check to see if the .registry file is already open during processing.
@bcurran3 that's already done.
@mwallner has this happened since we put in the fix for 0.10.12?
I haven't heard any grumbles of this occurring anywhere else, so the fault tolerance change up for multiple tries probably fixed the issue. Marking as resolved for 0.10.14, but will note in release notes this actually went in at 0.10.12.
@mwallner has this happened since we put in the fix for 0.10.12?
not seeing this issue anymore (yet we're already on 0.10.15 ...)
I got the warning that this registry snapshot is broken:
After renaming the file from
.registry.bad
to.registry
and re-ranchoco info
the emmitted warning disappeared. (which would suggest the file is actually ok)Related to #1505 and #1540