WinRb / vagrant-windows

When using chef solo provisioning, chef-client doesn't run as administrator unless UAC is disabled on the guest #34

Closed kashook closed 10 years ago

kashook commented 11 years ago

I've been playing around with the latest vagrant-windows code. A recent change pulled in by this pull request seems to have broken some of our provisioning recipes. The user that submitted the request describes what he did on this opscode ticket and on this stackexchange post. In the opscode ticket comments, he mentions that UAC needs to be off, and my testing seems to confirm that this is the case. The chef-client is now being executed by running the ps_runas.ps1 script. If UAC is enabled, then it seems that chef-client does not run as an administrator. Most provisioning recipes need administrator access, so now they will not work unless UAC is off. Requiring that UAC be disabled doesn't really seem ideal since it will be enabled in production environments and vagrant is often used to create test environments that mirror production. Rather than changing vagrant-windows to work around issues with the SQL Server installer not working via WinRM, I think the task-scheduler trick should have been used in the SQL Server recipe as was suggested by Adam Edwards in the comments on the above chef issue. We actually have used this trick ourselves to work around auth issues we were running into when installing the .NET 4.0 framework. We didn't run chef client using a scheduled task (as suggested by Adam Edwards), but instead we had our recipe run the installer by scheduling a task and then running the task immediately. I believe the same trick would work around his SQL Server problem without the need for the ps_runas script in vagrant-windows.

sneal commented 11 years ago

The prior Vagrant 1.0 compatible release said to disable UAC in the readme, but I understand why you'd like to see UAC support as it would more accurately represent a production system. My preference is to enhance vagrant-windows to keep the same behavior as a local chef run to avoid embedding workarounds into cookbooks. This ensures cookbooks developed outside vagrant-windows will work properly.

I think the real question is why does enabling UAC cause the chef client to run as a non-admin? Vagrant-windows uses the win32 CreateProcessAsUser function which I would imagine the task scheduler does as well. I see several potential solutions to this problem.

  1. Enhance the existing ps_runas script to provide the proper meta-data to UAC to force processes to run as admin.
  2. Provide a vagrant-windows configuration option to skip using the ps_runas script.
  3. Replace the ps_runas script with a scheduled task. Vagrant-windows can poll until it's complete and then load the chef run from the log file.

kashook commented 11 years ago

Hi sneal, I was hoping you'd notice this issue. :)

I didn't notice before that the readme for the Vagrant 1.0 compatible release said to turn off UAC, but I'm glad you see why I might not want to turn it off.

I think any of those solutions would solve my issue. If there is some way to get the ps_runas script to get processes to actually run as administrator when UAC is on, then I agree that would be ideal. Running commands in the task scheduler also seems like a good option. The "skip ps_runas" option would also work. The only slight issue I see with that is that prior to this change, chef-client was being executed as an admin by default (when UAC is on in the guest). After the change, it doesn't execute as an admin unless you set the "skip ps_runas" option.

I mentioned in my original post that I ran into "unexplained" auth issues when attempting to install .NET 4.0 via a chef recipe. This was not using vagrant, but just plain chef. I ended up finding out that chef actually had nothing to do with it. When I would directly run the .NET 4 installer via WinRM, I'd hit the problem. (This sounds pretty similar to your SQL Server situation I think, but correct me if I misunderstood what you were running into). I'll skip all the details of the investigating I did, but I became convinced in my case the real problem is an issue between WinRM and the .NET 4 installer (though I don't know what it is). My guess is that in your case, the root of the original problem that prompted the ps_runas change is an issue between WinRM and the SQL Server installer. In the end, I modified my recipe to run the install via the task scheduler. The recipe now works whether chef-client is executed via WinRM or whether it is executed directly on the server. We put the work-around in the recipe because we couldn't necessarily control whether chef-client was going to be executed via WinRM or not in our scenario. I'm not sure in what circumstances your SQL Server recipe is used. However, if it might be used outside of vagrant provisioning, it seems like you would potentially have the same concern.

sneal commented 11 years ago

It sounds like you encountered the same exact problem with the .NET 4 installation cookbook as I did with the SQL Server cookbook. Without the fresh credentials provided via the ps_runas script spawning a new process, the WinRM credentials don't support propagation to network devices or secondary processes.

My preference is to try and modify the ps_runas script to work correctly with UAC, although it looks like there is something special about built-in administrator accounts.

BTW - Which .NET cookbook are you using?

pmorton commented 11 years ago

It sounds like there are two issues here:

  1. With UAC enabled, some commands are prevented from running.
  2. Accessing network resources with a remote token results in access denied.

Is that correct?

WRT 1: What about: http://aaron-kelley.net/blog/2010/01/disable-remote-uac-in-windows-vista-and-windows-7/

WRT 2: I think that we need to look into enabling CredSSP. I am working on WinRM gem 2.0, and I will consider adding this to my list of things to reverse engineer.

sneal commented 11 years ago

I'd like to figure out what the recommended "best practice" is for running the Windows chef-client in a production environment. From hearing how Nordstrom's runs chef client I suspect they do this:

@pmorton You're correct about there being two issues. The ps_runas script causes 1 to start failing, but fixes 2.

kashook commented 11 years ago

@pmorton, regarding the link you posted discussing LocalAccountTokenFilterPolicy, we in fact do this on all of our windows servers that we manage with WinRM and/or chef. This setting allows local administrator accounts to access the system with elevated privileges when connecting to the server via a network connection (whether it be a file share, a winrm connection, etc). Domain accounts that are members of the administrators group are already elevated even without this setting. (See this Microsoft support link). This setting is in place on the windows guest machines where I noticed the issue I describe in my original post, and we are using local user accounts in this case.
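
For reference, LocalAccountTokenFilterPolicy is a DWORD value under the Policies\System registry key. A minimal sketch of how the `reg.exe` commands to inspect or set it could be built from Ruby; the helper names are hypothetical and are not part of vagrant-windows or chef, and the commands are only printed, never executed:

```ruby
# Hypothetical helpers: they only build reg.exe command lines.
UAC_POLICY_KEY = 'HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System'
TOKEN_FILTER_VALUE = 'LocalAccountTokenFilterPolicy'

# Command that reports whether the value exists and what it is set to.
def token_filter_query_command
  %(reg query "#{UAC_POLICY_KEY}" /v #{TOKEN_FILTER_VALUE})
end

# Command that sets the value to 1, so that local administrator accounts
# get an elevated token on network logons (file shares, WinRM, etc).
def token_filter_enable_command
  %(reg add "#{UAC_POLICY_KEY}" /v #{TOKEN_FILTER_VALUE} /t REG_DWORD /d 1 /f)
end

puts token_filter_query_command
puts token_filter_enable_command
```

Either command would need to be run from an elevated prompt on the guest.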

@sneal, I can tell you a bit about how we currently run chef-client on our production windows servers. (Not to claim that I am an authoritative source of best practices for chef, but it might give you an idea). We use the chef-client::service recipe (from the opscode community site) to install chef-client as a windows service. The service runs as the Local System account. The very first chef-client run on a new server is kicked off manually by an admin logging directly into the server and running it, or by running it via a 'knife winrm' command (which you get by installing knife-windows), with the chef-client::service recipe in the run list. All subsequent runs are performed by the windows service every 30 minutes. We first noticed the .NET 4 issue I mentioned above when trying to do the first chef-client run on a new system via WinRM with both the chef-client::service recipe and our .NET 4 recipe in the run list. (The .NET 4 recipe I've been talking about is one that we created on our own before there was a community recipe for .NET 4. Our original version simply shelled out to run the .NET installer instead of using the task scheduler).

stonith commented 11 years ago

Running into the same issue as well. I'd prefer to leave UAC on for developer workstations as it's enabled in production. I haven't found a good way around this. Is it possible to issue a runas in the winrm command to get around this?

sneal commented 11 years ago

The current vagrant-windows gem essentially issues a runas command via a PowerShell script that starts a new process with credentials. This works in the current incarnation with the winrm 1.0 gem and basic auth, allowing credential delegation to work correctly but only with UAC turned off.

I tried upgrading vagrant-windows to use the unreleased winrm 2.0 gem, which uses ntlm instead of basic auth, and it fails to start a new process. It errors out with "access denied" when calling Start() on the System.Diagnostics.Process.

So it seems this issue will need to be resolved before we can upgrade the winrm gem.

sneal commented 11 years ago

FWIW I pushed a branch that works with the winrm 2.0 gem. It's a WIP, but I was able to stand up a box and run chef-solo. On the plus side:

stonith commented 11 years ago

Thanks, I gave it a try but received the following error when running an up or halt:

https://gist.github.com/stonith/5825272

kashook commented 11 years ago

@stonith, I got that error as well. I think it's because the winrm 2.0 gem that this branch requires isn't actually out there yet, and the gemspec for vagrant-windows currently allows a version less than 2.0. (I was going to try building 2.0 myself from source when I get a moment).

pmorton commented 11 years ago

If it would help, I could pre-release 2.0 of winrm.

stonith commented 11 years ago

@pmorton Can you pre-release 2.0 without affecting users using previous versions?

pmorton commented 11 years ago

Yes. Pre-release versions must be installed explicitly.
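
As a small illustration of why a pre-release would not disturb existing users: RubyGems treats any version string containing a letter as a pre-release, and `gem install` skips such versions unless `--pre` or an exact version is given. The version number `2.0.0.pre` below is only an assumed example, not an actual winrm release:

```ruby
require "rubygems"

stable     = Gem::Version.new("2.0.0")
prerelease = Gem::Version.new("2.0.0.pre") # assumed example version

puts prerelease.prerelease? # true: the version contains a letter
puts stable.prerelease?     # false
puts prerelease < stable    # true: pre-releases sort before the final release
```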

stonith commented 11 years ago

@pmorton yes please! :+1:

sneal commented 11 years ago

:+1: @keiths-osc You're correct. I did not commit the gemspec or gemfile changes necessary for vagrant-windows to work.

kashook commented 11 years ago

I thought I'd give this another shot, but I'm having a bit of trouble finding the WinRM 2.0 gem. Could someone point me to either the source or a pre-release if it exists?

sneal commented 11 years ago

Source only in working branch https://github.com/WinRb/WinRM/tree/working

kashook commented 11 years ago

I made another attempt to test this. I built the winrm 2.0 gem from the branch mentioned above. (I set the version to 2.0.0 from 0.0.1 in version.rb before I built it, and modified the gemspec of vagrant-windows to ensure it would use the winrm 2.0.0 gem). Here is the result:

E:\vagrant\WindowsBoxDev>vagrant up dc01
Bringing machine 'dc01' up with 'virtualbox' provider...
[dc01] Importing base box 'dc01'...
[dc01] Matching MAC address for NAT networking...
[dc01] Setting the name of the VM...
[dc01] Clearing any previously set forwarded ports...
[dc01] Creating shared folders metadata...
[dc01] Clearing any previously set network interfaces...
[dc01] Preparing network interfaces based on configuration...
[dc01] Forwarding ports...
[dc01] -- 22 => 2222 (adapter 1)
[dc01] -- 3389 => 53389 (adapter 1)
[dc01] -- 5985 => 55585 (adapter 1)
[dc01] Running any VM customizations...
[dc01] Booting VM...
[dc01] Waiting for VM to boot. This can take a few minutes.
[dc01] Forcing shutdown of VM...
[dc01] Destroying VM and associated drives...
C:/Vagrant/embedded/lib/ruby/1.9.1/uri/common.rb:176:in `split': bad URI(is not URI?): http://#<Object:0x314caa8>:#<Object:0x314caa8>/wsman (URI::InvalidURIError)
        from C:/Vagrant/embedded/lib/ruby/1.9.1/uri/common.rb:211:in `parse'
        from C:/Vagrant/embedded/lib/ruby/1.9.1/uri/common.rb:747:in `parse'
        from C:/Vagrant/embedded/lib/ruby/1.9.1/uri/common.rb:994:in `URI'
        from C:/Users/keiths/.vagrant.d/gems/gems/winrm-2.0.0/lib/winrm/client.rb:27:in `initialize'
        from C:/Users/keiths/.vagrant.d/gems/gems/vagrant-windows-1.1.0/lib/vagrant-windows/communication/winrmcommunicator.rb:118:in `new'
        from C:/Users/keiths/.vagrant.d/gems/gems/vagrant-windows-1.1.0/lib/vagrant-windows/communication/winrmcommunicator.rb:118:in `new_session'
        from C:/Users/keiths/.vagrant.d/gems/gems/vagrant-windows-1.1.0/lib/vagrant-windows/communication/winrmcommunicator.rb:110:in `session'
        from C:/Users/keiths/.vagrant.d/gems/gems/vagrant-windows-1.1.0/lib/vagrant-windows/communication/winrmcommunicator.rb:32:in `ready?'
        from C:/Vagrant/embedded/gems/gems/vagrant-1.2.2/plugins/providers/virtualbox/action/boot.rb:26:in `block in wait_for_boot'
        from C:/Vagrant/embedded/gems/gems/vagrant-1.2.2/plugins/providers/virtualbox/action/boot.rb:25:in `times'
        from C:/Vagrant/embedded/gems/gems/vagrant-1.2.2/plugins/providers/virtualbox/action/boot.rb:25:in `wait_for_boot'
        from C:/Vagrant/embedded/gems/gems/vagrant-1.2.2/plugins/providers/virtualbox/action/boot.rb:17:in `call'
        from C:/Vagrant/embedded/gems/gems/vagrant-1.2.2/lib/vagrant/action/warden.rb:34:in `call'
        from C:/Vagrant/embedded/gems/gems/vagrant-1.2.2/plugins/providers/virtualbox/action/customize.rb:31:in `call'
        ... omitted the rest of the trace

sneal commented 11 years ago

That's very odd; it's as if the host name and port objects aren't strings. endpoint comes from the @machine.config.winrm.host, and it's failing on this line in the winrm client.rb:

@endpoint = URI("#{transport}://#{endpoint}:#{opts[:port]}/wsman")

Perhaps try running vagrant in debug mode; it'll log the host and options:

logger.debug("Creating WinRM session to #{@machine.config.winrm.host} with options: #{opts}")

kashook commented 11 years ago

Actually, I think I may know what the problem is. I'd been meaning to log an issue for this but had forgotten. I noticed a while back that if I didn't explicitly set at least one of the vagrant-windows winrm options in my vagrant file, it appeared that none of the settings would get defaulted. I ended up finding that the finalize! method in files such as lib/vagrant-windows/config/winrm.rb was not getting called unless at least one of the values referred to by that script was set in the vagrant file. (I think this may be more of a vagrant problem than a vagrant-windows problem). I forgot that I had hacked my local copy of vagrant-windows to stop doing the UNSET_VALUE stuff in the initialize method and instead to just set all the properties to their actual defaults, and I commented out all the settings in my vagrantfile since I just wanted the defaults. I'll bet the object in the above stacktrace is the UNSET_VALUE.
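
To make the suspected failure mode concrete, here is a minimal sketch. It assumes the sentinel is a bare `Object` (which is what the `#<Object:0x314caa8>` in the trace suggests); the `WinRMConfig` class and its default values are hypothetical stand-ins, not the actual vagrant-windows code:

```ruby
require "uri"

# Assumed stand-in for the "not set" sentinel: a bare Object.
UNSET_VALUE = Object.new

# Hypothetical, simplified config class for illustration only.
class WinRMConfig
  attr_accessor :host, :port

  def initialize
    @host = UNSET_VALUE
    @port = UNSET_VALUE
  end

  # Swap sentinels for defaults (the default values here are illustrative).
  def finalize!
    @host = "localhost" if @host.equal?(UNSET_VALUE)
    @port = 5985 if @port.equal?(UNSET_VALUE)
  end

  def endpoint
    "http://#{host}:#{port}/wsman"
  end
end

config = WinRMConfig.new
puts config.endpoint           # sentinels interpolate as "#<Object:0x...>"
config.finalize!
puts URI(config.endpoint).port # parses cleanly once defaults are applied
```

If `finalize!` is never called, the interpolated string has the same shape as the bad URI in the trace above.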

sneal commented 11 years ago

Yep, I bet that's it!

sneal commented 11 years ago

@keiths-osc I've been playing around with UAC enabled. I've been using the existing vagrant-windows 1.0.3 and the unreleased vagrant-windows version which runs chef-solo through a scheduled task. The only way I can get chef to run without error is to run using the builtin Administrator account. Using the vagrant account (which is part of the administrators group) does not work with either method (ps_runas or scheduled task).

I think it may make more sense to tell users to use the Administrator account instead of creating a local Vagrant account. Thoughts?

kashook commented 11 years ago

@sneal, Here is a long answer to your question. :) I appreciate the opportunity for input.

First, I tried to test out your changes for myself again. I got further after realizing I needed to set at least one winrm option in my vagrant file, but still ultimately ended up with an error that I think is most likely due to me not having the correct versions of dependent gems. (I'd likely need a list of all the gems you currently have in your .vagrant.d\gems folder to make sure I have the right versions of everything before I can get this to work).

The built-in Administrator account inherently has UAC disabled even when it's enabled on the system, which I think would explain why using that account works. I'm disappointed to hear that running chef-client in a scheduled task didn't work, and a little surprised. Out of curiosity, is the LocalAccountTokenFilterPolicy registry value set under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\policies\system on your windows VM?

A while back in this thread we discussed how I found that .NET 4 won't install if you try to run the .NET installer via raw winrm (no chef or vagrant involved). If it happens to be chef that you run via winrm, and the chef run kicks off the .net installer from a recipe, then the .net install fails with the exact same error as it does if you run the .net installer directly via winrm. To work around this in our .NET recipe, I made the cookbook itself run the .NET installer via a scheduled task. (The task runs as the local system account). I did this in the .NET cookbook itself because we didn't want a system admin that might decide to kick off chef with a "knife winrm" command in a production environment to run into this issue. This was all outside of vagrant, but it just so happens that coding our .NET cookbook this way allowed it to work just fine in vagrant, as well as in a regular chef environment on real servers. (The cookbook is more portable and works in more scenarios by implementing it this way).
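
A rough sketch of the "create a task, then run it immediately" trick described above, expressed as the `schtasks.exe` command lines a recipe might shell out to. The helper name, task name and installer path are placeholders, not taken from the actual cookbook, and nothing is executed here:

```ruby
# Hypothetical helper: builds schtasks.exe command lines only.
def scheduled_install_commands(task_name, installer_command)
  # /RU SYSTEM runs the task as the local system account;
  # /SC ONCE with /ST is required by schtasks even though the task is started manually.
  create = %(schtasks /Create /F /TN "#{task_name}" /TR "#{installer_command}" /SC ONCE /ST 00:00 /RU SYSTEM)
  run    = %(schtasks /Run /TN "#{task_name}")
  [create, run]
end

# Placeholder task name and installer path.
commands = scheduled_install_commands(
  "install-dotnet4",
  'C:\temp\dotNetFx40_Full_x86_x64.exe /q /norestart'
)
commands.each { |cmd| puts cmd }
```

Note that `schtasks /Run` returns as soon as the task starts, so a recipe built this way would still need its own check for when the installer has finished.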

With the sql server issue you encountered in vagrant, your solution was the ps_runas change to vagrant-windows and to disable UAC. I've been using a locally hacked copy of vagrant-windows for quite a while that has the ps_runas stuff commented out (so we can keep UAC on and still use a user account other than the built-in administrator). Our cookbooks that require administrative privileges (basically all of them) seem to work without issue so far in this setup, including our .net cookbook (which of course does the task scheduler stuff, otherwise I'm sure it wouldn't work).

I see running chef as the built-in Administrator to be no different than running as some other user but with UAC being disabled on the system. We are using vagrant for testing our recipes. In our production environments, UAC is enabled, and an admin may use a knife winrm command to run chef client on multiple servers simultaneously. The credentials they use will be those of a non-builtin admin. If we run all of our recipes using the builtin Administrator for testing, we may run into some UAC related issue in production that we won't catch in our testing. Also, vagrant/vagrant seems to be a de facto standard for all the vagrant VMs I have encountered. (We especially use the VMs provided by opscode for chef testing, and our windows VMs are based on windows-fromscratch, which also seems to stick with the idea of having a vagrant user).

Earlier in this thread you had suggested the possibility of having an option to control whether the ps_runas script would be used. That would certainly resolve this issue for us if you still see it as a viable option. Another option might be for you to reconsider the possibility of trying to work around your original sql server issue in the cookbook itself rather than changing how vagrant-windows runs commands. If you were using knife winrm to run chef-client to run your cookbook in an actual production environment, I suspect you'd probably run into the same issues that you originally hit when you were running your cookbook via chef provisioning in vagrant prior to the ps_runas change. (In other words, your cookbook would work in more scenarios for more users). Most windows cookbooks (at least that I have encountered) seem to run just fine via winrm with UAC enabled. We've just hit these one or two weird cases so far (.NET and SQL) where the underlying product we're trying to install doesn't seem to play nicely with winrm.

sneal commented 11 years ago

@keiths-osc Thanks for the detailed response, it's appreciated. I need to go back and test this further; I now think my tests were flawed. I'll also double check the LocalAccountTokenFilterPolicy setting.

For a couple different reasons I'm migrating away from the ps_runas script and having vagrant-windows create a scheduled task to run chef immediately.

  1. WinRM 2.0 gem uses NTLM and ps_runas can't create a new process.
  2. Immediate chef std output; you no longer have to wait until the chef run finishes to see feedback.
  3. I thought it might behave better with UAC... the verdict is still out on this.
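
The "immediate output" point in the list above can be sketched as a polling loop that reads whatever has been appended to a log file since the last poll. This is only an illustration under simplifying assumptions: it reads a local file, whereas the plugin would have to fetch the guest's log over WinRM, and the `LogTail` and `stream_log` names are hypothetical:

```ruby
# Hypothetical sketch: incremental reads of a growing log file.
class LogTail
  def initialize(path)
    @path = path
    @offset = 0
  end

  # Returns the text appended since the previous call ("" if nothing new).
  def read_new
    return "" unless File.exist?(@path)
    File.open(@path, "rb") do |f|
      f.seek(@offset)
      chunk = f.read || ""
      @offset += chunk.bytesize
      chunk
    end
  end
end

# Polls until finished.call returns true, yielding new output as it appears.
def stream_log(path, finished, interval: 1)
  tail = LogTail.new(path)
  loop do
    done = finished.call # checked before reading so trailing output is not lost
    chunk = tail.read_new
    yield chunk unless chunk.empty?
    break if done
    sleep interval
  end
end
```

Here `finished` would be a callable that reports whether the scheduled task has completed, for example by checking the task's status.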

Even with these changes it sounds like it still isn't going to help you out. I believe you want vagrant-windows to represent as closely as possible the runtime environment that knife winrm provides, so running without ps_runas or a scheduled task (created by the plugin) would be best; I'll add a config option for this.

sneal commented 10 years ago

I can run and install our dotnetframework cookbook with:

I'm going to close this issue since I believe it's fixed (in vagrant-windows 1.2) according to the issue summary. My assumption is this will meet your needs. If vagrant-windows still will not work for your environment/setup, please open a new issue so that we can add a configuration option to invoke the chef client directly without creating a scheduled task.