chef / knife-windows

Plugin for Chef's knife tool for working with Windows nodes
Apache License 2.0

knife bootstrap windows winrm truncating the batch file #332

Closed philipdouglas closed 6 years ago

philipdouglas commented 8 years ago

I am trying to bootstrap a Windows machine over WinRM using knife. The batch file it writes to %TEMP% is truncated at 370 lines, halfway through one of the certificates, so it fails to execute.

I have bootstrapped about 40 machines, most of them Windows, without seeing this problem, and I'm not doing anything different. It's the first Windows 8 machine I've tried, but I've successfully bootstrapped Windows 10 and Server 2012 R2. The only other difference is that this machine has the Japanese language pack installed.

Any ideas what's going on?

mwrock commented 8 years ago

Do you use the same identical batch file for all windows machines? Is there any kind of error emitted during the bootstrap process and have you run with debug logging to see if that produces any clues?

philipdouglas commented 8 years ago

Do you use the same identical batch file for all windows machines?

Unless knife is doing something differently that I am unaware of, yes.

Is there any kind of error emitted during the bootstrap process and have you run with debug logging to see if that produces any clues?

I get the following error when it tries to run the truncated command at the end of the batch file:

hostname C:\Users\Username>
DEBUG: hostname[642275CA-20A1-4FA5-B1D1-19F9AE2D63EB] => :command_cleanup[cmd.exe /C "%TEMP%\bootstrap-9492-1453110190.bat"]
DEBUG: hostname[642275CA-20A1-4FA5-B1D1-19F9AE2D63EB] => :shell_close
ERROR: Failed to execute command on hostname return code 255

Even with verbose logging, there's no indication that the batch script wasn't written in full.

mwrock commented 8 years ago

I haven't been able to reproduce this, even after installing the Japanese language pack on a VM. If you are still able, would you mind pasting the contents of the .bat file into a gist? I don't believe there is anything sensitive in the stock .bat files. If not, one thing of interest is the actual number of bytes in the file.

philipdouglas commented 8 years ago

Thanks for looking into this.

Windows says it's 15,237 bytes. Here's the file (with the certificates and encrypted data bag secret redacted): https://gist.github.com/FreakyDug/2ebdf8d81e9c35566671

philipdouglas commented 8 years ago

I tried removing the --secret-file so the batch file would be slightly different. It did write more of the certificate, but neither the number of lines nor the file size stayed the same.

mwrock commented 8 years ago

Could you tell me the output of running the command chcp from a console on this node?

philipdouglas commented 8 years ago

C:\Users\USERNAME>chcp
Active code page: 932

jedatwork commented 8 years ago

I'm seeing the same issue: the bootstrap batch file cuts off at line 370, in the middle of echoing a certificate to a file, and the process fails with:

ERROR: Failed to execute command on xx.xx.xx.xx return code 255

The output of chcp is: Active code page: 437

Version information:

> knife --version
Chef: 12.5.1
> chef --version
Chef Development Kit Version: 0.10.0
chef-client version: 12.5.1
berks version: 4.0.1
kitchen version: 1.4.2

Target node is Server 2012R2.

mwrock commented 8 years ago

Could both of you post the knife bootstrap commands you are using? It may help to see which knife arguments you pass. Thanks!

jedatwork commented 8 years ago

From my workstation I ran:

knife bootstrap windows winrm xx.xx.xx.xx --winrm-user username --winrm-password 'password' --node-name mynode --run-list 'recipe[my_recipe_name]' --winrm-transport ssl --winrm-ssl-verify-mode verify_none

To rule out the certs and run list, I ran knife node create, assigned a run list, installed the Chef client manually on the server, manually copied the same certificates to the server, created a client.rb file referencing them, and ran chef-client. The chef-client run completed successfully and accepted all of the certificates.

philipdouglas commented 8 years ago

I run: knife bootstrap windows winrm hostname -x domain\username -N hostname -r '''role[base-win]''' --environment=production

mwrock commented 8 years ago

So it sounds like it's not an issue with command length. Encoding seems doubtful since @jedatwork has the normal code page 437, but that does not strictly rule it out for @FreakyDug. It does sound like both of you run into problems while the trusted certs are being rendered.

If you run with debug logging (-VV) it may be helpful to see the last few debug lines prior to:

DEBUG: hostname[642275CA-20A1-4FA5-B1D1-19F9AE2D63EB] => :command_cleanup[cmd.exe /C "%TEMP%\bootstrap-9492-1453110190.bat"]
DEBUG: hostname[642275CA-20A1-4FA5-B1D1-19F9AE2D63EB] => :shell_close
ERROR: Failed to execute command on hostname return code 255

mwrock commented 8 years ago

Also, @FreakyDug or @jedatwork, if you can reproduce this easily, shoot me an email (it's in my profile) and I'd be happy to set up a Skype/Hangouts/Zoom call.

philipdouglas commented 8 years ago

Here's more of the debug logs, though it doesn't reveal much:

hostname C:\Users\username.DOMAIN>msiexec /qn /log "C:\Users\username.DOMAIN\AppData\Local\Temp\chef-client-msi12420.log" /i "C:\Users\username.DOMAIN\AppData\Local\Temp\chef-client-latest.msi"
hostname Successfully installed Chef Client package.
hostname A subdirectory or file C:\chef\trusted_certs already exists.
hostname Installation completed successfully
hostname Writing validation key...
hostname Validation key written.
hostname
hostname C:\Users\username.DOMAIN>mkdir C:\chef\trusted_certs
hostname
hostname C:\Users\username.DOMAIN>(
hostname echo.-----BEGIN CERTIFICATE-----
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.REDACTED
hostname  echo.-----END CERTIFICATE-----
hostname ) 1>C:\chef/trusted_certs/chef.crt
hostname
hostname C:\Users\username.DOMAIN>
DEBUG: hostname[B82E221D-568E-45A5-8F0A-ED52511ADC43] => :command_cleanup[cmd.exe /C "%TEMP%\bootstrap-5128-1454323832.bat"]
DEBUG: hostname[B82E221D-568E-45A5-8F0A-ED52511ADC43] => :shell_close
ERROR: Failed to execute command on hostname return code 255

mwrock commented 8 years ago

The main thing I'd like to understand from the debug logs is whether the entire file is being "sent" to the machine but not written, or whether the interruption happens during transmission. I'm curious whether the debug logs show the complete output of the trusted-cert WinRM command.

philipdouglas commented 8 years ago

Ah, I see. Yeah, the debug log shows it writing the whole batch file, including the whole certificate and the chef config and chef-client call at the end.

mwrock commented 8 years ago

That's good info. I guess the next thing to try is finding the very first chunk that failed to make it into the file. Copy the command from the run_command debug logging, run it in a command window on the node, see whether it fails or is only partially appended, and troubleshoot from there.
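
A minimal sketch of that comparison in Python, assuming you have collected each chunk's text from the run_command debug output into a list and read the partially written .bat file back from the node. The helper name first_missing_chunk is hypothetical; it is not part of knife-windows.

```python
from typing import Optional, Sequence


def first_missing_chunk(chunks: Sequence[str], written: str) -> Optional[int]:
    """Return the 1-based number of the first chunk not found in the file.

    Chunks are searched for in order, each one after the end of the previous
    match, so a chunk only counts as written if it appears where expected.
    Returns None when every chunk is present.
    """
    position = 0
    for number, chunk in enumerate(chunks, start=1):
        found = written.find(chunk, position)
        if found == -1:
            return number
        position = found + len(chunk)
    return None
```

For example, first_missing_chunk(["a\r\n", "b\r\n", "c\r\n"], "a\r\nb\r\n") returns 3, pointing at the chunk to re-run by hand.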

philipdouglas commented 8 years ago

I should have noticed this earlier, but doing it manually revealed that all of chunk 8 is missing. Chunks 1 to 7 are written correctly, but nothing from chunk 8 is.

When I ran the command for chunk 8 manually, cmd just printed "More?". I broke the command up and identified the problem line as:

cmd.exe /C echo Rendering "%TEMP%\bootstrap-3036-1454409234.bat" chunk 8 && >> "%TEMP%\bootstrap-3036-1454409234.bat" (echo.SET "PATH=%PATH%;C:\ruby\bin;C:\opscode\chef\bin;C:\opscode\chef\embedded\bin") 

However, if you make it a standalone echo (without any &&s) it works fine:

cmd.exe /C echo >> "%TEMP%\bootstrap-3036-1454409234.bat" (echo.SET "PATH=%PATH%;C:\ruby\bin;C:\opscode\chef\bin;C:\opscode\chef\embedded\bin")

Both versions of the line work fine on my machine, and the only difference I can see on the node is that Windows displays "\" as "¥" because it's set to Japanese.

Any ideas?

philipdouglas commented 8 years ago

Doh, I found the problem. There was a rogue " in the node's PATH variable. I don't know if that's something Chef can protect itself against, but it would definitely be helpful if it could report that it failed to write the batch file in full.
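
A pre-flight check along these lines could flag the problem before bootstrapping. This is a minimal Python sketch, assuming Python is available on the node (or that you paste the node's PATH value in); the helper names are hypothetical, not knife or knife-windows APIs. It flags every entry containing a double quote and, separately, an odd total count of quotes, which is the case described here.

```python
import os
from typing import List


def quoted_path_entries(path_value: str) -> List[str]:
    """Return the ';'-separated PATH entries that contain a double quote."""
    return [entry for entry in path_value.split(";") if '"' in entry]


def has_unbalanced_quotes(path_value: str) -> bool:
    """True when PATH contains an odd number of double quotes.

    After cmd.exe substitutes %PATH% into a quoted command, an odd count
    flips which parts of the rest of the line are treated as quoted.
    """
    return path_value.count('"') % 2 == 1


if __name__ == "__main__":
    # Assumes this runs on the Windows node, where PATH entries are ';'-separated.
    path_value = os.environ.get("PATH", "")
    if has_unbalanced_quotes(path_value):
        print("PATH has an odd number of double quotes; fix it before bootstrapping.")
    for entry in quoted_path_entries(path_value):
        print("PATH entry containing a double quote:", entry)
```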

mwrock commented 8 years ago

Yeah, agreed. So the backslash display wasn't a problem?

philipdouglas commented 8 years ago

No, my hunch there was wrong.

mwrock commented 8 years ago

Just curious exactly how your PATH was quoted. I've tried a similar hack but get a different error code. Did your PATH have a single pair of double quotes?

philipdouglas commented 8 years ago

Yeah, for some reason, there was a single " at the end of the PATH. I was able to reproduce the "More?" problem by just running:

cmd.exe /C echo test &&  (echo.""")
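
Why a stray quote produces "More?" can be sketched with a simplified model of cmd.exe parsing: cmd.exe substitutes %PATH% before it looks at quotes and parentheses, each double quote toggles quoted mode, and parentheses inside quotes are ordinary text. With an odd number of quotes, the closing ")" of the block lands inside a quoted region, so cmd.exe keeps waiting for it. The Python below is only a model under those assumptions: it ignores ^ escapes, delayed expansion, and the rule that "(" opens a block only where a command can start (which is probably why the standalone echo form above did not hang). The names open_paren_depth, expand_path, and CHUNK_8 are hypothetical.

```python
def open_paren_depth(command: str) -> int:
    """Count parenthesised blocks left open at the end of a command line.

    Simplified model: a double quote toggles quoted mode, and "(" / ")" only
    count outside quotes. A result above 0 means cmd.exe would still be
    waiting for a closing ")" and would prompt "More?".
    """
    depth = 0
    in_quotes = False
    for ch in command:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "(":
                depth += 1
            elif ch == ")" and depth > 0:
                depth -= 1
    return depth


def expand_path(command: str, path_value: str) -> str:
    """Mimic cmd.exe replacing %PATH% before it parses quotes and parentheses."""
    return command.replace("%PATH%", path_value)


# The part of the chunk-8 command after "&&", with %PATH% not yet expanded.
CHUNK_8 = (
    '>> "%TEMP%\\bootstrap-3036-1454409234.bat" '
    '(echo.SET "PATH=%PATH%;C:\\ruby\\bin;C:\\opscode\\chef\\bin;'
    'C:\\opscode\\chef\\embedded\\bin")'
)

if __name__ == "__main__":
    clean = "C:\\Windows;C:\\Program Files\\Amazon\\cfn-bootstrap"
    rogue = clean + '"'  # one stray trailing quote, as reported in this thread
    print(open_paren_depth('echo test &&  (echo.""")'))  # 1
    print(open_paren_depth(expand_path(CHUNK_8, clean)))  # 0
    print(open_paren_depth(expand_path(CHUNK_8, rogue)))  # 1
```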

jedatwork commented 8 years ago

I ended up finding the same issue: the target node had the entry C:\Program Files\Amazon\cfn-bootstrap" in its PATH, and the stray quote was breaking the quoted string in the knife run. After I removed it from the PATH, the client run seems to proceed with no issues.

mwrock commented 8 years ago

We definitely need to make the transmission of this file a smoother ride. Thanks a ton to both @FreakyDug and @jedatwork for the details provided.

akemner commented 7 years ago

Quotes around entries in the Windows PATH environment variable caused this issue for me as well. Removing the quotes from those entries fixed it.

cheeseplus commented 6 years ago

Going to close this one, as it's not really an issue with the software itself, and we've narrowed down what causes it for posterity.