audiolize / vagrant-softlayer

This is a Vagrant plugin that adds a SoftLayer provider to Vagrant, allowing Vagrant to control and provision SoftLayer CCI instances.
MIT License

/vagrant rsync fails with "must have tty" even when postinstall script sets !requiretty #47

Closed david-feldsine closed 9 years ago

david-feldsine commented 9 years ago

Softlayer runs the postinstall script asynchronously, and sometimes the Vagrant rsync tries to run before the postinstall script has set !requiretty. The issue is intermittent and seems related to network latency: sometimes it fails 80% of the time, sometimes 1% of the time. It is my belief that Softlayer should do a blocking wait on the postinstall script, but they have indicated that they do not plan to do so. We cannot use the Softlayer Seattle datacenter because of this issue.

My VM is CentOS 5.

POSTINSTALL SCRIPT

    #!/bin/bash
    # Change "Defaults requiretty" to "Defaults !requiretty" so sudo works without a tty
    /bin/sed -i 's/requiretty/!requiretty/' /etc/sudoers

ERROR TEXT

    Guest path: /vagrant
    Command: rsync --verbose --archive --delete -z --copy-links --no-owner --no-group --rsync-path sudo rsync -e ssh -p 22 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i '/xxx.pem' --exclude .vagrant/ /filepath/ root@xxx.xxx.xxx.xxx:/vagrant
    Error: Warning: Permanently added 'xxx.xxx.xxx.xxx' (RSA) to the list of known hosts.
    sudo: sorry, you must have a tty to run sudo
    rsync: connection unexpectedly closed (0 bytes received so far) [sender]
    rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
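The sed one-liner in the post_install script above works on a stock sudoers, but it is not idempotent: running it a second time turns `!requiretty` into `!!requiretty`. A minimal sketch of an idempotent variant follows. The function name `disable_requiretty` and its file-path parameter are illustrative assumptions, not something Softlayer or vagrant-softlayer provides, and it does nothing about the timing race itself.

```shell
#!/bin/bash
# disable_requiretty [FILE]: rewrite an active "Defaults<spaces>requiretty" line
# to "Defaults<spaces>!requiretty". FILE defaults to /etc/sudoers; the parameter
# exists so the function can be tried on a copy first (hypothetical helper).
disable_requiretty() {
  local sudoers="${1:-/etc/sudoers}"
  # "requiretty" must follow the whitespace directly, so a line that already
  # reads "!requiretty" no longer matches and a second run changes nothing.
  sed -i 's/^\(Defaults[[:space:]]\{1,\}\)requiretty/\1!requiretty/' "$sudoers"
}

# In a post_install script this would simply be:
# disable_requiretty /etc/sudoers
```

On a real host it is worth validating the result with `visudo -c -f /etc/sudoers`, since a broken sudoers file can leave sudo unusable.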

ju2wheels commented 9 years ago

It is my belief that Softlayer should do a blocking wait on the postinstall script, but they have indicated that they do not plan to do so.

There's not much we can do from the plugin side in that case, as the plugin attempts to provision/sync immediately after the box is deemed up.

There are two not so pretty workarounds you can try:

  1. Separate the calls to vagrant up (build) and provision and add a timed delay between. This will not always work either if the build process takes longer than your anticipated delay:

    vagrant up --no-provision
    sleep 300   # example delay only; tune it to how long your post_install usually takes
    vagrant provision
  2. Instead of using the OS image templates, create an image of a box you have built where the only change is disabling requiretty in sudoers, then build your Vagrant boxes from that image's GUID. This way you avoid the delay issue and don't have to separate the build and provision steps, assuming you don't intend to use the post_install script for anything else and instead pass that work on to the provisioner. The only real downside is that you can't change the disk sizes or the number of disks (you would need one image per variant).

david-feldsine commented 9 years ago

It is my belief that this is truly a Softlayer issue that should be solved by making the postinstall script block the completion of the server, but Softlayer does not seem receptive to making this change.

Option 1, running "vagrant up --no-provision", will not solve this problem, as the rsyncs of the folders (including /vagrant) happen even when --no-provision is specified.

Option 2 will work, but creating a new base image is significantly more work on a recurring basis than using the out-of-the-box Softlayer image.

A simple solution, though technically wrong in my opinion, would be a Vagrantfile parameter that controls a sleep timer between the time the Softlayer API reports the box as ready and the time Vagrant starts the rsyncs.

I agree that this would be a hack, but I am in a corner because Softlayer will not fix this in what I believe is the technically correct manner, and I would like to keep using their base boxes rather than create a new image to address a single setting.

Please help.

causton81 commented 9 years ago

I think you could install your Vagrant public key on control.softlayer.com, have Vagrant provision the VMs with that key, and configure Vagrant to log in as root. That way you don't need !requiretty.

Of course you can lock it down as much as you want after the initial provision.

ju2wheels commented 9 years ago

It is my belief that this is truly a Softlayer issue that should be solved by making the postinstall script block the completion of the server, but Softlayer does not seem receptive to making this change.

I agree; without it, the feature is pointless. But I think they have numerous issues with it under the hood, as it doesn't work consistently (or at all) on some versions of Windows either.

Option 1, running "vagrant up --no-provision", will not solve this problem, as the rsyncs of the folders (including /vagrant) happen even when --no-provision is specified.

Do you have it explicitly disabled, or are you depending on and leveraging the sync?

    config.vm.synced_folder ".", "/vagrant", disabled: true

If you can't get by with the above options, are you comfortable building and installing a one-off gem? I can add that delay option for you in a separate branch this weekend.

ju2wheels commented 9 years ago

Closing: alternatives are outlined above, and a permanent fix can't be made from our end.

lonniev commented 9 years ago

@ju2wheels does the vagrant-softlayer provider have an opportunity to delay between "Waiting for machine to boot" and the start of the folder rsyncs? E.g. sl.provision_holdoff = 9999 # seconds?

If the opportunity is there, this would give users some hope of tuning a delay that is not needlessly long but is long enough for the asynchronously running post_install processes to complete.

If not, and the only place the delay could be injected is within core Vagrant code or within the SL vagrant-hostile code, then I agree that closing with no change is your only option.

ju2wheels commented 9 years ago

We could, but that wouldn't really solve the problem, just delay it and hope for the best (and, in the case of a failed provision, delay needlessly). It would be better to have your post_install script touch a file on the system when it starts and another file when it's complete, and have your provisioning automation check for these.

When your provision automation runs, have it fail if it can't find the touched file stating the post_install started, and have it sleep and wait, indefinitely or for a timeout of your choosing, until it sees the touched file indicating the post_install completed.
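The marker-file handshake described above can be sketched in shell as below. This is a minimal sketch under assumptions: the marker names (`post_install.started`, `post_install.done`) and the helpers `run_with_markers` and `wait_for_post_install` are hypothetical, not part of Softlayer or vagrant-softlayer. A directory such as /var/tmp is a reasonable place for the markers because it is normally preserved across reboots. The waiter always uses a timeout here; an "indefinite" wait would just be a very large one.

```shell
#!/bin/bash
# Hypothetical marker file names; nothing in Softlayer or vagrant-softlayer defines them.
STARTED_NAME=post_install.started
DONE_NAME=post_install.done

# post_install side. run_with_markers DIR CMD...: touch the start marker,
# run CMD, and touch the done marker only if CMD exits successfully.
run_with_markers() {
  local dir="$1"; shift
  mkdir -p "$dir"
  touch "$dir/$STARTED_NAME"
  "$@" || return $?          # propagate CMD's failure status, no done marker
  touch "$dir/$DONE_NAME"
}

# Provisioning side. wait_for_post_install DIR TIMEOUT [INTERVAL]
#   returns 0 = done marker found, 1 = start marker missing, 2 = timed out.
wait_for_post_install() {
  local dir="$1" timeout="$2" interval="${3:-5}" waited=0
  # No start marker: the post_install never began, so fail fast.
  if [ ! -e "$dir/$STARTED_NAME" ]; then
    echo "post_install never started: no $dir/$STARTED_NAME" >&2
    return 1
  fi
  # Poll for the done marker until TIMEOUT seconds have been spent sleeping.
  until [ -e "$dir/$DONE_NAME" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "gave up after ${waited}s waiting for $dir/$DONE_NAME" >&2
      return 2
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

For example, the post_install script could call `run_with_markers /var/tmp /bin/sed -i 's/requiretty/!requiretty/' /etc/sudoers`, and the provisioning automation could call `wait_for_post_install /var/tmp 600 10` to wait up to ten minutes. This only helps work that runs in the provisioner; it cannot delay the /vagrant rsync, which happens before provisioning.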

lonniev commented 9 years ago

I certainly can have my post_install script do that file touching (a kind of interprocess join protocol), but I don’t have control over keeping SL from rebooting the instance while that post_install script is midway between the start and end points.

Do you know a way to force SL to also use the join files?

I agree that the “wedge in some nearly random delay” hack is ugly. But if it increases the odds of success from, say, 5% to, say, 80%, then it is valuable nonetheless.

—Lonnie VanZandt

303-900-3048 Sent from Dropbox's Mailbox on Mac

(Sent by email in reply to Julio Lajara's comment of Mon, Apr 20, 2015 at 4:44 PM.)

ju2wheels commented 9 years ago

I don’t have control over keeping SL from rebooting the instance while that post_install script is midway between the start and end points

Right, this is something we wouldn't have control over either, so we would need input from SL on how their end behaves. If it's true that they reboot before the script finishes, there's unfortunately nothing we can do from the vagrant-softlayer side to stop it.

poflynn commented 9 years ago

When your provision automation runs, have it fail if it can't find the touched file stating the post_install started, and have it sleep and wait, indefinitely or for a timeout of your choosing, until it sees the touched file indicating the post_install completed.

Like the OP, my problem is that my provisioning code never gets to run, because the rsync of /vagrant fails: rsync needs the !requiretty setting the OP mentions to be in place. :-(

lonniev commented 9 years ago

There seems to be some confusion of perspective here: for me, “when your provision automation runs” is, in fact, the post_install script. I can’t have that script wait on itself.

The root problem is that SL either reboots the server, or lets a process run that reboots it, even while the post_install script is underway. Perhaps they assumed either that a human would launch a post-installation process after getting an email (so the latencies between provisioning phases are very long), or that a provisioning script would only do trivial, quick tasks (so a concurrent task that takes some time but ends in a reboot would still let the post_install finish).

Whatever the reasons, SL is clearly starting the post_install script during an unsafe interval.

(Sent by email in reply to poflynn's comment of Mon, Apr 20, 2015 at 6:44 PM.)

david-feldsine commented 9 years ago

The cause of this issue is clearly understood.

The Vagrant SL plugin requests a machine from SL. SL provisions the server, spawns an asynchronous background process to run the post_install script, and responds that the machine is ready. The plugin then tries to run the rsync, and it fails because the background process has not actually run the post_install script yet (the post_install script is what sets !requiretty).
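One way to confirm that a given failure is this race, rather than a broken post_install script, is to look at the guest's sudoers right after the rsync fails: if the `requiretty` default is still active, the post_install edit simply has not landed yet. A minimal sketch of such a check follows; the helper name `requiretty_active` is hypothetical, and running it against the real /etc/sudoers needs root on the guest (for example via the root-key login causton81 describes).

```shell
#!/bin/bash
# requiretty_active [FILE]: succeed (exit 0) if FILE, default /etc/sudoers,
# still has an uncommented "Defaults<spaces>requiretty" line; fail (exit 1)
# once that line reads "!requiretty" or is gone. grep's exit 2 for an
# unreadable file is passed through unchanged.
requiretty_active() {
  local sudoers="${1:-/etc/sudoers}"
  grep -Eq '^[[:space:]]*Defaults[[:space:]]+requiretty([[:space:]]|$)' "$sudoers"
}

# Example (as root on the guest):
# if requiretty_active; then echo "post_install has not applied !requiretty yet"; fi
```

The check only observes the race; it still cannot run before the plugin's own rsync.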

Dave

(Sent by email in reply to Lonnie VanZandt's comment of Mon, Apr 20, 2015 at 9:50 AM.)

ju2wheels commented 9 years ago

@david-feldsine I think we are mixing threads here: yours is about Linux, while @lonniev is referring strictly to Windows, which is a bit different.

@lonniev let's take this back to #54

ju2wheels commented 9 years ago

@poflynn when you say your code never runs, are you effectively having the same problem as @lonniev?

[edit] I asked a dumb question; I should have reread the thread from the beginning. You may be seeing the same async issue as @lonniev if the SL API reports the server as complete without waiting for the post_install to finish.

lonniev commented 9 years ago

It’s worse than that for my case. What I see is that the post_install process is started, but more often than not SL reboots the instance while the script is still running. The result is that even being patient, waiting, and retrying the Vagrant session will always fail, because the script didn’t finish but SL thinks it ran it.

Yes, there are two scenarios: one for Windows and one for Linux. It sounds though like SL may be making similar bad choices for both.

I wonder whether, if I wrap all of the post_install work in some kind of reboot-preventing PowerShell context, I might be able to hold the Windows instance hostage until my script tasks all run. Right now, I perform each major chore as a separate PowerShell command-line command. I might be able to make it an all-or-nothing proposition. I’m no PowerShell guru, but I know that PowerShell offers much more control over jobs, and it might have a way to wrap the set of commands.

(Sent by email in reply to david-feldsine's comment of Mon, Apr 20, 2015 at 7:01 PM.)