Closed vincenthuynh closed 9 years ago
FYI, I was able to reproduce this using a template Vagrantfile on different SL accounts.
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
config.vm.box = "ju2wheels/SL_CENTOS_6_64"
config.vm.provider :softlayer do |sl|
sl.datacenter = "tor01"
sl.ssh_key = []
sl.domain = "cw.com"
sl.hostname = "vagrant-base"
sl.api_key = ENV["SL_API_KEY"]
sl.username = ENV["SL_USERNAME"]
end
end
vagrant up --provider softlayer
Hi @vincenthuynh,
Would you be willing to provide a few more test details? If you can please perform the following:
Edit C:/Users/foo/.vagrant.d/gems/gems/vagrant-softlayer-0.4.0/lib/vagrant-softlayer/util/network.rb
and add pp record
at line 41 so it looks like this:
# Returns the private IP address record if the instance has been
# defined as private only, the public IP address record otherwise
# unless the force_private_ip override is true.
def ip_address_record(env)
data_type = env[:machine].provider_config.private_only ? "primaryBackendNetworkComponent" : "primaryNetworkComponent"
data_type = "primaryBackendNetworkComponent" if env[:machine].provider_config.force_private_ip
mask = "#{data_type}.primaryIpAddressRecord.id,#{data_type}.primaryIpAddressRecord.ipAddress"
record = sl_warden { env[:sl_machine].object_mask("mask[#{mask}]").getObject }
pp record
return {
:address => record[data_type]["primaryIpAddressRecord"]["ipAddress"],
:id => record[data_type]["primaryIpAddressRecord"]["id"]
}
end
@underscorephil : any internal API side issues this week with timing of getting the IP address of a virt server immediately after build? This has been working for us for a while. Its getting the IP address through applying an object mask to Virtual_Guest service and and calling getObject.
Here is the relevant output that was obtained from pp(record):
INFO warden: Calling OUT action: #<VagrantPlugins::SoftLayer::Action::ReadState:0x00000003179500>
INFO warden: Calling OUT action: #<VagrantPlugins::SoftLayer::Action::SetupSoftLayer:0x000000030eab70>
INFO warden: Calling OUT action: #<Vagrant::Action::Builtin::ConfigValidate:0x000000030eab98>
{"accountId"=><REDACTED>,
"createDate"=>"2015-01-13T18:20:25-05:00",
"dedicatedAccountHostOnlyFlag"=>false,
"domain"=>"cw.com",
"fullyQualifiedDomainName"=>"vagrant-base.cw.com",
"hostname"=>"vagrant-base",
"id"=><REDACTED>,
"lastPowerStateId"=>"",
"lastVerifiedDate"=>"",
"maxCpu"=>1,
"maxCpuUnits"=>"CORE",
"maxMemory"=>1024,
"metricPollDate"=>"",
"modifyDate"=>"",
"provisionDate"=>"",
"startCpus"=>1,
"statusId"=>1001,
"uuid"=>"<REDACTED>",
"globalIdentifier"=>"<REDACTED>",
"managedResourceFlag"=>false,
"status"=>{"keyName"=>"ACTIVE", "name"=>"Active"}}
ERROR warden: Error occurred: undefined method `[]' for nil:NilClass
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
INFO warden: Beginning recovery process...
INFO warden: Recovery complete.
@ju2wheels not that I am aware of. Ill look about to double-check.
Hi,
I did some testing and added a retry to the code to see if the IP eventually shows up or not:
def ip_address_record(env)
data_type = env[:machine].provider_config.private_only ? "primaryBackendNetworkComponent" : "primaryNetworkComponent"
data_type = "primaryBackendNetworkComponent" if env[:machine].provider_config.force_private_ip
while 1 do
mask = "#{data_type}.primaryIpAddressRecord.id,#{data_type}.primaryIpAddressRecord.ipAddress"
record = sl_warden { env[:sl_machine].object_mask("mask[#{mask}]").getObject }
pp(record)
puts("sleep 5")
sleep(5)
end
return {
:address => record[data_type]["primaryIpAddressRecord"]["ipAddress"],
:id => record[data_type]["primaryIpAddressRecord"]["id"]
}
end
It looks to be a race condition as primaryIpAddressRecord did show up about ~20s after the instance is initiated:
{"accountId"=><REDACTED>,
"createDate"=>"2015-01-14T10:23:30-05:00",
"dedicatedAccountHostOnlyFlag"=>false,
"domain"=>"cw.com",
"fullyQualifiedDomainName"=>"vagrant-base.cw.com",
"hostname"=>"vagrant-base",
"id"=><REDACTED>,
"lastPowerStateId"=>"",
"lastVerifiedDate"=>"",
"maxCpu"=>1,
"maxCpuUnits"=>"CORE",
"maxMemory"=>1024,
"metricPollDate"=>"",
"modifyDate"=>"2015-01-14T10:26:13-05:00",
"notes"=>"testing loop",
"provisionDate"=>"2015-01-14T10:26:13-05:00",
"startCpus"=>1,
"statusId"=>1001,
"uuid"=>"<REDACTED>",
"globalIdentifier"=>"<REDACTED>",
"managedResourceFlag"=>false,
"primaryBackendIpAddress"=>"<REDACTED>",
"primaryIpAddress"=>"<REDACTED>",
"primaryNetworkComponent"=>
{"createDate"=>"2015-01-14T10:23:50-05:00",
"guestId"=><REDACTED>,
"id"=><REDACTED>,
"macAddress"=>"<REDACTED>",
"maxSpeed"=>10,
"modifyDate"=>"2015-01-14T10:24:02-05:00",
"name"=>"eth",
"networkId"=><REDACTED>,
"port"=>1,
"speed"=>10,
"status"=>"ACTIVE",
"uuid"=>"<REDACTED>",
"primaryIpAddress"=>"<REDACTED>",
"primaryIpAddressRecord"=>{"id"=><REDACTED>, "ipAddress"=>"<REDACTED>"}},
"status"=>{"keyName"=>"ACTIVE", "name"=>"Active"}}
sleep 5
OK cool, if you can please leave an instance of a failed build running and open an SL ticket with a link to this issue and post the ticket number here.
@underscorephil We are using the upstream wait_until_ready method for VirtualServer instance and this normally had the IPs ready once it returned successful, seems to be a new timing issue after that.
I've opened ticket #16083324 in the SL portal.
Hello, Do we have any updates on this? We've been seeing some weird behaviours this week... I forgot to mention this but the ticket was closed citing that the issue will be handled here.
Did they forward it on to development at all? Was hoping for the devs to at least confirm its not a bug on SL API side (that expecting the IP Address to be there after wait_until_ready returns successfully is not normal behavior and we have to explicitly wait for them to show up as this has been working for quite a while).
Its a simple fix but if it is an upstream bug we would just be masking it and it would be better to fix in wait_until_ready upstream as the expectation would be "ready" means reachable...
@underscorephil , @SLsthompson
The support person got in contact with Phil and said we should expect an answer here.
Hi guys, is there any update on this issue? I think I am hitting the same bug:
undefined method `[]' for nil:NilClass
/Users/irina/.vagrant.d/gems/gems/vagrant-softlayer-0.4.0/lib/vagrant-softlayer/util/network.rb:42:in `ip_address_record'
/Users/irina/.vagrant.d/gems/gems/vagrant-softlayer-0.4.0/lib/vagrant-softlayer/util/network.rb:15:in `ip_address'
/Users/irina/.vagrant.d/gems/gems/vagrant-softlayer-0.4.0/lib/vagrant-softlayer/action/read_ssh_info.rb:23:in `read_ssh_info'
/Users/irina/.vagrant.d/gems/gems/vagrant-softlayer-0.4.0/lib/vagrant-softlayer/action/read_ssh_info.rb:15:in `call'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/action/warden.rb:34:in `call'
/Users/irina/.vagrant.d/gems/gems/vagrant-softlayer-0.4.0/lib/vagrant-softlayer/action/setup_softlayer.rb:34:in `call'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/action/warden.rb:34:in `call'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/action/builtin/config_validate.rb:25:in `call'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/action/warden.rb:34:in `call'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/action/builder.rb:116:in `call'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/action/runner.rb:66:in `block in run'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/util/busy.rb:19:in `busy'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/action/runner.rb:66:in `run'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/machine.rb:212:in `action_raw'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/machine.rb:189:in `block in action'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/environment.rb:485:in `lock'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/machine.rb:176:in `call'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/machine.rb:176:in `action'
/Users/irina/.vagrant.d/gems/gems/vagrant-softlayer-0.4.0/lib/vagrant-softlayer/provider.rb:21:in `ssh_info'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/machine.rb:405:in `ssh_info'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/plugins/communicators/ssh/communicator.rb:46:in `block in wait_for_ready'
/opt/vagrant/embedded/lib/ruby/2.0.0/timeout.rb:66:in `timeout'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/plugins/communicators/ssh/communicator.rb:42:in `wait_for_ready'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.1/lib/vagrant/action/builtin/wait_for_communicator.rb:16:in `block in call'
The internal ticket was closed due to inactivity :-(
Let me see if I can get a solid answer out of the folks on the Compute team.
When you are waiting for the provision to be done, are you waiting on all the transactions (i.e. is the second parameter, wait_for_transactions, true?
Although... even if that's the case, there could be a moment or two, a time between transactions, when the wait_until_ready routine might return even though the back-end fires up a transaction immediately afterward.
Yes (at least by default in the plugin): https://github.com/audiolize/vagrant-softlayer/blob/master/lib/vagrant-softlayer/action/wait_for_provision.rb#L23
@vincenthuynh, @irifed : Have you disabled sl.transaction_wait
or modified sl.provision_timeout
?
Although... even if that's the case, there could be a moment or two, a time between transactions, when the wait_until_ready routine might return even though the back-end fires up a transaction immediately afterward.
If theres any chance of that are you ok with us pushing this fix upstream and saying that for wait_until_ready to complete successfully it must have an IP?
@ju2wheels : I am setting provision timeout to 3600: sl.provision_timeout = 3600
.
However, error
undefined method `[]' for nil:NilClass
/Users/irina/.vagrant.d/gems/gems/vagrant-softlayer-0.4.0/lib/vagrant-softlayer/util/network.rb:42:in `ip_address_record'
[...]
appears almost immediately after vagrant up
.
Please let me know if there is any (even temporary) workaround -- I desperately need this working. Thanks!
The workaround we used was to put in a loop to check that the record hash contains ipAddress before being returned. You can see my fork for this bandaid fix.
So the long term fix is a change on the SL side I take it? And if so, would it help if I (and others) logged a ticket with SL referencing this issue asking for a fix?
I have the workaround in place but I don't like dangling bugs...
I noticed that the wait_until_ready
routine checks if the provisionDate is there, but it doesn't necessarily ensure that it's not nil
.
Could someone that is having this trouble try changing the wait_until_ready
routine so that it checks to make sure that the provisionDate
property exists AND has a non-empty value?
Could someone that is having this trouble try changing the
wait_until_ready
routine so that it checks to make sure that theprovisionDate
property exists AND has a non-empty value?
- Remove/comment out the patch for network ip wait
- Edit
~/.vagrant.d/gems/gems/softlayer_api-3.0.0/lib/softlayer/VirtualServer.rb
to change this:
def wait_until_ready(max_trials, wait_for_transactions = false, seconds_between_tries = 2)
# pessimistically assume the server is not ready
num_trials = 0
begin
self.refresh_details()
has_os_reload = has_sl_property? :lastOperatingSystemReload
has_active_transaction = has_sl_property? :activeTransaction
reloading_os = has_active_transaction && has_os_reload && (self.lastOperatingSystemReload['id'] == self.activeTransaction['id'])
provisioned = has_sl_property? :provisionDate
# a server is ready when it is provisioned, not reloading the OS
# (and if wait_for_transactions is true, when there are no active transactions).
ready = provisioned && !reloading_os && (!wait_for_transactions || !has_active_transaction)
num_trials = num_trials + 1
yield ready if block_given?
sleep(seconds_between_tries) if !ready && (num_trials <= max_trials)
end until ready || (num_trials >= max_trials)
ready
end
to this:
def wait_until_ready(max_trials, wait_for_transactions = false, seconds_between_tries = 2)
# pessimistically assume the server is not ready
num_trials = 0
begin
self.refresh_details()
has_os_reload = has_sl_property? :lastOperatingSystemReload
has_active_transaction = has_sl_property? :activeTransaction
reloading_os = has_active_transaction && has_os_reload && (self.lastOperatingSystemReload['id'] == self.activeTransaction['id'])
provisioned = has_sl_property?(:provisionDate) && ! self['provisionDate'].empty?
# a server is ready when it is provisioned, not reloading the OS
# (and if wait_for_transactions is true, when there are no active transactions).
ready = provisioned && !reloading_os && (!wait_for_transactions || !has_active_transaction)
num_trials = num_trials + 1
yield ready if block_given?
sleep(seconds_between_tries) if !ready && (num_trials <= max_trials)
end until ready || (num_trials >= max_trials)
ready
end
@vincenthuynh , @irifed , @poflynn could either of you give the above a try and see if it helps.
Was finally able to reproduce this on my end (didnt notice my lab box was running the old vagrant-softlayer 0.3.3).
@SLsthompson: Checking for the empty string resolves the problem it seems to be holding up now. PR submitted.
Fixed the above example as well, missed some parens.
softlayer_api 3.0.1 includes the fix.
Thanks.
@emyl vagrant-softlayer 0.4.1 is good for release to fix this one.
Until 0.4.1 is released the official fix for this is:
softlayer_api
gem:vagrant plugin update
rm -rf ~/.vagrant.d/gems/gems/softlayer_api-3.0.0/
Hi,
I would like to report that I am currently encountering this bug again. I am using a new environment with the following gem and plugin:
softlayer_api-3.0.2
vagrant-softlayer-0.4.0
Should I create a new bug or can we reopen this?
Hi @vincenthuynh, please attach any debug logs as we did last time if you can and also let us know/verify manually the versions of the softlayer_api
gems show in
ls -d ~/.vagrant.d/gems/gems/softlayer_api*
Since this was originally marked as an upstream issue please also open an SL ticket and cross reference both.
I've narrowed down the issue to another plugin we used called vagrant-hostmanager
which uses the vagrant-softlayer
plugin to retrieve the IP address of active machines.
Some background:
The error occurs in a specific scenario in our continuous-deployment process. We are archiving the .vagrant
directory in an existing workspace that was used to provision servers and then migrating it to a new environment and workspace. This is so that we don't lose references to the servers that have already been provisioned. The vagrant-hostmanager
plugin will look for active servers in the .vagrant
directory, then retrieves the IPs (using vagrant-softlayer
) and populate /etc/hosts
for all the servers in our cluster.
This has been working consistently until we noticed it failing this week.
Looks like something has changed upstream to have caused this. I'm not sure if this will be the appropriate form to raise this but I thought I would still leave this here for posterity.
The nil
that is returned is from the record
hash in network.rb
> vagrant hostmanager
[app01] Updating /etc/hosts file...
15:58:32 nil
15:58:32 /var/lib/jenkins/.vagrant.d/gems/gems/vagrant-softlayer-0.4.0/lib/vagrant-softlayer/util/network.rb:44:in `ip_address_record': undefined method `[]' for nil:NilClass (NoMethodError)
15:58:32 from /var/lib/jenkins/.vagrant.d/gems/gems/vagrant-softlayer-0.4.0/lib/vagrant-softlayer/util/network.rb:16:in `ip_address'
15:58:32 from /var/lib/jenkins/.vagrant.d/gems/gems/vagrant-softlayer-0.4.0/lib/vagrant-softlayer/action/read_ssh_info.rb:23:in `read_ssh_info'
15:58:32 from /var/lib/jenkins/.vagrant.d/gems/gems/vagrant-softlayer-0.4.0/lib/vagrant-softlayer/action/read_ssh_info.rb:15:in `call'
15:58:32 from /opt/vagrant/embedded/gems/gems/vagrant-1.6.5/lib/vagrant/action/warden.rb:34:in `call'
15:58:32 from /var/lib/jenkins/.vagrant.d/gems/gems/vagrant-softlayer-0.4.0/lib/vagrant-softlayer/action/setup_softlayer.rb:34:in `call'
15:58:32 from /opt/vagrant/embedded/gems/gems/vagrant-1.6.5/lib/vagrant/action/warden.rb:34:in `call'
.....
@ju2wheels Sorry forgot to report back on this... It went away and hasn't occurred for quite sometime now.
We've seen this error if there is an inactive SoftLayer instance in the .vagrant folder (referential integrity issue). But this time, it was encountered on a clean workspace.