Closed dfarrell07 closed 9 years ago
Here's the timeout
value in the third party archive mod's code:
$timeout = 120,
That's low enough to plausibly be reached during a download of ODL's tar archive. It if is, this could be causing the failure.
Here's the curl command that's doing the download.
command => "curl ${basic_auth} -s ${insecure_arg} ${redirects_arg} ${proxy_arg} -o ${src_target}/${name} ${url}",
Which would resolve to in our case (after much digging through the code):
curl -s -L -o /usr/src/opendaylight-0.2.2.tar.gz https://nexus.opendaylight.org/content/groups/public/org/opendaylight/integration/distribution-karaf/0.2.2-Helium-SR2/distribution-karaf-0.2.2-Helium-SR2.tar.gz
A quick test on the same network connection, but in the host OS instead of a VM, took over a minute to download the tar archive. Note that I had to run with elevated permissions to write to /usr/src
, and I removed the -s
flag (silent) so I could get some useful output.
[/usr/src]$ sudo curl -L -o /usr/src/opendaylight-0.2.2.tar.gz https://nexus.opendaylight.org/content/groups/public/org/opendaylight/integration/distribution-karaf/0.2.2-Helium-SR2/distribution-karaf-0.2.2-Helium-SR2.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 218M 100 218M 0 0 3333k 0 0:01:07 0:01:07 --:--:-- 3026k
[/usr/src]$ ls -lh opendaylight-0.2.2.tar.gz
-rw-r--r--. 1 root root 219M Feb 16 19:34 opendaylight-0.2.2.tar.gz
It seems plausible that the timeout is expiring normally, and that it's just too low of a default for ODL.
Note that the network connection being used for this test has a cap of 35Mbps down and 5Mbps up.
Super short term steps:
Repeating the download timing test on the host OS again:
[/usr/src]$ sudo curl -L -o /usr/src/opendaylight-0.2.2.tar.gz https://nexus.opendaylight.org/content/groups/public/org/opendaylight/integration/distribution-karaf/0.2.2-Helium-SR2/distribution-karaf-0.2.2-Helium-SR2.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 218M 100 218M 0 0 3495k 0 0:01:04 0:01:04 --:--:-- 2530k
Very similar download time, 67 seconds the first time and 64 the second.
Manually watching the rate I was getting data using nload
, I observed a consistent 31 to 33 Mbps down. That's very close to the max for this network connection. However, we're still at only have the default timeout of 120.
Increased the timeout value to 600 seconds and re-ran the Beaker CentOS 7 tests. Unlike before, everything passed.
Finished in 4 minutes 58.5 seconds (files took 2 minutes 4.1 seconds to load)
61 examples, 0 failures
I don't think the timeout was actually reached. These are the logs that were output during the tarball download:
centos-7 19:54:14$ puppet apply --verbose --detailed-exitcodes /tmp/apply_manifest.pp.SP52uO
Info: Loading facts
Notice: Scope(Archive::Download[opendaylight-0.2.2.tar.gz]): No checksum for this archive
Notice: Scope(Archive::Download[opendaylight-systemd.tar.gz]): No checksum for this archive
Notice: Compiled catalog for centos-7 in environment production in 0.77 seconds
Warning: The package type's allow_virtual parameter will be changing its default value from false to true in a future release. If you do not want to allow virtual packages, please explicitly set allow_virtual to false.
(at /usr/share/ruby/vendor_ruby/puppet/type/package.rb:430:in `block (3 levels) in <module:Puppet>')
Info: Applying configuration version '1424134453'
Notice: /Stage[main]/Opendaylight::Install/Archive[opendaylight-systemd]/Archive::Download[opendaylight-systemd.tar.gz]/Exec[download archive opendaylight-systemd.tar.gz and check sum]/returns: executed successfully
Notice: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Download[opendaylight-0.2.2.tar.gz]/Exec[download archive opendaylight-0.2.2.tar.gz and check sum]/returns: executed successfully
Notice: Finished catalog run in 79.19 seconds
centos-7 executed in 81.37 seconds
Exited: 2
As you can see, it took 81 seconds.,
Going to repeat the test.
Saw a failure and error on the third Beaker CentOS 7 run.
Error: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Download[opendaylight-0.2.2.tar.gz]/Exec[download archive opendaylight-0.2.2.tar.gz and check sum]/returns: change from notrun to 0 failed: curl -s -L -o /usr/src/opendaylight-0.2.2.tar.gz https://nexus.opendaylight.org/content/groups/public/org/opendaylight/integration/distribution-karaf/0.2.2-Helium-SR2/distribution-karaf-0.2.2-Helium-SR2.tar.gz returned 18 instead of one of [0]
This is a different failure than first observed. This one, running with timeout overriden to 600 seconds, doesn't mention a timeout but still fails at the same step in the third party archive mod's code.
Full trace:
Failures:
1) opendaylight class testing install methods tarball install should work idempotently with no errors
Failure/Error: apply_manifest(pp, :catch_failures => true)
Beaker::Host::CommandFailure:
Host 'centos-7' exited with 6 running:
puppet apply --verbose --detailed-exitcodes /tmp/apply_manifest.pp.JRaBH6
Last 10 lines of output were:
Notice: /Stage[main]/Opendaylight::Config/File_line[tomcatport]: Dependency Exec[download archive opendaylight-0.2.2.tar.gz and check sum] has failures: true
Notice: /Stage[main]/Opendaylight::Config/File[org.apache.karaf.features.cfg]: Dependency Exec[download archive opendaylight-0.2.2.tar.gz and check sum] has failures: true
Notice: /Stage[main]/Opendaylight::Service/Service[opendaylight]: Dependency Exec[download archive opendaylight-0.2.2.tar.gz and check sum] has failures: true
Error: curl -s -L -o /usr/src/opendaylight-0.2.2.tar.gz https://nexus.opendaylight.org/content/groups/public/org/opendaylight/integration/distribution-karaf/0.2.2-Helium-SR2/distribution-karaf-0.2.2-Helium-SR2.tar.gz returned 18 instead of one of [0]
Error: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Download[opendaylight-0.2.2.tar.gz]/Exec[download archive opendaylight-0.2.2.tar.gz and check sum]/returns: change from notrun to 0 failed: curl -s -L -o /usr/src/opendaylight-0.2.2.tar.gz https://nexus.opendaylight.org/content/groups/public/org/opendaylight/integration/distribution-karaf/0.2.2-Helium-SR2/distribution-karaf-0.2.2-Helium-SR2.tar.gz returned 18 instead of one of [0]
Warning: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Extract[opendaylight-0.2.2]/Exec[Unpack opendaylight-0.2.2]: Skipping because of failed dependencies
Warning: /Stage[main]/Opendaylight::Config/File_line[tomcatport]: Skipping because of failed dependencies
Warning: /Stage[main]/Opendaylight::Config/File[org.apache.karaf.features.cfg]: Skipping because of failed dependencies
Warning: /Stage[main]/Opendaylight::Service/Service[opendaylight]: Skipping because of failed dependencies
Notice: Finished catalog run in 309.91 seconds
# /home/daniel/.gem/ruby/gems/beaker-2.2.0/lib/beaker/host.rb:285:in `exec'
# /home/daniel/.gem/ruby/gems/beaker-2.2.0/lib/beaker/dsl/helpers.rb:88:in `block in on'
# /home/daniel/.gem/ruby/gems/beaker-2.2.0/lib/beaker/shared/host_manager.rb:74:in `run_block_on'
# /home/daniel/.gem/ruby/gems/beaker-2.2.0/lib/beaker/dsl/patterns.rb:32:in `block_on'
# /home/daniel/.gem/ruby/gems/beaker-2.2.0/lib/beaker/dsl/helpers.rb:78:in `on'
# /home/daniel/.gem/ruby/gems/beaker-2.2.0/lib/beaker/dsl/helpers.rb:950:in `block in apply_manifest_on'
# /home/daniel/.gem/ruby/gems/beaker-2.2.0/lib/beaker/shared/host_manager.rb:74:in `run_block_on'
# /home/daniel/.gem/ruby/gems/beaker-2.2.0/lib/beaker/dsl/patterns.rb:32:in `block_on'
# /home/daniel/.gem/ruby/gems/beaker-2.2.0/lib/beaker/dsl/helpers.rb:879:in `apply_manifest_on'
# /home/daniel/.gem/ruby/gems/beaker-2.2.0/lib/beaker/dsl/helpers.rb:957:in `apply_manifest'
# ./spec/spec_helper_acceptance.rb:67:in `block in install_odl'
Finished in 8 minutes 35 seconds (files took 2 minutes 2.1 seconds to load)
61 examples, 1 failure
Failed examples:
rspec ./spec/spec_helper_acceptance.rb:55 # opendaylight class testing install methods tarball install should work idempotently with no errors
Here's the output from the step that failed:
centos-7 20:05:13$ puppet apply --verbose --detailed-exitcodes /tmp/apply_manifest.pp.JRaBH6
Info: Loading facts
Notice: Scope(Archive::Download[opendaylight-0.2.2.tar.gz]): No checksum for this archive
Notice: Scope(Archive::Download[opendaylight-systemd.tar.gz]): No checksum for this archive
Notice: Compiled catalog for centos-7 in environment production in 0.78 seconds
Warning: The package type's allow_virtual parameter will be changing its default value from false to true in a future release. If you do not want to allow virtual packages, please explicitly set allow_virtual to false.
(at /usr/share/ruby/vendor_ruby/puppet/type/package.rb:430:in `block (3 levels) in <module:Puppet>')
Info: Applying configuration version '1424135112'
Notice: /Stage[main]/Opendaylight::Install/Archive[opendaylight-systemd]/Archive::Download[opendaylight-systemd.tar.gz]/Exec[download archive opendaylight-systemd.tar.gz and check sum]/returns: executed successfully
Notice: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Extract[opendaylight-0.2.2]/Exec[Unpack opendaylight-0.2.2]: Dependency Exec[download archive opendaylight-0.2.2.tar.gz and check sum] has failures: true
Notice: /Stage[main]/Opendaylight::Config/File_line[tomcatport]: Dependency Exec[download archive opendaylight-0.2.2.tar.gz and check sum] has failures: true
Notice: /Stage[main]/Opendaylight::Config/File[org.apache.karaf.features.cfg]: Dependency Exec[download archive opendaylight-0.2.2.tar.gz and check sum] has failures: true
Notice: /Stage[main]/Opendaylight::Service/Service[opendaylight]: Dependency Exec[download archive opendaylight-0.2.2.tar.gz and check sum] has failures: true
Error: curl -s -L -o /usr/src/opendaylight-0.2.2.tar.gz https://nexus.opendaylight.org/content/groups/public/org/opendaylight/integration/distribution-karaf/0.2.2-Helium-SR2/distribution-karaf-0.2.2-Helium-SR2.tar.gz returned 18 instead of one of [0]
Error: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Download[opendaylight-0.2.2.tar.gz]/Exec[download archive opendaylight-0.2.2.tar.gz and check sum]/returns: change from notrun to 0 failed: curl -s -L -o /usr/src/opendaylight-0.2.2.tar.gz https://nexus.opendaylight.org/content/groups/public/org/opendaylight/integration/distribution-karaf/0.2.2-Helium-SR2/distribution-karaf-0.2.2-Helium-SR2.tar.gz returned 18 instead of one of [0]
Warning: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Extract[opendaylight-0.2.2]/Exec[Unpack opendaylight-0.2.2]: Skipping because of failed dependencies
Warning: /Stage[main]/Opendaylight::Config/File_line[tomcatport]: Skipping because of failed dependencies
Warning: /Stage[main]/Opendaylight::Config/File[org.apache.karaf.features.cfg]: Skipping because of failed dependencies
Warning: /Stage[main]/Opendaylight::Service/Service[opendaylight]: Skipping because of failed dependencies
Notice: Finished catalog run in 309.91 seconds
centos-7 executed in 312.00 seconds
Exited: 6
should work idempotently with no errors (FAILED - 1)
should work idempotently with no errors (FAILED - 1)
File "/opt/opendaylight-0.2.2/"
File "/opt/opendaylight-0.2.2/"
Note that the step took 309 seconds. I observed via nload
that data was transferred normally, then stopped for most of the delay (as expected given the info above).
Running the same CentOS 7 Beaker tests yet again, I see it passing with a 266 second tarball download.
centos-7 20:28:25$ puppet apply --verbose --detailed-exitcodes /tmp/apply_manifest.pp.NO4MbI
Info: Loading facts
Notice: Scope(Archive::Download[opendaylight-0.2.2.tar.gz]): No checksum for this archive
Notice: Scope(Archive::Download[opendaylight-systemd.tar.gz]): No checksum for this archive
Notice: Compiled catalog for centos-7 in environment production in 0.78 seconds
Warning: The package type's allow_virtual parameter will be changing its default value from false to true in a future release. If you do not want to allow virtual packages, please explicitly set allow_virtual to false.
(at /usr/share/ruby/vendor_ruby/puppet/type/package.rb:430:in `block (3 levels) in <module:Puppet>')
Info: Applying configuration version '1424136504'
Notice: /Stage[main]/Opendaylight::Install/Archive[opendaylight-systemd]/Archive::Download[opendaylight-systemd.tar.gz]/Exec[download archive opendaylight-systemd.tar.gz and check sum]/returns: executed successfully
Notice: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Download[opendaylight-0.2.2.tar.gz]/Exec[download archive opendaylight-0.2.2.tar.gz and check sum]/returns: executed successfully
Notice: Finished catalog run in 264.51 seconds
centos-7 executed in 266.74 seconds
Exited: 2
Finished in 8 minutes 3 seconds (files took 2 minutes 0.5 seconds to load)
61 examples, 0 failures
After updating the vagrant-opendaylight project to make use of both the tarball and RPM install methods supported by this Puppet module, I'm able to test the tarball install that's showing this error in a very simple environment. With only one provision run attempted so far, I'm seeing the error:
==> cent7_puppet_tarball: Running provisioner: puppet...
==> cent7_puppet_tarball: Running Puppet with odl_tarball_install.pp...
==> cent7_puppet_tarball: Warning: Config file /etc/puppet/hiera.yaml not found, using Hiera defaults
==> cent7_puppet_tarball: Notice: Scope(Archive::Download[opendaylight-0.2.2.tar.gz]): No checksum for this archive
==> cent7_puppet_tarball: Notice: Scope(Archive::Download[opendaylight-systemd.tar.gz]): No checksum for this archive
==> cent7_puppet_tarball: Notice: Compiled catalog for localhost in environment production in 0.94 seconds
==> cent7_puppet_tarball: Warning: The package type's allow_virtual parameter will be changing its default value from false to true in a future release. If you do not want to allow virtual packages, please explicitly set allow_virtual to false.
==> cent7_puppet_tarball: (at /usr/share/ruby/vendor_ruby/puppet/type/package.rb:430:in `block (3 levels) in <module:Puppet>')
==> cent7_puppet_tarball: Notice: /Stage[main]/Archive::Prerequisites/Package[unzip]/ensure: created
==> cent7_puppet_tarball: Notice: /Stage[main]/Opendaylight::Install/Archive[opendaylight-systemd]/Archive::Download[opendaylight-systemd.tar.gz]/Exec[download archive opendaylight-systemd.tar.gz and check sum]/returns: executed successfully
==> cent7_puppet_tarball: Notice: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Extract[opendaylight-0.2.2]/Exec[Unpack opendaylight-0.2.2]: Dependency Exec[download archive opendaylight-0.2.2.tar.gz and check sum] has failures: true
==> cent7_puppet_tarball: Notice: /Stage[main]/Opendaylight::Config/File_line[tomcatport]: Dependency Exec[download archive opendaylight-0.2.2.tar.gz and check sum] has failures: true
==> cent7_puppet_tarball: Error: curl -s -L -o /usr/src/opendaylight-0.2.2.tar.gz https://nexus.opendaylight.org/content/groups/public/org/opendaylight/integration/distribution-karaf/0.2.2-Helium-SR2/distribution-karaf-0.2.2-Helium-SR2.tar.gz returned 18 instead of one of [0]
==> cent7_puppet_tarball: Error: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Download[opendaylight-0.2.2.tar.gz]/Exec[download archive opendaylight-0.2.2.tar.gz and check sum]/returns: change from notrun to 0 failed: curl -s -L -o /usr/src/opendaylight-0.2.2.tar.gz https://nexus.opendaylight.org/content/groups/public/org/opendaylight/integration/distribution-karaf/0.2.2-Helium-SR2/distribution-karaf-0.2.2-Helium-SR2.tar.gz returned 18 instead of one of [0]
==> cent7_puppet_tarball: Warning: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Extract[opendaylight-0.2.2]/Exec[Unpack opendaylight-0.2.2]: Skipping because of failed dependencies
==> cent7_puppet_tarball: Warning: /Stage[main]/Opendaylight::Config/File_line[tomcatport]: Skipping because of failed dependencies
==> cent7_puppet_tarball: Notice: /Stage[main]/Opendaylight::Config/File[org.apache.karaf.features.cfg]: Dependency Exec[download archive opendaylight-0.2.2.tar.gz and check sum] has failures: true
==> cent7_puppet_tarball: Notice: /Stage[main]/Opendaylight::Service/Service[opendaylight]: Dependency Exec[download archive opendaylight-0.2.2.tar.gz and check sum] has failures: true
==> cent7_puppet_tarball: Warning: /Stage[main]/Opendaylight::Config/File[org.apache.karaf.features.cfg]: Skipping because of failed dependencies
==> cent7_puppet_tarball: Warning: /Stage[main]/Opendaylight::Service/Service[opendaylight]: Skipping because of failed dependencies
==> cent7_puppet_tarball: Notice: Finished catalog run in 461.33 seconds
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.
Here's the most interesting part:
Error: curl -s -L -o /usr/src/opendaylight-0.2.2.tar.gz https://nexus.opendaylight.org/content/groups/public/org/opendaylight/integration/distribution-karaf/0.2.2-Helium-SR2/distribution-karaf-0.2.2-Helium-SR2.tar.gz returned 18 instead of one of [0]
Connecting to the Vagrant box that failed above (via vagrant-opendaylight) and poking around:
[~/vagrant-opendaylight]$ vagrant ssh cent7_puppet_tarball
Last login: Tue Jul 22 03:42:16 2014 from 10.0.2.2
[vagrant@localhost ~]$ ls /usr/src/
debug kernels opendaylight-0.2.2.tar.gz opendaylight-systemd.tar.gz vboxguest-4.3.14
Very interestingly, the archive seems to have partially downloaded (85MB of a few 100MBs)
[vagrant@localhost ~]$ cd /usr/src/
[vagrant@localhost src]$ ls -lh opendaylight-0.2.2.tar.gz
-rw-r--r--. 1 root root 85M Feb 18 17:05 opendaylight-0.2.2.tar.gz
A downstream user of of this Puppet mod, vagrant-opendaylight, also has an open issue to track/debug this failure.
Cross linking interesting results. Basically, they are able to replicate the failure in a Vagrant box that's provisioned using this mod's tarball install method.
I haven't seen this issue since moving to camptocamp-archive
from gini-archive
(only on the archive_mod
branch for now).
==> cent7_puppet_tarball: Notice: /Stage[main]/Opendaylight::Install/Archive[opendaylight-systemd]/Archive::Download[opendaylight-systemd.tar.gz]/Exec[download archive opendaylight-systemd.tar.gz and check sum]/returns: executed successfully
==> cent7_puppet_tarball: Notice: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Download[opendaylight-0.2.2.tar.gz]/Exec[download archive opendaylight-0.2.2.tar.gz and check sum]/returns: executed successfully
==> cent7_puppet_tarball: Notice: /Stage[main]/Opendaylight::Install/Archive[opendaylight-systemd]/Archive::Extract[opendaylight-systemd]/Exec[opendaylight-systemd unpack]/returns: executed successfully
==> cent7_puppet_tarball: Notice: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Extract[opendaylight-0.2.2]/Exec[opendaylight-0.2.2 unpack]/returns: executed successfully
After five successful vagrant up
/check download success/vagrant destroy -f
cycles (using vagrant-opendaylight) since the move to camptocamp-archive
, it seems likely that this was fixed. I don't know what about gini-archive
was failing and causing incomplete/hung downloads, but it seems to be something it doesn't share with its parent.
[~/vagrant-opendaylight]$ vagrant ssh cent7_puppet_tarball
Last login: Tue Jul 22 03:42:16 2014 from 10.0.2.2
[vagrant@localhost ~]$ ls -lh /usr/src/
total 219M
drwxr-xr-x. 2 root root 6 Jun 10 2014 debug
drwxr-xr-x. 3 root root 34 Jul 22 2014 kernels
-rw-r--r--. 1 root root 219M Feb 26 21:05 opendaylight-0.2.2.tar.gz
-rw-r--r--. 1 root root 447 Feb 26 21:00 opendaylight-systemd.tar.gz
lrwxrwxrwx. 1 root root 51 Jul 21 2014 vboxguest-4.3.14 -> /opt/VBoxGuestAdditions-4.3.14/src/vboxguest-4.3.14
[vagrant@localhost ~]$
Cross-linking test results from vagrant-opendaylight (tl;dr - saw incomplete download failure using test script in VM).
Just observed a download failure during a vagrant up cent7_puppet_tarball
:
==> cent7_puppet_tarball: Notice: /Stage[main]/Opendaylight::Install/Archive[opendaylight-0.2.2]/Archive::Extract[opendaylight-0.2.2]/Exec[opendaylight-0.2.2 unpack]: Dependency Exec[download archive opendaylight-0.2.2.tar.gz and check sum] has failures: true
Note the incomplete download (ODL should be ~219MB):
[vagrant@localhost ~]$ ls -lh /usr/src/opendaylight-0.2.2.tar.gz
-rw-r--r--. 1 root root 99M Feb 26 22:26 /usr/src/opendaylight-0.2.2.tar.gz
This doesn't seem to happen very often, and the (quite extensive) testing documented on this issue gave confounding results that seem to (maybe) point at rare but non-zero failures from ODL's download servers. We're only noticing them because we make so many downloads.
Closing for now, will open more specific issues as needed.
To be super clear: This turned out not to be an issue with the ODL Puppet mod. It was replicated in Vagrant environments that were totally unrelated to the mod.
The archive module we're consuming is failing to download OpenDaylight's tarball with a timeout error during CentOS 7 Beaker tests.
This may be an issue with the third-part module, not the ODL mod. Note that we're already trying to move away from gini/puppet-archive, per #52.
This is related to #45 and #51.