bradenwright / kitchen-lxd_cli

Test Kitchen driver for LXD

scp async upload failing? #5

Closed (juju4 closed this issue 8 years ago)

juju4 commented 8 years ago

Hello,

I have a simple Ansible role that I'm testing with https://github.com/juju4/ansible-adduser/blob/master/.travis.yml

From a Jenkins instance on DigitalOcean (Ubuntu 14.04):

14:37:55        Setting up chef (12.12.15-1) ...
14:37:55        Thank you for installing Chef!
14:37:55 D      sudo -E rm -rf /tmp/kitchen/modules /tmp/kitchen/roles /tmp/kitchen/group_vars /tmp/kitchen/host_vars; mkdir -p /tmp/kitchen
14:37:55 D      [SSH] root@10.219.116.161<{:user_known_hosts_file=>"/dev/null", :paranoid=>false, :port=>22, :compression=>false, :compression_level=>0, :keepalive=>true, :keepalive_interval=>60, :timeout=>15, :user=>"root"}> (sudo -E rm -rf /tmp/kitchen/modules /tmp/kitchen/roles /tmp/kitchen/group_vars /tmp/kitchen/host_vars; mkdir -p /tmp/kitchen)
14:37:55        Transferring files to <default-ubuntu-1604>
14:37:55 D      TIMING: scp async upload (Kitchen::Transport::Ssh)
14:37:55 D      Cleaning up local sandbox in /tmp/default-ubuntu-1604-sandbox-20160730-1929-9ns4pd
14:37:55 -----> Cleaning up any prior instances of <default-ubuntu-1404>
14:37:55 -----> Destroying <default-ubuntu-1404>...

It seems to fail on the scp async upload, as there is no subsequent "finished" message, but we don't know why.

Same here: https://travis-ci.org/juju4/ansible-adduser/jobs/148556247

Here it transfers, but with nothing in the role: https://travis-ci.org/juju4/ansible-adduser/jobs/148556246

At this point, I'm unsure whether the problem is in test-kitchen or in one of the plugins. I believe I have had it with other providers (vagrant/virtualbox, docker, ...) but a lot less often.

Any ideas how to debug?

thanks

bradenwright commented 8 years ago

^^^ Sorry bad click

bradenwright commented 8 years ago

So most of what I'm going to say is probably stupid or may not be what you are looking for, but....

Like I said, it's pretty much all basic troubleshooting. I don't have experience with DigitalOcean and very little with Ansible, but my buddy has Ansible experience. I don't think he's messed with kitchen and Ansible together, but there's a good chance I can get him to look at it with me for a few minutes.

What gems are you using? (I assume kitchen-ansible and kitchen-lxd_cli; any others?)

One idea, although it's not really a fix: Kitchen::Transport::Ssh isn't giving the error, so another option is to write a different transport. That's actually on my to-do list because it would speed things up by using things like lxc exec <container name> bash, lxc file pull <container>/<path> <dest>, or lxc file push <source> <container>/<path>. So it may not only get past this error but also speed things up. It may also complicate things a little, because you may need to set up LXD remotes or something like that; I haven't explored enough.
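The lxc commands mentioned above could be wrapped roughly as follows. This is only a sketch: the helper names (run, lxc_mkdir, lxc_upload, lxc_download) and the DRY_RUN switch are hypothetical and not part of kitchen-lxd_cli, and it assumes the container is on the default local LXD remote.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of kitchen-lxd_cli.
# They wrap the lxc CLI calls that could stand in for an scp upload.
# Set DRY_RUN=1 to print each command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create a directory inside the container.
# $1 = container name, $2 = absolute directory path in the container
lxc_mkdir() {
  run lxc exec "$1" -- mkdir -p "$2"
}

# Copy one local file into the container.
# $1 = container name, $2 = local file, $3 = absolute destination path
lxc_upload() {
  run lxc file push "$2" "$1$3"
}

# Copy one file out of the container.
# $1 = container name, $2 = absolute path in the container, $3 = local destination
lxc_download() {
  run lxc file pull "$1$2" "$3"
}
```

With DRY_RUN=1 set, calling lxc_upload default-ubuntu-1604 ./default.yml /tmp/kitchen/default.yml prints the lxc file push command it would run, which is a cheap way to check the paths before touching a container. The container name here is illustrative; the real one depends on how the driver names its containers.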

juju4 commented 8 years ago

It really seems to be inside kitchen+lxd. I don't have this problem with kitchen+vagrant/virtualbox, with vagrant/virtualbox only, or obviously when running manually. I tried increasing the timeout to 30 without any change.

On Travis, the test stopped at "TIMING: scp async upload (Kitchen::Transport::Ssh)": https://travis-ci.org/juju4/ansible-mhn/jobs/150109530 (target trusty)

The same happens in a local test on a Jenkins+LXD VPS.

In my experience with these 2 environments, it can happen on any target system and kind of randomly between tests... Not sure what I can add to debug it. Travis has kitchen in verbose mode.

Inside Travis, I only have:
$ gem install kitchen
$ gem install kitchen-ansible
$ gem install kitchen-lxd_cli

on my other setup, part of

The one below might be another bug, as the role is not applied but its meta dependencies are:
https://travis-ci.org/juju4/ansible-mhn/jobs/150109529 (target xenial)
https://travis-ci.org/juju4/ansible-adduser/jobs/148558525 (here no role is executed even though ansible is called)
Should I open a separate issue, or is it transfer related?

bradenwright commented 8 years ago

FYI, I probably should have replied again earlier, but I haven't had much time to dig in. I talked a little last week with my buddy at work who's done some Ansible stuff, but we haven't gotten back to looking at it more. As you know, the errors don't give much info, and it's blowing up on stuff that kitchen-lxd_cli doesn't call directly.

One thing I thought of that you can try, really a workaround more than a solution, is https://github.com/coderanger/kitchen-sync, which uses SFTP. Or maybe ask in IRC if anyone has better ideas, because mine haven't been very good so far. It's just really weird that you get an scp sync error but ssh works.

I just don't have a good idea, so my plan is, when my buddy and I get some time, to try to duplicate it and dive in, hoping to find more. We are just super busy: we have a small team at work and our most senior member put in 2 weeks' notice 1.5 weeks ago.

juju4 commented 8 years ago

Yeah, I tried kitchen-sync and for now I have no similar issues, so it's most probably a transport problem. The problem is still present with recent gems, as I reinstalled everything (under rvm):

$ gem list

*** LOCAL GEMS ***

artifactory (2.3.3)
bigdecimal (1.2.8)
bundler-unload (1.0.2)
did_you_mean (1.0.0)
executable-hooks (1.3.2)
gem-wrappers (1.2.7)
io-console (0.4.5)
json (1.8.3)
kitchen-ansible (0.45.2)
kitchen-lxd_cli (2.0.0)
kitchen-sync (2.1.1)
kitchen-vagrant (0.20.0)
kitchen-verifier-serverspec (0.5.2)
minitest (5.8.3)
mixlib-install (1.1.0)
mixlib-shellout (2.2.6)
mixlib-versioning (1.1.0)
net-scp (1.2.1)
net-sftp (2.1.2)
net-ssh (3.2.0)
net-ssh-gateway (1.2.0)
net-telnet (0.1.1)
power_assert (0.2.6)
psych (2.0.17)
rake (10.4.2)
rdoc (4.2.1)
rubygems-bundler (1.4.4)
rvm (1.11.3.9)
safe_yaml (1.0.4)
test-kitchen (1.11.0)
test-unit (3.1.5)
thor (0.19.1)

So most probably this bug is still valid: https://github.com/test-kitchen/test-kitchen/issues/1035 (I have the line "default_config :ssh_sessions, 9" in my install).

juju4 commented 8 years ago

Just to confirm: it's working well with kitchen-sync.
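The kitchen-sync workaround can be sketched as a small shell step for a CI script. It assumes the kitchen-sync gem is already installed (gem install kitchen-sync) and that its SFTP transport is selected with the transport name sftp, as the kitchen-sync README describes. The function name write_sftp_override and the choice to put the setting in .kitchen.local.yml rather than .kitchen.yml are illustrative and not taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: write a local override that switches Test Kitchen
# from the default SSH/scp transport to kitchen-sync's SFTP transport.
# $1 = project directory containing .kitchen.yml
write_sftp_override() {
  cat > "$1/.kitchen.local.yml" <<'EOF'
---
transport:
  name: sftp
EOF
}
```

Calling write_sftp_override . before kitchen test applies the override to the current project. Note that it overwrites any existing .kitchen.local.yml, so adding the same two transport lines to .kitchen.yml by hand may be the safer choice outside of throwaway CI workspaces.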