chef-boneyard / chef-provisioning-docker

Docker provisioner for chef-provisioning
Apache License 2.0
92 stars 41 forks source link

add read timeout #84

Closed jgoulah closed 8 years ago

jgoulah commented 8 years ago

attmpt to address issue #11

jkeiser commented 8 years ago

@jgoulah what read timeout are you setting? It feels like someone has to time out, but we should set a sane default for this so people don't have to set it. (I'll merge this, but I want to set Docker.options[:read_timeout] = <something> globally in hopes of avoiding it.)

jgoulah commented 8 years ago

@jkeiser good question, i was setting it arbitrarily large to make sure it finished. my chef run just took about 10 minutes but somehow setting it to 300 didn't cause a read timeout, so I dont think I understand all of the intricacies here. I can try to test with a few more values and see where it hits if you'd like.

jkeiser commented 8 years ago

Sure! I'll get this merged today (I'm working with another branch at the moment with some rather large changes so my tree isn't clear yet).

jkeiser commented 8 years ago

And I think rather than setting Docker.options globally we'd probably just want to add a default for :wait . It's only exec in general that ought to have this issue.

jkeiser commented 8 years ago

@jgoulah actually, I might have something even better for you: https://github.com/chef/chef-provisioning/blob/master/lib/chef/provisioning/convergence_strategy/install_cached.rb#L84 already passes a timeout for the chef converge. We just aren't respecting it ...

jkeiser commented 8 years ago

I suspect you can fix this without ever having to specify a single option if you do this:

opt[:wait] = options[:timeout] if options[:timeout]

The defaults are probably fine.

jgoulah commented 8 years ago

@jkeiser that sounds good, I'll test it tonight and make the change if it works (do you know what the default is for that one?)

jkeiser commented 8 years ago

@jgoulah it's different for every command, and not all commands even pass one, I think ... but chef-client definitely does.

Here is what I've dumped in my tree (figured I'd add support for the other docker exec options while I was at it):

    def execute(command, timeout: nil, keep_stdin_open: nil, tty: nil, detached: nil, **options)
      opts = {}
      opts[:tty] = tty unless tty.nil?
      opts[:detached] = detached unless detached.nil?
      opts[:stdin] = keep_stdin_open unless keep_stdin_open.nil?
      opts[:wait] = timeout unless timeout.nil?
jgoulah commented 8 years ago

@jkeiser a few interesting findings and corrections (the slower computer I can reliably reproduce this on shows slightly different results...)

that chef_client_timeout from the converge method above defaults to 2 hrs (eg. https://github.com/chef/chef-provisioning/blob/master/lib/chef/provisioning/convergence_strategy/install_sh.rb#L22) so I don't believe it is the same timeout as the problem is not chef timing out but docker timing out

further (and unfortunately because of the above) I can still cause read timeout after setting opts[:wait] to a large number

setting Docker.options[:read_timeout] does still work and solves the issue (but interestingly I dont believe the options to this execute are making it, they are empty when I put in a debug line)

jkeiser commented 8 years ago

@jgoulah oh! so this is during image commit. I wonder why the heck it takes so long ... but I see why we need options more globally. I do wonder if we can store the modified options in @connection (it looks to me like Docker::Connection.new can but I don't want to block things any more.

I have committed a fix like that in the jk/more-options branch (which is a rewrite of most of this provisioner), and it seems like it's working--if you want to give it a whirl, please do.

jgoulah commented 8 years ago

@jkeiser happy to test it, do you mean it should work without any options? I see you are merging a read timeout of 600, that seems like it would help. Is it normal to get a merge conflict when pulling (on lib/chef/provisioning/docker_driver/driver.rb)?

jkeiser commented 8 years ago

@jgoulah I would do git checkout jk/more-options; git reset --hard origin/jk/more-options to get the right code.

jgoulah commented 8 years ago

@jkeiser makes sense, I'll run the branch today, real test is tonight on that slow computer, thanks!

jkeiser commented 8 years ago

Hmm, I think I may have figured out a way to save and share the image with chef on it. I'm absolutely hating the fact that we have to reinstall it so often.

jgoulah commented 8 years ago

@jkeiser thats fine, I am not sure if the delay is installing chef or all of the recipes its installing, I think it is both because I can exec into the container and see packages are installing (I can't find where chef logs to here, and output is not streaming, which may be part of the problem?)

jkeiser commented 8 years ago

@jgoulah I am not sure the timeout fixes anything. I'm seeing intermittent read timeouts now after 10 minutes, sometimes.

jkeiser commented 8 years ago

I think I'd probably merge my branch and we can investigate that separate, though, since it's not a new problem. Might be either docker-api or docker itself.

jgoulah commented 8 years ago

@jkeiser Docker.options[:read_timeout] definitely fixes it... but I am not seeing options I'm passing come through into execute, so I have only gotten it working when set manually in the code

jgoulah commented 8 years ago

@jkeiser cool I'd agree, though fwiw I'm getting SSL issues on your branch that I've not seen before

jkeiser commented 8 years ago

@jgoulah mind posting your cookbook and the error?

jgoulah commented 8 years ago

@jkeiser sure, it happens on the most basic machine recipe (I've commented most of it out here to show that) the error is Excon::Errors::SocketError: SSL_CTX_use_PrivateKey: key values mismatch (OpenSSL::SSL::SSLError)

verbose log: https://gist.github.com/jgoulah/878514533f83ccdc42ef

jkeiser commented 8 years ago

@jgoulah the error you are seeing is the first time it tries to talk to the Docker API. What are your Docker environment variables set to?

jkeiser commented 8 years ago

Closing this--we're talking out of band about it, and it involves a .chef/knife.rb with keys in it plus local mode together. Mind filing a new issue?