RedHatQE / webdriver-wharf

Cannot start container: port has already been allocated #2

Closed tehsmyers closed 9 years ago

tehsmyers commented 10 years ago
[ERROR] apscheduler.executors.default Job "balance_containers (trigger: interval[6:00:00], next run at: 2014-09-10 22:30:41 UTC)" raised an exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/apscheduler/executors/base.py", line 108, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/usr/lib/python2.7/site-packages/webdriver_wharf/app.py", line 289, in balance_containers
    interactions.start(container_to_start)
  File "/usr/lib/python2.7/site-packages/webdriver_wharf/interactions.py", line 98, in start
    client.start(container.id, privileged=True, port_bindings=container.port_bindings)
  File "/usr/lib/python2.7/site-packages/docker/client.py", line 818, in start
    self._raise_for_status(res)
  File "/usr/lib/python2.7/site-packages/docker/client.py", line 87, in _raise_for_status
    raise errors.APIError(e, response, explanation=explanation)
APIError: 500 Server Error: Internal Server Error ("Cannot start container 299c48ee3416de1bc27c9e153f52ab49a609631edd20a049a721a678dd3e1879: port has already been allocated")

This happens when wharf believes a container has been destroyed but docker is still in the process of tearing it down. Unfortunately, it breaks the entire balance_containers run, so we probably need to guard against APIError and have balance_containers sleep for a second and continue.
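A rough sketch of that guard, not the actual wharf code: `start_with_retry` and its parameters are hypothetical names. In wharf, `start` would presumably be `interactions.start` and `retry_on` would be `docker.errors.APIError`; the exception class is passed in here so the snippet runs without docker installed.

```python
import time


def start_with_retry(start, container, retry_on=Exception, attempts=3, delay=1.0):
    """Call start(container), retrying on `retry_on` instead of propagating.

    Returns True if the container started, False if every attempt failed.
    All names here are illustrative, not part of wharf or docker-py.
    """
    for attempt in range(attempts):
        try:
            start(container)
            return True
        except retry_on:
            # Docker may still be tearing down the old container that held
            # the port, so pause briefly before the next attempt.
            if attempt < attempts - 1:
                time.sleep(delay)
    # Give up on this container so the caller can move on to the next one.
    return False
```

With something like this, balance_containers could skip a container that refuses to start rather than letting one failure abort the whole scheduled run.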

tehsmyers commented 10 years ago

In addition to the above, I had a hunch that this might be related to this todo: https://github.com/seandst/webdriver-wharf/blob/3112022556a835dd85566c92e661b7759f5ac392/webdriver_wharf/interactions.py#L52

Both fixes seem to have failed. I tried adding a loop that waits until a container is no longer known to docker before returning, and I tried checking the availability of all ports, not just the webdriver port, before returning the next available port. Neither resolved the problem, though both are reasonable additions to the code, so they'll probably show up shortly.
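For reference, the two attempts could look roughly like the sketch below. This is illustrative only: `port_is_free`, `all_ports_free`, and `wait_until_gone` are hypothetical names, and `is_known` stands in for whatever call asks docker whether the container still exists. Note the assumption baked into the port check: it only detects ports bound on the host, which is not necessarily the same thing as docker's own record of allocated ports.

```python
import socket
import time


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing on this host is currently bound to `port`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except socket.error:
        return False
    finally:
        sock.close()


def all_ports_free(ports):
    """Check every port a container would use, not only the webdriver port."""
    return all(port_is_free(p) for p in ports)


def wait_until_gone(is_known, timeout=30.0, interval=0.5):
    """Poll is_known() until it returns False or `timeout` seconds pass.

    Returns True if the container disappeared in time, False on timeout.
    """
    deadline = time.time() + timeout
    while is_known():
        if time.time() >= deadline:
            return False
        time.sleep(interval)
    return True
```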

This also only appears to happen on my workstation deployment, which maybe just needs an update.

tehsmyers commented 9 years ago

Indeed, since moving up to RHEL 7.1 and picking up the version of docker from that release, this issue has not reappeared.