girder / girder_worker

Distributed task execution engine with Girder integration, developed by Kitware
http://girder-worker.readthedocs.io/
Apache License 2.0

docker_run can't process non-string arguments #357

Closed: agirault closed this issue 4 years ago

agirault commented 4 years ago

I'm not fluent in the Girder ecosystem, so I apologize if this isn't the best minimal example to illustrate the issue.

When leveraging a dataType other than the default string in Description.param, the route receives a non-string parameter (for example, a float or int) that cannot be passed directly to docker_run's container_args:

from girder.api import access
from girder.api.describe import Description, autoDescribeRoute
from girder.api.rest import (
    filtermodel,
    Resource
)
from girder_jobs.models.job import Job
from girder_worker.docker.tasks import docker_run

class Test(Resource):
    def __init__(self):
        super(Test, self).__init__()
        self.resourceName = 'test'
        self.route('POST', (), self.test)

    @access.user
    @filtermodel(Job)
    @autoDescribeRoute(
        Description('Test girder worker with non-string parameters')
        .param('value', 'Some non-string data type', dataType='float', required=True) 
        .errorResponse())
    def test(self, value):
        job = docker_run.delay(
            'someContainer',
            pull_image=False,
            entrypoint='/someEntryPoint',
            container_args=[value],  # does not work; requires str(value)
            girder_job_title='Passing %s through Girder Worker' % value
        ).job
        return Job().save(job)

Doing this causes the Celery task to hang until the request to the Docker daemon times out:

[2020-06-12 16:00:37,183: INFO/MainProcess] Received task: girder_worker.docker.tasks.docker_run[aa94d697-0fb1-4606-a89e-6b8752aa8acb]  
[2020-06-12 16:01:37,250: ERROR/ForkPoolWorker-16] Task girder_worker.docker.tasks.docker_run[aa94d697-0fb1-4606-a89e-6b8752aa8acb] raised unexpected: ReadTimeout(ReadTimeoutError("UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)"))
Traceback (most recent call last):
  File "/virtualenv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/virtualenv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1344, in getresponse
    response.begin()
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/virtualenv/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/virtualenv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/virtualenv/lib/python3.7/site-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/virtualenv/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/virtualenv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/virtualenv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 423, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/virtualenv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 331, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/virtualenv/lib/python3.7/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/virtualenv/lib/python3.7/site-packages/girder_worker/docker/tasks/__init__.py", line 270, in __call__
    super(DockerTask, self).__call__(*args, **kwargs)
  File "/virtualenv/lib/python3.7/site-packages/girder_worker/task.py", line 153, in __call__
    results = super(Task, self).__call__(*_t_args, **_t_kwargs)
  File "/virtualenv/lib/python3.7/site-packages/celery/app/trace.py", line 650, in __protected_call__
    return self.run(*args, **kwargs)
  File "/virtualenv/lib/python3.7/site-packages/girder_worker/docker/tasks/__init__.py", line 378, in docker_run
    remove_container, **kwargs)
  File "/virtualenv/lib/python3.7/site-packages/girder_worker/docker/tasks/__init__.py", line 331, in _docker_run
    container = _run_container(image, container_args, **run_kwargs)
  File "/virtualenv/lib/python3.7/site-packages/girder_worker/docker/tasks/__init__.py", line 64, in _run_container
    return client.containers.run(image, container_args, runtime=runtime, **kwargs)
  File "/virtualenv/lib/python3.7/site-packages/docker/models/containers.py", line 803, in run
    detach=detach, **kwargs)
  File "/virtualenv/lib/python3.7/site-packages/docker/models/containers.py", line 861, in create
    resp = self.client.api.create_container(**create_kwargs)
  File "/virtualenv/lib/python3.7/site-packages/docker/api/container.py", line 430, in create_container
    return self.create_container_from_config(config, name)
  File "/virtualenv/lib/python3.7/site-packages/docker/api/container.py", line 440, in create_container_from_config
    res = self._post_json(u, data=config, params=params)
  File "/virtualenv/lib/python3.7/site-packages/docker/api/client.py", line 289, in _post_json
    return self._post(url, data=json.dumps(data2), **kwargs)
  File "/virtualenv/lib/python3.7/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/virtualenv/lib/python3.7/site-packages/docker/api/client.py", line 226, in _post
    return self.post(url, **self._set_request_timeout(kwargs))
  File "/virtualenv/lib/python3.7/site-packages/requests/sessions.py", line 578, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/virtualenv/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/virtualenv/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/virtualenv/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

There is no such issue when passing [str(value)] to container_args.
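The traceback shows docker-py serializing the container config with json.dumps (docker/api/client.py, _post_json) before POSTing it to the daemon, and the Docker Engine API documents Cmd as an array of strings. The sketch below models only the Cmd field to show how the float ends up as a JSON number where a string is expected. cmd_payload is a hypothetical helper, not part of docker-py or girder_worker, and this only suggests why the request is mishandled; it does not by itself explain why the daemon times out instead of rejecting the request.

```python
import json

# Stand-in for the route parameter: with dataType='float', 'value' is a Python float.
value = 0.5


def cmd_payload(container_args):
    """Hypothetical helper: serialize just the 'Cmd' part of a container config
    with json.dumps, as the traceback shows docker-py doing for the full config."""
    return json.dumps({'Cmd': container_args})


raw = cmd_payload([value])         # float is emitted as a JSON number
fixed = cmd_payload([str(value)])  # str(value) is emitted as a JSON string

print(raw)    # {"Cmd": [0.5]}
print(fixed)  # {"Cmd": ["0.5"]}
```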

agirault commented 4 years ago

cc: @zachmullen

zachmullen commented 4 years ago

I haven't tested this myself, but IIUC, the minimal reproduction for this will simply be to run docker_run and pass container_args where one of the elements is an int or float. I don't think it would even require remote execution to trigger.

zachmullen commented 4 years ago

And the fix would be to simply run str on all the container_args before we send them along.
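A minimal sketch of that fix, written as a standalone function. The name stringify_container_args is hypothetical. The sketch assumes it runs after any girder_worker transforms (such as volume paths) have been resolved to plain values, because calling str() on an unresolved transform object would return whatever that object's __str__ produces rather than the resolved value.

```python
from typing import Any, List


def stringify_container_args(container_args: List[Any]) -> List[str]:
    """Hypothetical helper: convert every container argument to str.

    Assumes transforms have already been resolved; strings pass through
    unchanged and everything else (int, float, bool, ...) goes through str().
    """
    return [arg if isinstance(arg, str) else str(arg) for arg in container_args]


args = stringify_container_args(['--threshold', 0.5, '--iterations', 10])
print(args)  # ['--threshold', '0.5', '--iterations', '10']
```

Doing the conversion inside docker_run, instead of in each caller, would let endpoints like the one above keep passing [value] without the str(value) workaround. Exactly where this would go in girder_worker is an assumption; the comment above only says "before we send them along".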