AcademySoftwareFoundation / OpenCue

A render management system you can deploy for visual effects and animation productions.
https://www.opencue.io
Apache License 2.0

sandbox quick start guide fails to submit jobs #915

Open IdrisMiles opened 3 years ago

IdrisMiles commented 3 years ago

**Describe the bug**

Following the Linux quick start guide for spinning up a deployment with docker-compose fails.

**To Reproduce**

Steps to reproduce the behavior:

  1. Follow the guide and spin up the sandbox deployment with docker-compose
  2. Follow the guide and install the Python modules into a virtual env
  3. Submit a job with the API using the following script:

```python
#!/usr/bin/env python
from outline import Outline
from outline.cuerun import OutlineLauncher
from outline.modules.shell import Shell
from outline.depend import DependType

layer1 = Shell('layer1', command=['sleep 1'], range='1001-1100', threads=1, threadable=True)
layer2 = Shell('layer2', command=['echo $CUE_IFRAME'], range='1001-1100', threads=1, threadable=True)
layer2.depend_on(on_layer=layer1, depend_type=DependType.FrameByFrame)

ol = Outline(name='testing', name_unique=True)
ol.add_layer(layer1)
ol.add_layer(layer2)

launcher = OutlineLauncher(ol)

jobs = launcher.launch(False)
print(jobs)
```


I get the following error:
```
jobs = launcher.launch(False)
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/outline/cuerun.py", line 219, in launch
    return self.__get_backend_module().launch(self, use_pycuerun=use_pycuerun)
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/outline/backend/cue.py", line 124, in launch
    jobs = opencue.api.launchSpecAndWait(launcher.serialize(use_pycuerun=use_pycuerun))
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/opencue/util.py", line 57, in _decorator
    exception(exception.failMsg.format(details=details)))
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/opencue/util.py", line 44, in _decorator
    return grpcFunc(*args, **kwargs)
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/opencue/api.py", line 378, in launchSpecAndWait
    job_pb2.JobLaunchSpecAndWaitRequest(spec=spec), timeout=Cuebot.Timeout).jobs
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/grpc/_channel.py", line 533, in call
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/idris/projects/opensource/ASWF/OpenCue/venv_py2/local/lib/python2.7/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
opencue.exception.CueInternalErrorException: Server caught an internal exception. Failed to launch and add job: Failed to parse job spec XML, java.io.FileNotFoundException: http://localhost:8080/spcue/dtd/cjsl-1.11.dtd
```


**Expected behavior**

Job should submit without error

**Additional context**

@larsbijl's comment [here](https://github.com/AcademySoftwareFoundation/OpenCue/issues/896#issuecomment-772849398) highlights the cause of the issue: the sandbox docker-compose is pulling an outdated Cuebot image from `opencue/cuebot`:

```yaml
cuebot:
  image: opencue/cuebot
```
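A rough local check, not from the original report: the staleness of the pulled image can be confirmed from its creation date, assuming docker-compose has already pulled it:

```bash
# Show when the opencue/cuebot image on this machine was built; an old date
# here is consistent with the missing cjsl-1.11 DTD error above.
docker image inspect opencue/cuebot --format '{{.Created}}'
```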


**Possible Solutions**

One solution is to build the docker image from source by modifying the `docker-compose.yml`:

```yaml
cuebot:
  build:
    context: ./
    dockerfile: ./cuebot/Dockerfile
```


And then doing a build before running:

```bash
docker-compose --project-directory . -f sandbox/docker-compose.yml build
```
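After the build, the rest of the quick start should proceed as documented; presumably the same invocation with `up` in place of `build` (my reading of the guide, not verified against the current docs):

```bash
docker-compose --project-directory . -f sandbox/docker-compose.yml up
```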

This has the drawback that the build process is quite slow, and this still requires updating the docs to add the build step.

The other, more desirable, solution is to push updated Docker images.

larsbijl commented 3 years ago

@bcipriano should we cut a new release, since we have merged a few DB changes?

bcipriano commented 3 years ago

Yeah, the issue here is that Docker images are only pushed to Docker Hub on release, while Docker compose runs from master.

Quick fix is as you said, we can do a new release to push new images. I'll work on this ASAP.

But ultimately this will keep happening, as master will always lead the release, so we will need a better long term solution. A couple of possibilities:

  1. Build the images directly from master as Idris mentioned, though this will be slow.
  2. Change our GitHub pipelines to publish Docker images on every commit to master (a rough sketch of such a workflow follows this list). This should be fine for Docker compose but could cause issues in other places, as the `latest` tag on Docker Hub will no longer point to the latest release, but rather the latest commit to master. Maybe this is ok? I would have to think through this some more.
  3. Change the Docker compose setup or instructions to use a specific tag from the repo -- basically you will need to check out a specific release locally to ensure it matches the released version on Github.
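For illustration only, a minimal sketch of what option (2) could look like, assuming Docker Hub credentials are stored as repository secrets (`DOCKER_USER` and `DOCKER_PASS` are placeholder names, not the project's actual pipeline configuration):

```yaml
# Hypothetical workflow: build and push a Cuebot image on every commit to master.
name: publish-master-images
on:
  push:
    branches: [master]
jobs:
  cuebot:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Log in to Docker Hub
        run: echo "${{ secrets.DOCKER_PASS }}" | docker login -u "${{ secrets.DOCKER_USER }}" --password-stdin
      - name: Build and push Cuebot
        run: |
          docker build -t opencue/cuebot:master -f cuebot/Dockerfile .
          docker push opencue/cuebot:master
```

Pushing a `:master` tag rather than overwriting `latest` would also sidestep the concern about `latest` no longer pointing at the most recent release.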

(2) sounds like the best option to me, any thoughts?

IdrisMiles commented 3 years ago

I think the main time we will run into issues with this is when there have been non-backwards-compatible changes between Cuebot and the client packages, as in the case above.

When these sorts of changes are merged we tend to increment the minor version. So perhaps we can configure the GitHub pipeline to publish Docker images when that changes?
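If the trigger were tied to version tags instead of every commit, the `on:` block of such a workflow might look like this (the tag pattern is a guess at the project's versioning scheme):

```yaml
# Hypothetical trigger: publish images only when a new version tag is pushed,
# e.g. on a minor-version bump.
on:
  push:
    tags:
      - 'v*.*'
```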

romainf-ubi commented 1 year ago

I have a similar issue. The difference is that I run the Docker Hub images (I simply run `docker-compose up -d` at the repo root, which spawns the db, cuebot and rqd services based on Docker Hub images), and I use the tools from my cloned repo (so I install them with `python setup.py install` etc.).
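Roughly the setup described above, in command form (the directory names are my assumption about the checkout layout):

```bash
# Services from Docker Hub images, client tools from the local clone.
docker-compose up -d
(cd pycue && python setup.py install)
(cd pyoutline && python setup.py install)
(cd cuesubmit && python setup.py install)
```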

I can create jobs using cuesubmit and I can see that they are inserted in the DB, but they are pending forever:

*[screenshot: submitted jobs stuck in the pending state]*

Would you know where this bug might come from? Why doesn't the rqd service see the pending jobs?
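One quick check for this symptom, assuming the Python client is configured to reach the Cuebot container, is whether the rqd host ever registered with Cuebot; a minimal sketch:

```python
import opencue

# If rqd registered successfully, it should be listed here; an empty list
# means Cuebot has no host to dispatch the pending frames to.
for host in opencue.api.getHosts():
    print(host.name())
```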