andersbll / cudarray

CUDA-based NumPy
MIT License

Initialization error when running code with a celery worker #41

Open kronok opened 8 years ago

kronok commented 8 years ago

I'm using your neural artistic style code to generate some art using Django with Celery to work through tasks.

I have CUDA enabled and everything works beautifully without Celery. As soon as I run the neural task through Celery, it spits out:

[2016-03-06 01:25:15,656: INFO/MainProcess] Received task: art.tasks.generate_art[8f039acb-e68c-4368-821f-dbf55d0b038b]
[2016-03-06 01:25:19,012: ERROR/MainProcess] Task art.tasks.generate_art[8f039acb-e68c-4368-821f-dbf55d0b038b] raised unexpected: ValueError(b'initialization error',)
Traceback (most recent call last):
  File "/venv/deep/lib/python3.4/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/venv/deep/lib/python3.4/site-packages/celery/app/trace.py", line 438, in protected_call
    return self.run(*args, **kwargs)
  File "/home/deep/art/tasks.py", line 32, in generate_art
    art.run()
  File "/home/deep/art/models.py", line 57, in run
    return self.generate(subject_img=self.subject.file, style_img=style_img, **kwargs)
  File "/home/deep/art/models.py", line 97, in generate
    smoothness)
  File "/home/deep/neural/style_network.py", line 88, in __init__
    self.x.setup(x_shape)
  File "/venv/deep/lib/python3.4/site-packages/deeppy-0.1.dev0-py3.4.egg/deeppy/parameter.py", line 33, in setup
  File "/venv/deep/lib/python3.4/site-packages/deeppy-0.1.dev0-py3.4.egg/deeppy/filler.py", line 67, in array
  File "/venv/deep/lib/python3.4/site-packages/cudarray-0.1.dev0-py3.4-linux-x86_64.egg/cudarray/cudarray.py", line 242, in array
    return ndarray(np_array.shape, np_data=np_array)
  File "/venv/deep/lib/python3.4/site-packages/cudarray-0.1.dev0-py3.4-linux-x86_64.egg/cudarray/cudarray.py", line 36, in __init__
    self._data = ArrayData(self.size, dtype, np_data)
  File "cudarray/wrap/array_data.pyx", line 16, in cudarray.wrap.array_data.ArrayData.__init__ (./cudarray/wrap/array_data.cpp:1401)
  File "cudarray/wrap/cudart.pyx", line 12, in cudarray.wrap.cudart.cudaCheck (./cudarray/wrap/cudart.cpp:763)
ValueError: b'initialization error'

You may be thinking "this is a Celery issue, go post there", but hang on.

I recompiled cudarray without CUDA (to use CPU), and tried it again with Celery and it works just as it should, but obviously I need CUDA enabled if I want it to finish any tasks within a month.

Looking further into this, I've noticed others having issues with Celery and CUDA/GPU code. Some have found workarounds that tell it to use the GPU in some way before the task runs.

A theory on CUDA not working with Celery off the bat: http://stackoverflow.com/questions/24744755/why-am-i-getting-cumemalloc-failed-not-initialized-even-though-i-am-initializ

Here's someone who figured it out with Theano by simply telling it to use the GPU in the task itself: http://stackoverflow.com/questions/33354272/runtimeerror-when-using-theano-shared-variable-in-a-celery-celery-worker

I've tried the ol'

import os

os.environ['LD_LIBRARY_PATH'] = '/usr/local/cuda/lib64'
os.environ['CUDARRAY_BACKEND'] = 'cuda'

in the task itself, but I get the same initialization error.

Is there another way to pass some "use the GPU plz" context to CUDA using cudarray in my task?

andersbll commented 8 years ago

Hey, great question and info! I have no idea how a Celery worker operates. If a worker spawns a new process and CUDArray somehow doesn't get initialized correctly, you might try initializing the CUDA runtime with cudarray.wrap.cudart.initialize(<device_id>). Implementation here.

kronok commented 8 years ago

I believe the worker does spawn a whole new process. I tried what you suggested, and sadly it gives the same 'initialization error' as in my original post. Giving it a bad device_id like "5" does give me the expected bad device_id error, so it is at least responsive to what I'm doing there. What's somewhat interesting is that the error is now immediate, instead of taking a few seconds to think about it before spitting out the error.

This is such a strange one. I've been fiddling with it for hours and can't seem to figure out why they won't play well together. Still researching and trying to figure it out, though.

andersbll commented 8 years ago

Hm, according to this answer on StackOverflow, you might want to set your environment variables such that CUDA doesn't get initialized (export CUDARRAY_BACKEND=NUMPY). Then, in your worker process, you should override this environment variable to use CUDA.
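A rough sketch of that idea, in case it helps. It assumes the Celery main process is started with CUDARRAY_BACKEND=numpy, that cudarray reads the variable when it is first imported, and that nothing (including deeppy or the task module) imports cudarray at module top level, since a module already imported in the parent stays cached in the forked child. The function name enable_cuda_backend is made up for illustration. worker_process_init is the Celery signal dispatched in each pool child process when it starts.

```python
import os


def enable_cuda_backend(**kwargs):
    """Switch this process to the CUDA backend before cudarray is first imported.

    Hypothetical helper: it only has an effect if cudarray has not been
    imported yet in this process or in the parent it was forked from.
    """
    os.environ["CUDARRAY_BACKEND"] = "cuda"
    try:
        # Deferred import; assumed to read CUDARRAY_BACKEND at this point.
        import cudarray
    except ImportError:
        # cudarray is not installed where this sketch runs.
        return None
    return cudarray


try:
    # Run the helper once in every pool child process that Celery starts.
    from celery.signals import worker_process_init

    worker_process_init.connect(enable_cuda_backend)
except ImportError:
    # Celery is not installed here; call enable_cuda_backend() by hand instead.
    pass
```

The task itself would then import cudarray/deeppy inside the task function rather than at the top of tasks.py, so the parent process never loads the CUDA backend.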

Let me know if you happen to find more information on the issue. I'm a bit curious about this problem.

andersbll commented 8 years ago

I just found this. If Celery creates child processes for workers, then this might be the error you experience. In that case, forcing CUDArray to use NumPy initially and only using the CUDA backend in child processes might work. Maybe; I'm on thin ice here! :)
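To make the suspected failure mode concrete, here is a toy model of it using only the standard library. FakeCudaRuntime is a made-up stand-in, not the cudarray or CUDA API: it simply refuses to work in any process other than the one that first initialized it, which is roughly how CUDA behaves when a process forks after the runtime has been initialized. The sketch needs a Unix-like system because it uses os.fork.

```python
import os


class FakeCudaRuntime:
    """Hypothetical stand-in: usable only in the process that initialized it."""

    def __init__(self):
        self._owner_pid = None

    def initialize(self):
        # State inherited from a parent process cannot be reused after fork.
        if self._owner_pid is not None and self._owner_pid != os.getpid():
            raise ValueError("initialization error")
        self._owner_pid = os.getpid()

    def allocate(self, size):
        if self._owner_pid != os.getpid():
            raise ValueError("initialization error")
        return bytearray(size)


def child_task(runtime):
    """What a worker would do: initialize the runtime, then allocate memory."""
    runtime.initialize()
    runtime.allocate(16)
    return "ok"


def run_in_forked_child(fn):
    """Fork, run fn() in the child, and return its result string to the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.close(read_fd)
            try:
                message = fn()
            except ValueError as exc:
                message = "error: %s" % exc
            os.write(write_fd, message.encode())
        finally:
            os._exit(0)  # never fall through into the parent's code
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        output = reader.read()
    os.waitpid(pid, 0)
    return output


def parent_stays_off_runtime():
    runtime = FakeCudaRuntime()  # created but never initialized in the parent
    return run_in_forked_child(lambda: child_task(runtime))


def parent_initializes_first():
    runtime = FakeCudaRuntime()
    runtime.initialize()  # parent touches the runtime before forking
    return run_in_forked_child(lambda: child_task(runtime))


print(parent_stays_off_runtime())   # ok
print(parent_initializes_first())   # error: initialization error
```

If the real problem follows this pattern, the fix is whatever keeps the parent from touching CUDA before Celery forks its workers.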