Closed VolkerH closed 5 years ago
This appeared after I changed a dtype to np.int. With the dtype back to np.uint16, everything works. I assume that the standard integer type is not supported on my GPU.
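For reference, a minimal numpy sketch (illustrative only, not the project's actual code) of why the two dtypes behave so differently: `np.int` resolves to a platform integer (typically 64-bit), so the stacks are 4x larger than with `np.uint16`, and 64-bit integer images are a type many OpenCL kernels don't handle.

```python
import numpy as np

# np.int maps to a platform-dependent (usually 64-bit) integer;
# np.int64 is used here explicitly, since np.int was later removed.
a = np.zeros((64, 256, 256), dtype=np.int64)   # what np.int typically gave
b = np.zeros((64, 256, 256), dtype=np.uint16)  # original working dtype

print(a.itemsize, b.itemsize)   # 8 vs 2 bytes per voxel
print(a.nbytes // b.nbytes)     # the int64 stack is 4x larger
```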
I ran into this error again. I can consistently reproduce it for Stack_7_drp1_dendra2skl_mScarlet_drp1_test_6_fast. Other datasets process fine, so it may be some corruption of the input data.
```
/home/vhil0002/Github/Lattice_Lightsheet_Deskew_Deconv/Python/process_llsm_experiment.py:207: UserWarning: Fix write_func stuff to include compression and units
  warnings.warn("Fix write_func stuff to include compression and units")
/home/vhil0002/Github/Lattice_Lightsheet_Deskew_Deconv/Python/process_llsm_experiment.py:262: UserWarning: more than one PSF found. Taking first one
  warnings.warn(f"more than one PSF found. Taking first one")
  0%|          | 0/40 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "batch_run.py", line 26, in <module>
    ep.process_stack_subfolder(subfolder)
  File "/home/vhil0002/Github/Lattice_Lightsheet_Deskew_Deconv/Python/process_llsm_experiment.py", line 289, in process_stack_subfolder
    self.process_file(pathlib.Path(row.file), deskew_func, rotate_func, deconv_functions[wavelength])
  File "/home/vhil0002/Github/Lattice_Lightsheet_Deskew_Deconv/Python/process_llsm_experiment.py", line 198, in process_file
    deconv_rotated = rotate_func(deconv_raw)
  File "/home/vhil0002/Github/Lattice_Lightsheet_Deskew_Deconv/Python/gputools_wrapper.py", line 76, in affine_transform_gputools
    result = gputools.affine(data=input_data, mat=matrix, mode=mode, interpolation=interpolation)
  File "/home/vhil0002/anaconda3/envs/newllsm/lib/python3.6/site-packages/gputools/transforms/transformations.py", line 83, in affine
    d_im, res_g.data, mat_inv_g.data)
  File "/home/vhil0002/anaconda3/envs/newllsm/lib/python3.6/site-packages/gputools/core/oclprogram.py", line 46, in run_kernel
    self._kernel_dict[name](self._dev.queue, global_size, local_size, *args, **kwargs)
  File "/home/vhil0002/anaconda3/envs/newllsm/lib/python3.6/site-packages/pyopencl/__init__.py", line 815, in kernel_call
    return self._enqueue(self, queue, global_size, local_size, *args, **kwargs)
  File "<generated code>", line 69, in enqueue_knl_affine3
pyopencl._cl.MemoryError: clEnqueueNDRangeKernel failed: MEM_OBJECT_ALLOCATION_FAILURE
```
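A rough way to reason about whether a stack can still fit on the GPU: the affine transform needs at least an input and an output buffer of the full volume. This is a hypothetical back-of-the-envelope helper (`affine_gpu_bytes` is not part of gputools, and the real implementation may allocate more), just to illustrate the scale involved:

```python
import numpy as np

def affine_gpu_bytes(shape, dtype=np.float32, n_buffers=2):
    """Rough lower bound on GPU memory for an affine transform:
    one input buffer plus one output buffer of the same size.
    Illustrative estimate only; actual gputools usage may differ."""
    voxel_bytes = np.dtype(dtype).itemsize
    return int(np.prod(shape)) * voxel_bytes * n_buffers

# e.g. a 512 x 1024 x 1024 float32 stack needs at least ~4 GiB
# for input + output, before any other process has claimed memory.
print(affine_gpu_bytes((512, 1024, 1024)) / 2**30)  # 4.0
```

If TensorFlow has already preallocated most of the card, even a modest stack can push the remaining free memory below this bound, which is consistent with the intermittent MEM_OBJECT_ALLOCATION_FAILURE.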
Fairly sure this is happening because TensorFlow grabs almost all GPU memory and leaves very little for other processes. Depending on the volume of individual stacks there may be just enough GPU memory left to perform the affine transforms using gputools, but not always. Will have to try limiting how much GPU memory TensorFlow can grab: https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory
Using allow_growth might be useful so as not to have to set a fixed limit: https://www.tensorflow.org/guide/using_gpu This may then also enable several worker processes to use the GPU.
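For the TF 1.x API in use here (the environment is Python 3.6), the allow_growth option is set through the session config. A minimal sketch, assuming the deconvolution code creates its own `tf.Session`:

```python
import tensorflow as tf  # TF 1.x API

# Let TensorFlow allocate GPU memory on demand instead of grabbing
# nearly all of it up front, leaving room for gputools/pyopencl.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
```

Note that allow_growth never releases memory back once allocated, so a large deconvolution can still starve a later gputools call; it only avoids the up-front grab of the whole card.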
FWIW: "GPU memory exhaustion errors" has a snippet for setting the memory used, and the allow_growth option is a good way to avoid TensorFlow's very greedy default GPU memory preallocation behavior.
Thanks. I had already implemented the allow_growth fix while on the train home a couple of hours ago and was just about to test it when I saw your comment. I wasn't aware of the issue you referenced, basically the identical problem ... will put a watch on the flowdec repo.
That fixed it; now I only need to integrate this nicely.