Has anyone else encountered the "CUDA_ERROR_ILLEGAL_ADDRESS" error shown below?
I switched from multiprocessing to a single process, but the same problem still occurs.
My GPU is a GeForce RTX 2080 8GB (driver: 440.33.01), and my software versions are:
tensorflow: 1.12.0
cuda: 9.0
cudnn: 7.5.0
The training command is:
CUDA_VISIBLE_DEVICES=0 python main.py \
--model_name=model_roerich \
--batch_size=1 \
--phase=train \
--image_size=768 \
--lr=0.0002 \
--dsr=0.8 \
--ptcd=./data/Places2/data_large \
--ptad=./data/artist/nicholas-roerich
Finally, the error output is:
tensorflow::CurrentStackTrace()
stream_executor::cuda::CUDADriver::SynchronizeContext(stream_executor::cuda::CudaContext)
stream_executor::StreamExecutor::SynchronizeAllActivity()
tensorflow::GPUUtil::SyncAll(tensorflow::Device)
tensorflow::BaseGPUDevice::Sync()
End stack trace
2020-01-26 00:10:03.045956: E tensorflow/stream_executor/event.cc:34] error destroying CUDA event in context 0x5402c10: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
File "/home/username/.local/share/virtualenvs/adaptive-style-transfer-PbxNnQ9W/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/username/.local/share/virtualenvs/adaptive-style-transfer-PbxNnQ9W/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/username/.local/share/virtualenvs/adaptive-style-transfer-PbxNnQ9W/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
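Since the failure above is reported at a GPU sync point, the location in the trace is not necessarily the kernel that made the illegal access, because CUDA reports kernel errors asynchronously. One way to narrow it down is to rerun the same command with the CUDA environment variable CUDA_LAUNCH_BLOCKING=1, which makes kernel launches synchronous. The following is only a sketch of that rerun: the helper names build_debug_env and run_training_blocking are hypothetical, and it assumes main.py is started from the repository root with the same interpreter that runs this script.

```python
import os
import subprocess
import sys

# The training command from this post, written as an argument list.
# Assumption: sys.executable is the same "python" used in the original command.
TRAIN_CMD = [
    sys.executable, "main.py",
    "--model_name=model_roerich",
    "--batch_size=1",
    "--phase=train",
    "--image_size=768",
    "--lr=0.0002",
    "--dsr=0.8",
    "--ptcd=./data/Places2/data_large",
    "--ptad=./data/artist/nicholas-roerich",
]


def build_debug_env(base_env=None):
    """Hypothetical helper: return a copy of the environment with CUDA debug settings."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = "0"  # same device selection as in the post
    env["CUDA_LAUNCH_BLOCKING"] = "1"  # make kernel launches synchronous
    return env


def run_training_blocking():
    """Hypothetical helper: rerun training with synchronous launches, return the exit code."""
    return subprocess.call(TRAIN_CMD, env=build_debug_env())

# Usage (not executed here): exit_code = run_training_blocking()
```

Training will run noticeably slower this way, so this is meant for a single diagnostic run rather than for regular training.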