@csy530216 Can you report the docker logs for the attack container? Running model and attack on the same GPU might also be a problem.
@bveliqi Can you look into this when you are back?
@jonasrauber Yes, you are right! When running model and attack on two different GPUs, the problem disappears. When running them on the same GPU, `docker logs avc_test_attack_submission` shows:

```
Running attack...
Wating for model server to start...
2018-08-22 09:43:09.767651: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-08-22 09:43:10.302564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:03:00.0
totalMemory: 11.93GiB freeMemory: 483.81MiB
2018-08-22 09:43:10.302612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-22 09:43:10.768711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-22 09:43:10.768774: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-08-22 09:43:10.768786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-08-22 09:43:10.769619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 202 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0, compute capability: 5.2)
INFO:tensorflow:Restoring parameters from /home/shuyu/resnet18/checkpoints/model/model.ckpt-5865
Restoring parameters from /home/shuyu/resnet18/checkpoints/model/model.ckpt-5865
2018-08-22 09:43:18.004022: E tensorflow/stream_executor/cuda/cuda_dnn.cc:455] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2018-08-22 09:43:18.004092: F tensorflow/core/kernels/conv_ops.cc:713] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
/prd/run.sh: line 2:     7 Aborted                 (core dumped) python3 -u ./main.py
```
Maybe the reason is that, since the model container runs first, it consumes nearly all of the GPU memory, which causes the error in the attack container.
The reason `transfer_untargeted_attack_baseline` fails while `boundary_untargeted_attack_baseline` does not when model and attack share the same GPU is that the transfer attack itself runs a TensorFlow model, which consumes GPU memory. Hence, if we want to run `avc-test-model-against-attack` on a single GPU card, we should limit the GPU memory used by the model.
Modifying `fmodel.py` in `resnet18_model_baseline` as follows solves the problem:

```python
# Cap the model at 50% of GPU memory so the attack process
# can still create its own CUDA/cuDNN context on the same card.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
sess = tf.Session(graph=graph, config=tf.ConfigProto(gpu_options=gpu_options))
```
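If halving the memory fraction is too rigid, a possible alternative (a minimal sketch under the same TF 1.x API; `graph` here stands in for the model graph built in `fmodel.py`) is to let TensorFlow allocate GPU memory on demand via `allow_growth`:

```python
import tensorflow as tf

graph = tf.Graph()  # stand-in for the model graph built in fmodel.py

# Allocate GPU memory on demand instead of reserving almost all of it
# upfront, leaving headroom for the attack process on the same device.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(graph=graph, config=config)
```

Note that with `allow_growth` the model can still grow to fill the card over time, so a fixed fraction is the more predictable cap.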
That might be a workaround, but we recommend using two GPUs, as we do.
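For reference, one common way to give each container's process its own GPU is the `CUDA_VISIBLE_DEVICES` environment variable (a hypothetical sketch; the actual AVC tooling may assign devices differently):

```python
import os

# Hypothetical: expose only GPU 1 to this process; the other container
# would set "0". This must happen before TensorFlow initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf  # TensorFlow now sees the selected GPU as device 0
```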
Hello! Although I know `avc-test-model-against-attack` is not published yet, after reading the code the problem still seems strange to me. When I run it (I copied `avc-test-model-against-attack` and cloned the baselines to the working directory), the following error message shows:
The error also occurs when using `iterative_transfer_untargeted_attack_baseline`, but does not occur when using `boundary_untargeted_attack_baseline` or `saltnpepper_untargeted_attack_baseline`.