YingdaXia / SynthCP

Official code base for the ECCV oral paper "Synthesize then Compare: Detecting Failures and Anomalies for Semantic Segmentation"
MIT License

Problems in training the SPADE model #4

Closed greenairy closed 3 years ago

greenairy commented 3 years ago

Hi @YingdaXia ,

When I run the run.sh in spade-caos/ as instructed, there are many warnings:

2020-11-24 20:51:00.688985: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-24 20:51:02.632260: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-11-24 20:51:02.633811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: pciBusID: 0000:19:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-24 20:51:02.635036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties: pciBusID: 0000:1a:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-24 20:51:02.636049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 2 with properties: pciBusID: 0000:67:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-24 20:51:02.637032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 3 with properties: pciBusID: 0000:68:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.75GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-24 20:51:02.637077: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-24 20:51:02.637137: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-11-24 20:51:02.637159: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-11-24 20:51:02.637179: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-11-24 20:51:02.659637: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-11-24 20:51:02.659767: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-11-24 20:51:02.665321: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-24 20:51:02.672291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1, 2, 3
2020-11-24 20:51:02.672906: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-24 20:51:02.709079: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3299990000 Hz
2020-11-24 20:51:02.709797: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55e20b8a8730 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-24 20:51:02.709825: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-11-24 20:51:03.423709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: pciBusID: 0000:19:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-24 20:51:03.424783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties: pciBusID: 0000:1a:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-24 20:51:03.425832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 2 with properties: pciBusID: 0000:67:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-24 20:51:03.426835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 3 with properties: pciBusID: 0000:68:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.75GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-24 20:51:03.426882: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-24 20:51:03.426897: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-11-24 20:51:03.426907: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-11-24 20:51:03.426916: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-11-24 20:51:03.426979: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-11-24 20:51:03.426990: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-11-24 20:51:03.427007: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-24 20:51:03.435718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1, 2, 3
2020-11-24 20:51:13.508870: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-24 20:51:13.508930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0 1 2 3
2020-11-24 20:51:13.508956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N N N N
2020-11-24 20:51:13.508977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 1: N N N N
2020-11-24 20:51:13.508989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 2: N N N N
2020-11-24 20:51:13.509001: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 3: N N N N
2020-11-24 20:51:13.519222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9059 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:19:00.0, compute capability: 7.5)
2020-11-24 20:51:13.526309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8528 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:1a:00.0, compute capability: 7.5)
2020-11-24 20:51:13.539344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 9477 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:67:00.0, compute capability: 7.5)
2020-11-24 20:51:13.547200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 9096 MB memory) -> physical GPU (device: 3, name: GeForce RTX 2080 Ti, pci bus id: 0000:68:00.0, compute capability: 7.5)
2020-11-24 20:51:13.581053: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55e28a16cbe0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-24 20:51:13.581112: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-11-24 20:51:13.581122: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-11-24 20:51:13.581129: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-11-24 20:51:13.581136: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (3): GeForce RTX 2080 Ti, Compute Capability 7.5

Have you encountered this before when using tensorflow-gpu 2.3.0?

YingdaXia commented 3 years ago

Hello @nuanv ,

TensorFlow is only used for visualization (TensorBoard). The TensorFlow 1.x CPU version should be fine.

Thanks,
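
For anyone who prefers to keep tensorflow-gpu 2.3.0: every line in the log above is tagged `I` (informational), not a warning or an error. A minimal sketch for hiding such messages, assuming they come from TensorFlow's native (C++) logging, is to set the `TF_CPP_MIN_LOG_LEVEL` environment variable before `tensorflow` is imported. The helper name `quiet_tensorflow_logs` is hypothetical and not part of this repository.

```python
import os


def quiet_tensorflow_logs(min_level: int = 1) -> str:
    """Set TF_CPP_MIN_LOG_LEVEL (hypothetical helper, not part of SynthCP).

    0 shows everything, 1 hides INFO, 2 also hides WARNING, 3 also hides ERROR.
    It only takes effect if called before `import tensorflow`.
    """
    if min_level not in (0, 1, 2, 3):
        raise ValueError("min_level must be 0, 1, 2 or 3")
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(min_level)
    return os.environ["TF_CPP_MIN_LOG_LEVEL"]


# Hide INFO-level lines such as "Successfully opened dynamic library ...".
quiet_tensorflow_logs(1)
# import tensorflow as tf  # import only after the variable has been set
```

The same effect can be had by exporting the variable in the shell before launching run.sh; either way it only changes what is printed, not how training runs.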

greenairy commented 3 years ago

Thanks for your reply, @YingdaXia. I resolved these warnings by downgrading TensorFlow to 1.15 in a Python 3.7 environment. However, I still cannot view TensorBoard during training. I guess it needs to be accessible from an external IP.

YingdaXia commented 3 years ago

Hello @nuanv ,

I believe SPADE just uses common TensorBoard functions. If you are running the code on a remote server, a port mapping will do the trick. Feel free to post any error logs if you run into further problems.
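
One common way to do the port mapping is SSH local port forwarding. The sketch below only builds the two commands involved and assumes TensorBoard's default port 6006. The log directory `./checkpoints` and the host `user@gpu-server` are placeholders, not values taken from this repository, and the function names are hypothetical.

```python
import shlex


def tensorboard_command(logdir, port=6006):
    """Command to start TensorBoard on the remote server (logdir is a placeholder)."""
    return ["tensorboard", "--logdir", logdir, "--port", str(port)]


def ssh_forward_command(remote, local_port=6006, remote_port=6006):
    """Command to run on the local machine.

    -N opens no remote shell; -L maps localhost:local_port on the local
    machine to localhost:remote_port on the remote server.
    """
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", remote]


def as_shell(parts):
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(p) for p in parts)


if __name__ == "__main__":
    # Hypothetical values; replace with your own log directory and host.
    print("on the server:  ", as_shell(tensorboard_command("./checkpoints")))
    print("on your machine:", as_shell(ssh_forward_command("user@gpu-server")))
```

With both commands running, TensorBoard should be reachable in a local browser at http://localhost:6006 without exposing the server's port to an external IP.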