610265158 / face_landmark

A simple method for face alignment based on wing loss and multitask learning :)
Apache License 2.0
251 stars, 80 forks

RuntimeError: /job:localhost/replica:0/task:0/device:GPU:0 unknown device. #23

Open you-old opened 4 years ago

you-old commented 4 years ago

Hi, could you tell me what is causing this error?

2020-01-15 15:34:26.438882: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-01-15 15:34:26.438956: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-01-15 15:34:26.438977: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[2020-01-15 15:34:27,223] [INFO] The trainer start
2020-01-15 15:34:27.225280: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-01-15 15:34:27.230277: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-01-15 15:34:27.230318: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: user-PowerEdge-T640
2020-01-15 15:34:27.230328: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: user-PowerEdge-T640
2020-01-15 15:34:27.230416: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 430.26.0
2020-01-15 15:34:27.230448: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 430.26.0
2020-01-15 15:34:27.230459: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 430.26.0
2020-01-15 15:34:27.231251: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-01-15 15:34:27.270280: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2600000000 Hz
2020-01-15 15:34:27.277139: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56bbad0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-15 15:34:27.277196: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
[2020-01-15 15:34:27,287] [WARNING] Some requested devices in tf.distribute.Strategy are not visible to TensorFlow: /job:localhost/replica:0/task:0/device:GPU:0
[2020-01-15 15:34:27,290] [INFO] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
[2020-01-15 15:34:27,603] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,604] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,721] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,723] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,735] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,736] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,748] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,749] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,914] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,915] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:29,108] [INFO] [x] Get dataset from
[2020-01-15 15:34:49,291] [INFO] the datasets contains 116104 samples
[2020-01-15 15:36:17,209] [INFO] befor balance the dataset contains 116104 images
[2020-01-15 15:36:17,209] [INFO] after balanced the datasets contains 8150004 samples
[0115 15:36:21 @parallel.py:231] [MultiProcessRunner] Will fork a dataflow more than one times. This assumes the datapoints are i.i.d.
[0115 15:36:21 @argtools.py:146] WRN Starting a process with 'fork' method is not safe and may consume unnecessary extra CPU memory. Use 'forkserver/spawn' method (available after Py3.4) instead if you run into any issues. See https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
[2020-01-15 15:36:21,617] [INFO] [x] Get dataset from
[2020-01-15 15:36:22,250] [INFO] the datasets contains 6111 samples
[2020-01-15 15:36:26,920] [INFO] befor balance the dataset contains 6111 images
[2020-01-15 15:36:26,920] [INFO] after balanced the datasets contains 428541 samples
[0115 15:36:27 @parallel.py:231] [MultiProcessRunner] Will fork a dataflow more than one times. This assumes the datapoints are i.i.d.
Traceback (most recent call last):
  File "train.py", line 108, in <module>
    main()
  File "train.py", line 105, in main
    strategy)
  File "/disk/wangpu/face_algo/face_landmark/face_landmark_tf2/lib/core/base_trainer/net_work.py", line 196, in custom_loop
    train_dist_dataset,epoch)
  File "/disk/wangpu/face_algo/face_landmark/face_landmark_tf2/lib/core/base_trainer/net_work.py", line 147, in distributed_train_epoch
    for one_batch in ds:
  File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 565, in __iter__
    self._input_workers)
  File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 1011, in _create_iterators_per_worker
    worker_devices)
  File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 864, in __init__
    self._make_iterator()
  File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 870, in _make_iterator
    self._dataset, self._devices)
  File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/multi_device_iterator_ops.py", line 292, in __init__
    self._experimental_slack)
  File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/multi_device_iterator_ops.py", line 202, in _create_device_dataset
    ds = ds.prefetch(prefetch_buffer_size)
  File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1013, in prefetch
    return PrefetchDataset(self, buffer_size)
  File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 4114, in __init__
    buffer_size, dtype=dtypes.int64, name="buffer_size")
  File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1314, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_conversion_registry.py", line 52, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 258, in constant
    allow_broadcast=True)
  File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 266, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
RuntimeError: /job:localhost/replica:0/task:0/device:GPU:0 unknown device.

610265158 commented 4 years ago

Many libraries could not be found, so your TensorFlow installation probably has a problem.

The GPU was not found either.
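
One way to confirm this diagnosis before training starts is to ask TensorFlow which GPUs it can see, and only hand GPU names to `MirroredStrategy` when at least one is visible. The following is a minimal sketch, not code from this repository: `choose_devices`, `describe_cuda_env`, and `build_strategy` are hypothetical helper names, and how `train.py` actually builds its strategy is an assumption. It uses `tf.config.list_physical_devices`, which is available from TensorFlow 2.1 (the version suggested by the `libnvinfer.so.6` warnings). The `CUDA_VISIBLE_DEVICES` check is included because a value that hides every GPU is one common cause of `CUDA_ERROR_NO_DEVICE` even when the driver loads correctly; it may not be the cause here.

```python
import os


def choose_devices(num_requested, num_visible):
    """Return "/gpu:i" names for the GPUs that are both requested and visible."""
    count = max(0, min(num_requested, num_visible))
    return ["/gpu:%d" % i for i in range(count)]


def describe_cuda_env(environ=None):
    """Summarise CUDA_VISIBLE_DEVICES so a hidden-GPU setup is easy to spot."""
    environ = os.environ if environ is None else environ
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return "CUDA_VISIBLE_DEVICES is unset"
    if value.strip() in ("", "-1"):
        return "CUDA_VISIBLE_DEVICES hides every GPU"
    return "CUDA_VISIBLE_DEVICES=" + value


def build_strategy(num_requested=1):
    """Hypothetical helper: use GPUs if TensorFlow sees any, else fall back to CPU."""
    import tensorflow as tf  # imported lazily so the helpers above work without it

    gpus = tf.config.list_physical_devices("GPU")
    print(describe_cuda_env())
    print("GPUs visible to TensorFlow:", len(gpus))

    devices = choose_devices(num_requested, len(gpus))
    if devices:
        return tf.distribute.MirroredStrategy(devices=devices)
    # No GPU is visible: run on the CPU instead of requesting GPU:0 and
    # failing later with "unknown device".
    return tf.distribute.OneDeviceStrategy("/cpu:0")
```

If `build_strategy()` reports zero visible GPUs, the problem lies in the environment (driver, CUDA libraries, or `CUDA_VISIBLE_DEVICES`) rather than in the training code, which matches the `failed call to cuInit` line in the log above.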