Open 123456hxh opened 1 year ago
RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e RuntimeError: DataLoader worker (pid(s) 9508, 11376, 15364, 15632) exited unexpectedly
Could this be caused by the DataLoader's num_workers setting? Try setting num_workers=0, changing the corresponding places for the training data and the validation data: https://github.com/Allenem/YOLOv3SPP/blob/2d10f858a20b6384b753a802796bd0aab4d5ccd7/train.py#L170 https://github.com/Allenem/YOLOv3SPP/blob/2d10f858a20b6384b753a802796bd0aab4d5ccd7/train.py#L172 https://github.com/Allenem/YOLOv3SPP/blob/2d10f858a20b6384b753a802796bd0aab4d5ccd7/train.py#L178 https://github.com/Allenem/YOLOv3SPP/blob/2d10f858a20b6384b753a802796bd0aab4d5ccd7/train.py#L180
Also, make sure you run the code on a GPU: train.py is for a single GPU, and train_multi_GPU.py is for multiple GPUs.
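For anyone trying the num_workers=0 suggestion, here is a minimal sketch of how the worker count could be chosen in one place. The helper name `choose_num_workers`, its `force_single_process` flag, and the min(CPU count, batch size, 8) heuristic are assumptions for illustration and are not taken from train.py.

```python
import os


def choose_num_workers(batch_size, max_workers=8, force_single_process=False):
    """Pick a DataLoader worker count (hypothetical helper, not from train.py).

    With force_single_process=True the data is loaded in the main process,
    which avoids worker subprocesses entirely while debugging.
    """
    if force_single_process:
        return 0
    cpu_count = os.cpu_count() or 1
    # Assumed heuristic: never more workers than CPUs, batch size, or max_workers.
    return min(cpu_count, batch_size if batch_size > 1 else 0, max_workers)


# Debugging run: everything is loaded in the main process.
nw = choose_num_workers(batch_size=4, force_single_process=True)
print(f"Using {nw} dataloader workers")

# The value would then be passed to both loaders, for example:
#   torch.utils.data.DataLoader(train_dataset, batch_size=4, num_workers=nw)
#   torch.utils.data.DataLoader(val_dataset, batch_size=4, num_workers=nw)
```

With num_workers=0 training is usually slower, but an error inside the dataset shows up as a normal traceback instead of "DataLoader worker exited unexpectedly".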
Thank you, it runs now. But when I run train.py on a rented remote GPU, I get:
Traceback (most recent call last):
File "train.py", line 302, in <module>
The installed CUDA and PyTorch versions probably do not match the current GPU model. There is an answer on Zhihu here: https://zhuanlan.zhihu.com/p/466793485
Thanks.
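A rough way to check for this kind of mismatch is to compare the GPU's compute capability with the architectures the installed PyTorch build was compiled for. The sketch below is only an approximation: the helper names are hypothetical, and an exact `sm_XX` match is a conservative test, since a build can sometimes still run on a GPU that is not listed.

```python
def capability_to_arch(capability):
    """Turn a (major, minor) compute capability into an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_supports_gpu(capability, arch_list):
    """Conservative check: is the GPU's sm tag in the build's arch list?"""
    return capability_to_arch(capability) in arch_list


def describe_environment():
    """Summarise PyTorch, CUDA and GPU versions (imports torch lazily)."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."
    if not torch.cuda.is_available():
        return (f"PyTorch {torch.__version__} (CUDA build: {torch.version.cuda}) "
                "cannot see a usable GPU.")
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    return (f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}, "
            f"GPU {torch.cuda.get_device_name(0)} ({capability_to_arch(capability)}), "
            f"listed in build architectures: {build_supports_gpu(capability, arch_list)}")


if __name__ == "__main__":
    print(describe_environment())
```

If the last field is False, installing a PyTorch build made for a newer CUDA version is the usual direction to look in.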
A question: do you also run into the problem below when running train.py? Is it a version issue?
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
(The OMP error and hint above are printed four times in a row.)
Traceback (most recent call last):
File "C:\Users\hxh\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1134, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "C:\Users\hxh\anaconda3\envs\pytorch\lib\multiprocessing\queues.py", line 105, in get
raise Empty
_queue.Empty
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:/PycharmProjects/PycharmProjects/霹雳大佬/pytorch_object_detection/yolov3_spp/train.py", line 302, in <module>
train(hyp)
File "D:/PycharmProjects/PycharmProjects/霹雳大佬/pytorch_object_detection/yolov3_spp/train.py", line 208, in train
scaler=scaler)
File "D:\PycharmProjects\PycharmProjects\霹雳大佬\pytorch_object_detection\yolov3_spp\train_utils\train_eval_utils.py", line 35, in train_one_epoch
for i, (imgs, targets, paths, _, _) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
File "D:\PycharmProjects\PycharmProjects\霹雳大佬\pytorch_object_detection\yolov3_spp\train_utils\distributed_utils.py", line 205, in log_every
for obj in iterable:
File "C:\Users\hxh\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 652, in __next__
data = self._next_data()
File "C:\Users\hxh\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1330, in _next_data
idx, data = self._get_data()
File "C:\Users\hxh\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1296, in _get_data
success, data = self._try_get_data()
File "C:\Users\hxh\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1147, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 9508, 11376, 15364, 15632) exited unexpectedly
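The OMP hint in the log above names an environment variable, KMP_DUPLICATE_LIB_OK, as a workaround, and warns that it is unsafe and may cause crashes or silently wrong results. A minimal sketch of applying it from Python follows. The helper name `allow_duplicate_openmp` is hypothetical, and the assumption is that the variable has to be set before torch (or any other library that loads an OpenMP runtime) is imported. That the four OMP messages correspond to the four worker PIDs that exited is an inference from the log, not something confirmed in this thread.

```python
import os


def allow_duplicate_openmp(env=None):
    """Set KMP_DUPLICATE_LIB_OK=TRUE (unsafe workaround named in the OMP hint).

    `env` defaults to os.environ; a plain dict can be passed in for testing.
    """
    if env is None:
        env = os.environ
    env["KMP_DUPLICATE_LIB_OK"] = "TRUE"
    return env


# Must run at the very top of the script, before the imports that load OpenMP.
allow_duplicate_openmp()

# import torch  # would follow here, after the variable is set
```

As the hint itself says, the better fix is to make sure only one OpenMP runtime ends up in the process. Until that is sorted out, setting num_workers=0 as suggested earlier in this thread keeps the data loading in the main process.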