WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
GNU General Public License v3.0
13.3k stars 4.2k forks source link

RuntimeError: DataLoader worker (pid(s) 16144) exited unexpectedly #189

Open StandGreat opened 2 years ago

StandGreat commented 2 years ago

num_workers == 1 batch_size ==2

`D:\data\xxx\Miniconda\envs\yolov5\python.exe D:/MAC/Python/yolov7-main/train.py True YOLOR fatal: not a git repository (or any of the parent directories): .git torch 1.11.0 CUDA:0 (NVIDIA GeForce MX450, 2047.75MB)

Namespace(adam=False, artifact_alias='latest', batch_size=2, bbox_interval=-1, bucket='', cache_images=False, cfg='', data='data/kale.yaml', device='', entity=None, epochs=300, evolve=False, exist_ok=False, global_rank=-1, hyp='data/hyp.scratch.p5.yaml', image_weights=False, img_size=[640, 640], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, project='runs/train', quad=False, rect=False, resume=False, save_dir='runs\train\exp', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=2, upload_dataset=False, weights='yolov7.pt', workers=1, world_size=1) tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/ hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.3, cls_pw=1.0, obj=0.7, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.2, scale=0.9, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.15, copy_paste=0.0, paste_in=0.15 wandb: Install Weights & Biases for YOLOR logging with 'pip install wandb' (recommended) Overriding model.yaml nc=80 with nc=2

             from  n    params  module                                  arguments                     

0 -1 1 928 models.common.Conv [3, 32, 3, 1]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 36992 models.common.Conv [64, 64, 3, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 8320 models.common.Conv [128, 64, 1, 1]
5 -2 1 8320 models.common.Conv [128, 64, 1, 1]
6 -1 1 36992 models.common.Conv [64, 64, 3, 1]
7 -1 1 36992 models.common.Conv [64, 64, 3, 1]
8 -1 1 36992 models.common.Conv [64, 64, 3, 1]
9 -1 1 36992 models.common.Conv [64, 64, 3, 1]
10 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
11 -1 1 66048 models.common.Conv [256, 256, 1, 1]
12 -1 1 0 models.common.MP []
13 -1 1 33024 models.common.Conv [256, 128, 1, 1]
14 -3 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 147712 models.common.Conv [128, 128, 3, 2]
16 [-1, -3] 1 0 models.common.Concat [1]
17 -1 1 33024 models.common.Conv [256, 128, 1, 1]
18 -2 1 33024 models.common.Conv [256, 128, 1, 1]
19 -1 1 147712 models.common.Conv [128, 128, 3, 1]
20 -1 1 147712 models.common.Conv [128, 128, 3, 1]
21 -1 1 147712 models.common.Conv [128, 128, 3, 1]
22 -1 1 147712 models.common.Conv [128, 128, 3, 1]
23 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
24 -1 1 263168 models.common.Conv [512, 512, 1, 1]
25 -1 1 0 models.common.MP []
26 -1 1 131584 models.common.Conv [512, 256, 1, 1]
27 -3 1 131584 models.common.Conv [512, 256, 1, 1]
28 -1 1 590336 models.common.Conv [256, 256, 3, 2]
29 [-1, -3] 1 0 models.common.Concat [1]
30 -1 1 131584 models.common.Conv [512, 256, 1, 1]
31 -2 1 131584 models.common.Conv [512, 256, 1, 1]
32 -1 1 590336 models.common.Conv [256, 256, 3, 1]
33 -1 1 590336 models.common.Conv [256, 256, 3, 1]
34 -1 1 590336 models.common.Conv [256, 256, 3, 1]
35 -1 1 590336 models.common.Conv [256, 256, 3, 1]
36 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
37 -1 1 1050624 models.common.Conv [1024, 1024, 1, 1]
38 -1 1 0 models.common.MP []
39 -1 1 525312 models.common.Conv [1024, 512, 1, 1]
40 -3 1 525312 models.common.Conv [1024, 512, 1, 1]
41 -1 1 2360320 models.common.Conv [512, 512, 3, 2]
42 [-1, -3] 1 0 models.common.Concat [1]
43 -1 1 262656 models.common.Conv [1024, 256, 1, 1]
44 -2 1 262656 models.common.Conv [1024, 256, 1, 1]
45 -1 1 590336 models.common.Conv [256, 256, 3, 1]
46 -1 1 590336 models.common.Conv [256, 256, 3, 1]
47 -1 1 590336 models.common.Conv [256, 256, 3, 1]
48 -1 1 590336 models.common.Conv [256, 256, 3, 1]
49 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
50 -1 1 1050624 models.common.Conv [1024, 1024, 1, 1]
51 -1 1 7609344 models.common.SPPCSPC [1024, 512, 1]
52 -1 1 131584 models.common.Conv [512, 256, 1, 1]
53 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
54 37 1 262656 models.common.Conv [1024, 256, 1, 1]
55 [-1, -2] 1 0 models.common.Concat [1]
56 -1 1 131584 models.common.Conv [512, 256, 1, 1]
57 -2 1 131584 models.common.Conv [512, 256, 1, 1]
58 -1 1 295168 models.common.Conv [256, 128, 3, 1]
59 -1 1 147712 models.common.Conv [128, 128, 3, 1]
60 -1 1 147712 models.common.Conv [128, 128, 3, 1]
61 -1 1 147712 models.common.Conv [128, 128, 3, 1]
62[-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
63 -1 1 262656 models.common.Conv [1024, 256, 1, 1]
64 -1 1 33024 models.common.Conv [256, 128, 1, 1]
65 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
66 24 1 65792 models.common.Conv [512, 128, 1, 1]
67 [-1, -2] 1 0 models.common.Concat [1]
68 -1 1 33024 models.common.Conv [256, 128, 1, 1]
69 -2 1 33024 models.common.Conv [256, 128, 1, 1]
70 -1 1 73856 models.common.Conv [128, 64, 3, 1]
71 -1 1 36992 models.common.Conv [64, 64, 3, 1]
72 -1 1 36992 models.common.Conv [64, 64, 3, 1]
73 -1 1 36992 models.common.Conv [64, 64, 3, 1]
74[-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
75 -1 1 65792 models.common.Conv [512, 128, 1, 1]
76 -1 1 0 models.common.MP []
77 -1 1 16640 models.common.Conv [128, 128, 1, 1]
78 -3 1 16640 models.common.Conv [128, 128, 1, 1]
79 -1 1 147712 models.common.Conv [128, 128, 3, 2]
80 [-1, -3, 63] 1 0 models.common.Concat [1]
81 -1 1 131584 models.common.Conv [512, 256, 1, 1]
82 -2 1 131584 models.common.Conv [512, 256, 1, 1]
83 -1 1 295168 models.common.Conv [256, 128, 3, 1]
84 -1 1 147712 models.common.Conv [128, 128, 3, 1]
85 -1 1 147712 models.common.Conv [128, 128, 3, 1]
86 -1 1 147712 models.common.Conv [128, 128, 3, 1]
87[-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
88 -1 1 262656 models.common.Conv [1024, 256, 1, 1]
89 -1 1 0 models.common.MP []
90 -1 1 66048 models.common.Conv [256, 256, 1, 1]
91 -3 1 66048 models.common.Conv [256, 256, 1, 1]
92 -1 1 590336 models.common.Conv [256, 256, 3, 2]
93 [-1, -3, 51] 1 0 models.common.Concat [1]
94 -1 1 525312 models.common.Conv [1024, 512, 1, 1]
95 -2 1 525312 models.common.Conv [1024, 512, 1, 1]
96 -1 1 1180160 models.common.Conv [512, 256, 3, 1]
97 -1 1 590336 models.common.Conv [256, 256, 3, 1]
98 -1 1 590336 models.common.Conv [256, 256, 3, 1]
99 -1 1 590336 models.common.Conv [256, 256, 3, 1]
100[-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
101 -1 1 1049600 models.common.Conv [2048, 512, 1, 1]
102 75 1 328704 models.common.RepConv [128, 256, 3, 1]
103 88 1 1312768 models.common.RepConv [256, 512, 3, 1]
104 101 1 5246976 models.common.RepConv [512, 1024, 3, 1]
105 [102, 103, 104] 1 37695 models.yolo.Detect [2, [[12, 16, 19, 36, 40, 28], [36, 75, 76, 55, 72, 146], [142, 110, 192, 243, 459, 401]], [256, 512, 1024]] D:\data\xxx\Miniconda\envs\yolov5\lib\site-packages\torch\functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:2228.) return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined] Model Summary: 407 layers, 37200095 parameters, 37200095 gradients, 105.1 GFLOPS

Transferred 554/560 items from yolov7.pt Scaled weight_decay = 0.0005 Optimizer groups: 95 .bias, 95 conv.weight, 92 other train: Scanning 'D:\MAC\Python\yolov7-main\none-background\labels\train.cache' images and labels... 189 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 189/189 [00:00<?, ?it/s] True val: Scanning 'D:\MAC\Python\yolov7-main\none-background\labels\val.cache' images and labels... 24 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 24/24 [00:00<?, ?it/s] OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized. OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

autoanchor: Analyzing anchors... anchors/target = 3.46, Best Possible Recall (BPR) = 1.0000 Image sizes 640 train, 640 test Using 1 dataloader workers Logging results to runs\train\exp Starting training for 300 epochs...

 Epoch   gpu_mem       box       obj       cls     total    labels  img_size

0%| | 0/95 [00:00<?, ?it/s]True 0%| | 0/95 [00:05<?, ?it/s] Traceback (most recent call last): File "D:\data\xxx\Miniconda\envs\yolov5\lib\site-packages\torch\utils\data\dataloader.py", line 1011, in _try_get_data data = self._data_queue.get(timeout=timeout) File "D:\data\xxx\Miniconda\envs\yolov5\lib\queue.py", line 178, in get raise Empty _queue.Empty

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "D:/MAC/Python/yolov7-main/train.py", line 610, in train(hyp, opt, device, tbwriter) File "D:/MAC/Python/yolov7-main/train.py", line 337, in train for i, (imgs, targets, paths, ) in pbar: # batch ------------------------------------------------------------- File "D:\data\xxx\Miniconda\envs\yolov5\lib\site-packages\tqdm\std.py", line 1195, in iter for obj in iterable: File "D:\MAC\Python\yolov7-main\utils\datasets.py", line 110, in iter yield next(self.iterator) File "D:\data\xxx\Miniconda\envs\yolov5\lib\site-packages\torch\utils\data\dataloader.py", line 530, in next data = self._next_data() File "D:\data\xxx\Miniconda\envs\yolov5\lib\site-packages\torch\utils\data\dataloader.py", line 1207, in _next_data idx, data = self._get_data() File "D:\data\xxx\Miniconda\envs\yolov5\lib\site-packages\torch\utils\data\dataloader.py", line 1163, in _get_data success, data = self._try_get_data() File "D:\data\xxx\Miniconda\envs\yolov5\lib\site-packages\torch\utils\data\dataloader.py", line 1024, in _try_get_data raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e RuntimeError: DataLoader worker (pid(s) 16144) exited unexpectedly

`

pratayusha commented 1 year ago

Search for libiomp5md.dll in Anaconda installation folder -> Check if more than one libiomp5md.dll file are present at different locations ->I had two libiomp5md.dll present at my current virtual environment,

Delete libiomp5md.dll without torch installed.

Intel(R) OMP Runtime library is responsible for modifying execution environment at runtime and should be present in the environment where torch is installed, like C:\ProgramData\Anaconda3\envs\my_yolov7\Lib\site-packages\torch\lib in my case.

Lee-2000 commented 11 months ago

have u solved this?