WongKinYiu / yolov9

Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
GNU General Public License v3.0
8.56k stars 1.3k forks source link

Crash after first Epoch #422

Open OnrKylr opened 2 months ago

OnrKylr commented 2 months ago

My training process crashed after first epoch with models/segment/gelan-c-seg.yaml. Before gelan I tried yolov9-c-seg.yaml but its caused other problems. Error is: Traceback (most recent call last): File "C:\Users\MSI\Desktop\YOLOv9-Seg\yolov9\segment\train.py", line 646, in main(opt) File "C:\Users\MSI\Desktop\YOLOv9-Seg\yolov9\segment\train.py", line 542, in main train(opt.hyp, opt, device, callbacks) File "C:\Users\MSI\Desktop\YOLOv9-Seg\yolov9\segment\train.py", line 346, in train results, maps, _ = validate.run(data_dict, ^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\MSI\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context return func(*args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\MSI\Desktop\yolov9-seg\yolov9\segment\val.py", line 252, in run preds = non_max_suppression(preds, ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\MSI\Desktop\yolov9-seg\yolov9\utils\general.py", line 976, in non_max_suppression i = torchvision.ops.nms(boxes, scores, iou_thres) # NMS ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\MSI\AppData\Local\Programs\Python\Python312\Lib\site-packages\torchvision\ops\boxes.py", line 41, in nms return torch.ops.torchvision.nms(boxes, scores, iouthreshold) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\MSI\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch_ops.py", line 854, in call return self._op(args, **(kwargs or {})) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was ouring the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possolutions. 'torchvision::nms' is only available for these backends: [CPU, Meta, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named,te, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, TraccastCPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMoispatch, PythonDispatcher].

CPU: registered at C:\actions-runner_work\vision\vision\pytorch\vision\torchvision\csrc\ops\cpu\nms_kernel.cpp:112 [kernel] Meta: registered at /dev/null:467 [kernel] QuantizedCPU: registered at C:\actions-runner_work\vision\vision\pytorch\vision\torchvision\csrc\ops\quantized\cpu\qnms_kernel.cpp:124 [kernel] BackendSelect: fallthrough registered at ..\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback] Python: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:154 [backend fallback] FuncTorchDynamicLayerBackMode: registered at ..\aten\src\ATen\functorch\DynamicLayer.cpp:497 [backend fallback] Functionalize: registered at ..\aten\src\ATen\FunctionalizeFallbackKernel.cpp:324 [backend fallback] Named: registered at ..\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback] Conjugate: registered at ..\aten\src\ATen\ConjugateFallback.cpp:17 [backend fallback] Negative: registered at ..\aten\src\ATen\native\NegateFallback.cpp:18 [backend fallback] ZeroTensor: registered at ..\aten\src\ATen\ZeroTensorFallback.cpp:86 [backend fallback] ADInplaceOrView: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:86 [backend fallback] AutogradOther: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:53 [backend fallback] AutogradCPU: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:57 [backend fallback] AutogradCUDA: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:65 [backend fallback] AutogradXLA: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:69 [backend fallback] AutogradMPS: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:77 [backend fallback] AutogradXPU: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:61 [backend fallback] AutogradHPU: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:90 [backend fallback] AutogradLazy: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:73 [backend fallback] AutogradMeta: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:81 [backend fallback] Tracer: registered at ..\torch\csrc\autograd\TraceTypeManual.cpp:297 [backend fallback] AutocastCPU: registered at C:\actions-runner_work\vision\vision\pytorch\vision\torchvision\csrc\ops\autocast\nms_kernel.cpp:34 [kernel] AutocastCUDA: registered at C:\actions-runner_work\vision\vision\pytorch\vision\torchvision\csrc\ops\autocast\nms_kernel.cpp:27 [kernel] FuncTorchBatched: registered at ..\aten\src\ATen\functorch\LegacyBatchingRegistrations.cpp:731 [backend fallback] BatchedNestedTensor: registered at ..\aten\src\ATen\functorch\LegacyBatchingRegistrations.cpp:758 [backend fallback] FuncTorchVmapMode: fallthrough registered at ..\aten\src\ATen\functorch\VmapModeRegistrations.cpp:27 [backend fallback] Batched: registered at ..\aten\src\ATen\LegacyBatchingRegistrations.cpp:1075 [backend fallback] VmapMode: fallthrough registered at ..\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback] FuncTorchGradWrapper: registered at ..\aten\src\ATen\functorch\TensorWrapper.cpp:202 [backend fallback] PythonTLSSnapshot: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:162 [backend fallback] FuncTorchDynamicLayerFrontMode: registered at ..\aten\src\ATen\functorch\DynamicLayer.cpp:493 [backend fallback] PreDispatch: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:166 [backend fallback] PythonDispatcher: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:158 [backend fallback]

cliff101 commented 2 months ago

Looks like you did not install your pytorch environment properlly. Try to reinstall your environment. Check the requirement.txt and this discuss.

OnrKylr commented 2 months ago

"Successfully installed torch-2.3.0+cu121 torchaudio-2.3.0+cu121" I did but still having same issue