Scaled YOLOv4-large P5 cannot be trained on Colab: it shows an error

mrlmnq commented 3 years ago

I am trying to train the ScaledYOLOv4-large P5 model on Colab Pro with:

!python train.py --batch-size 16 --img 896 --epochs 50 --data '../data.yaml' --cfg './models/yolov4-p5.yaml' --weights '' --name yolop5 --adam --hyp './data/hyp.scratch.yaml'

It shows this error:

/content/gdrive/MyDrive/yolo4/ScaledYOLOv4-yolov4-large/train.py
/content/gdrive/MyDrive/yolo4/ScaledYOLOv4-yolov4-large
Using CUDA device0 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', total_memory=16280MB)

Namespace(adam=True, batch_size=16, bucket='', cache_images=False, cfg='./models/yolov4-p5.yaml', data='../data.yaml', device='', epochs=50, evolve=False, global_rank=-1, hyp='./data/hyp.scratch.yaml', img_size=[896, 896], local_rank=-1, logdir='runs/', multi_scale=False, name='yolop5', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='', world_size=1)
Start Tensorboard with "tensorboard --logdir runs/", view at http://localhost:6006/
2021-05-19 06:14:01.484195: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Hyperparameters {'lr0': 0.001, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.5, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0}
Overriding ./models/yolov4-p5.yaml nc=80 with nc=1

             from  n    params  module                                  arguments

0 -1 1 928 models.common.Conv [3, 32, 3, 1]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 19904 models.common.BottleneckCSP [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 161152 models.common.BottleneckCSP [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 2614016 models.common.BottleneckCSP [256, 256, 15]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 10438144 models.common.BottleneckCSP [512, 512, 15]
9 -1 1 4720640 models.common.Conv [512, 1024, 3, 2]
10 -1 1 20728832 models.common.BottleneckCSP [1024, 1024, 7]
11 -1 1 7610368 models.common.SPPCSP [1024, 512, 1]
12 -1 1 131584 models.common.Conv [512, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 8 1 131584 models.common.Conv [512, 256, 1, 1]
15 [-1, -2] 1 0 models.common.Concat [1]
16 -1 1 2298880 models.common.BottleneckCSP2 [512, 256, 3]
17 -1 1 33024 models.common.Conv [256, 128, 1, 1]
18 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
19 6 1 33024 models.common.Conv [256, 128, 1, 1]
20 [-1, -2] 1 0 models.common.Concat [1]
21 -1 1 576000 models.common.BottleneckCSP2 [256, 128, 3]
22 -1 1 295424 models.common.Conv [128, 256, 3, 1]
23 -2 1 295424 models.common.Conv [128, 256, 3, 2]
24 [-1, 16] 1 0 models.common.Concat [1]
25 -1 1 2298880 models.common.BottleneckCSP2 [512, 256, 3]
26 -1 1 1180672 models.common.Conv [256, 512, 3, 1]
27 -2 1 1180672 models.common.Conv [256, 512, 3, 2]
28 [-1, 11] 1 0 models.common.Concat [1]
29 -1 1 9185280 models.common.BottleneckCSP2 [1024, 512, 3]
30 -1 1 4720640 models.common.Conv [512, 1024, 3, 1]
31 [22, 26, 30] 1 43080 models.yolo.Detect [1, [[13, 17, 31, 25, 24, 51, 61, 45], [48, 102, 119, 96, 97, 189, 217, 184], [171, 384, 324, 451, 616, 618, 800, 800]], [256, 512, 1024]]

Traceback (most recent call last):
  File "train.py", line 443, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 72, in train
    model = Model(opt.cfg, ch=3, nc=nc).to(device)  # create
  File "/content/gdrive/MyDrive/yolo4/ScaledYOLOv4-yolov4-large/models/yolo.py", line 83, in __init__
    self._initialize_biases()  # only run once
  File "/content/gdrive/MyDrive/yolo4/ScaledYOLOv4-yolov4-large/models/yolo.py", line 141, in _initialize_biases
    b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

WongKinYiu commented 3 years ago

https://github.com/WongKinYiu/ScaledYOLOv4/issues/191#issuecomment-794515711
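For context on the RuntimeError above: in `_initialize_biases`, `b` is a `.view()` of a convolution's bias, which is a leaf tensor that requires grad, and `b[:, 4] += ...` modifies that view in place while autograd is recording, which newer PyTorch versions refuse. Independently of what the linked comment recommends, one general PyTorch workaround is to perform the initialization inside `torch.no_grad()`. The sketch below is a standalone illustration, not the repository's code; `NC`, `NA`, `NO`, the helper names and the 256-channel 1x1 convolution are assumptions chosen to mirror the single-class P5 head in the log (`nc=1`, four anchor pairs per level).

```python
import math

import torch
import torch.nn as nn

NC = 1       # assumption: one class, as in "Overriding ... nc=80 with nc=1"
NA = 4       # assumption: 4 anchors per output level (8 numbers per level in Detect)
NO = NC + 5  # per-anchor outputs: x, y, w, h, objectness, class scores


def obj_prior(stride: int) -> float:
    """Objectness bias prior from the traceback: ~8 objects per 640x640 image."""
    return math.log(8 / (640 / stride) ** 2)


def init_obj_bias_unsafe(conv: nn.Conv2d, stride: int) -> None:
    # Reproduces the failing pattern: `b` is a view of the leaf parameter
    # conv.bias, and the in-place += happens while autograd is recording.
    b = conv.bias.view(NA, -1)
    b[:, 4] += obj_prior(stride)


def init_obj_bias_safe(conv: nn.Conv2d, stride: int) -> None:
    # Same update, but under torch.no_grad() autograd does not record it,
    # so modifying a view of the parameter in place is allowed.
    with torch.no_grad():
        b = conv.bias.view(NA, -1)
        b[:, 4] += obj_prior(stride)


if __name__ == "__main__":
    for stride in (8, 16, 32):
        try:
            init_obj_bias_unsafe(nn.Conv2d(256, NA * NO, kernel_size=1), stride)
            print(f"stride {stride}: in-place update on the view was allowed")
        except RuntimeError as err:
            print(f"stride {stride}: RuntimeError: {err}")
        conv = nn.Conv2d(256, NA * NO, kernel_size=1)
        init_obj_bias_safe(conv, stride)
        print(f"stride {stride}: objectness prior {obj_prior(stride):.4f}")
```

For the three output strides of this head (8, 16 and 32), the objectness prior works out to log(8/6400) ≈ -6.685, log(8/1600) ≈ -5.298 and log(8/400) ≈ -3.912.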

mrlmnq commented 3 years ago

I've checked other comments; they said I should use torch version 1.8.1 on Colab Pro.

Then I got this error, @WongKinYiu:

Traceback (most recent call last):
  File "/content/gdrive/MyDrive/CustomYOLO/ScaledYOLOv4/train.py", line 20, in <module>
    import test  # import test.py to get mAP after each epoch
  File "/content/gdrive/MyDrive/CustomYOLO/ScaledYOLOv4/test.py", line 13, in <module>
    from models.experimental import attempt_load
  File "/content/gdrive/MyDrive/CustomYOLO/ScaledYOLOv4/models/experimental.py", line 7, in <module>
    from models.common import Conv, DWConv
  File "/content/gdrive/MyDrive/CustomYOLO/ScaledYOLOv4/models/common.py", line 7, in <module>
    from mish_cuda import MishCuda as Mish
  File "/usr/local/lib/python3.7/dist-packages/mish_cuda-0.0.3-py3.7-linux-x86_64.egg/mish_cuda/__init__.py", line 4, in <module>
    from ._C import mish_forward, mish_backward
ImportError: /usr/local/lib/python3.7/dist-packages/mish_cuda-0.0.3-py3.7-linux-x86_64.egg/mish_cuda/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol:

WongKinYiu commented 3 years ago

If you change the PyTorch version, you should reinstall mish-cuda.
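Reinstalling mish-cuda against the installed PyTorch keeps the compiled CUDA kernel. If a rebuild is not possible, one possible workaround (an assumption, not something suggested in this thread) is to fall back to a pure-PyTorch Mish, x * tanh(softplus(x)), whenever `mish_cuda` fails to import; an "undefined symbol" failure like the one above surfaces as an `ImportError`. `MishFallback` is a hypothetical name, and such a fallback is likely slower and more memory-hungry than the CUDA kernel. It would sit where models/common.py does `from mish_cuda import MishCuda as Mish`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MishFallback(nn.Module):
    """Hypothetical pure-PyTorch Mish activation: x * tanh(softplus(x))."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.tanh(F.softplus(x))


try:
    # Compiled extension; raises ImportError (e.g. "undefined symbol")
    # when it was built against a different PyTorch version.
    from mish_cuda import MishCuda as Mish
except ImportError:
    Mish = MishFallback


if __name__ == "__main__":
    x = torch.linspace(-3, 3, 7)
    print("Using:", Mish.__name__)
    print(MishFallback()(x))
```

PyTorch 1.9 and later also ship `torch.nn.Mish`, which could serve the same role on those versions.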

mrlmnq commented 3 years ago

Thank you.

WongKinYiu / ScaledYOLOv4, issue #243