Tianxiaomo / pytorch-YOLOv4

PyTorch, ONNX and TensorRT implementation of YOLOv4
Apache License 2.0

When I train on my own data, I run into this problem first #179

Open jackft2 opened 3 years ago

jackft2 commented 3 years ago

log file path: log/log_2020-07-22_14-27-50.txt
{'use_darknet_cfg': True, 'cfgfile': '/home/ubuntu/workspace/pytorch-YOLOv4-master/cfg/yolov4.cfg', 'batch': 64, 'subdivisions': 16, 'width': 608, 'height': 608, 'channels': 3, 'momentum': 0.949, 'decay': 0.0005, 'angle': 0, 'saturation': 1.5, 'exposure': 1.5, 'hue': 0.1, 'learning_rate': 0.001, 'burn_in': 1000, 'max_batches': 500500, 'steps': [400000, 450000], 'policy': [400000, 450000], 'scales': [0.1, 0.1], 'cutmix': 0, 'mosaic': 1, 'letter_box': 0, 'jitter': 0.2, 'classes': 80, 'track': 0, 'w': 608, 'h': 608, 'flip': 1, 'blur': 0, 'gaussian': 0, 'boxes': 60, 'TRAIN_EPOCHS': 300, 'train_label': 'train.txt', 'val_label': '/home/ubuntu/workspace/pytorch-YOLOv4-master/data/val.txt', 'TRAIN_OPTIMIZER': 'adam', 'mixup': 3, 'checkpoints': '/home/ubuntu/workspace/pytorch-YOLOv4-master/checkpoints', 'TRAIN_TENSORBOARD_DIR': '/home/ubuntu/workspace/pytorch-YOLOv4-master/log', 'iou_type': 'iou', 'keep_checkpoint_max': 10, 'load': None, 'gpu': '0', 'dataset_dir': 'data/train_data/', 'pretrained': 'yolov4.weights'}

2020-07-22 14:27:53,598 train.py[line:611] INFO: Using device cuda
convalution havn't activate linear
convalution havn't activate linear
convalution havn't activate linear
2020-07-22 14:27:55,636 train.py[line:327] INFO: Starting training:
    Epochs: 300
    Batch size: 64
    Subdivisions: 16
    Learning rate: 0.001
    Training size: 1863
    Validation size: 214
    Checkpoints: True
    Device: cuda
    Images size: 608
    Optimizer: adam
    Dataset classes: 80
    Train label path:train.txt
    Pretrained:

Epoch 1/300: 0%| | 0/1863 [00:00<?, ?img/s]
Traceback (most recent call last):
  File "train.py", line 626, in <module>
    device=device, )
  File "train.py", line 370, in train
    for i, batch in enumerate(train_loader):
  File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
cv2.error: Caught error in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ubuntu/workspace/pytorch-YOLOv4-master/dataset.py", line 297, in __getitem__
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(3.4.9) /io/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'

Linranran commented 3 years ago

It is probably an environment problem. I restarted bash and ran source activate env-name, and that solved the problem for me.

jackft2 commented 3 years ago

@Linranran Thanks for your help. I found that this problem is caused by the dataset format. I am using my own dataset, originally prepared for YOLOv3, in which the label text files and the image files are separate. This version needs both kinds of information combined into a single train.txt file by coco_annotation.py. That is what keeps me from training on my data.
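
For reference, the `!_src.empty()` assertion in `cvtColor` is what OpenCV raises when it is handed an empty image, which typically happens when `cv2.imread` could not read the given path and returned `None`. A minimal sketch for checking an annotation file before training follows. It assumes that each line starts with an image path followed by space-separated boxes, and that relative paths are resolved against the dataset directory. The helper name `find_missing_images` is hypothetical and not part of the repository.

```python
import os


def find_missing_images(label_path, dataset_dir=""):
    """Return (line_number, image_path) pairs whose image file does not exist.

    Assumes one image per line: the image path first, then the boxes,
    separated by spaces. This layout is an assumption for illustration.
    """
    missing = []
    with open(label_path, "r") as f:
        for line_number, line in enumerate(f, start=1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            # An absolute path in the file is kept as-is by os.path.join.
            image_path = os.path.join(dataset_dir, parts[0])
            if not os.path.isfile(image_path):
                missing.append((line_number, image_path))
    return missing
```

With the configuration shown in the log, a call such as `find_missing_images("train.txt", "data/train_data/")` would return an empty list only if every listed image can be found on disk.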

ymaillet commented 3 years ago

Hi jackft2,

Your labels have the following format, right?

id x_center y_center width height
...
id x_center y_center width height

And each image is bound to a file like this?

jackft2 commented 3 years ago

@ymaillet Yes, I still keep this format. Do you have a script to convert the dataset?

ymaillet commented 3 years ago

I had the same format, so I wrote a little script. The only thing you have to do is rename your "train.txt" and "val.txt" to "train_old.txt" and "val_old.txt". I didn't mention it in my last post, but my old coords are normalized to the image size (e.g. 5 0.2 0.5 0.35 0.22), so the script takes an image size as an argument to rescale them. You also have to pass the target file as an argument (train or val).

One thing that still bothers me is the image size: if the coords in the .txt are not normalized, you have to regenerate the file every time you change the image size for YOLO. Strange. There must be something I don't understand.

Anyway, here is the code:

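import argparse

# Convert darknet-style labels (one .txt per image, rows of normalized
# "id x_center y_center width height") into a single annotation file with
# one line per image: "image_path x1,y1,x2,y2,id x1,y1,x2,y2,id ...".
parser = argparse.ArgumentParser()
parser.add_argument("--img_size", help="size of the images", type=int, default=608)
parser.add_argument("--file", help="target file", type=str, default="train")

args = parser.parse_args()

IMG_SIZE = args.img_size
FILE = args.file


def switch_coordinates(tab_line):
    """Turn one normalized center/size row into pixel corner coordinates."""
    id_ = tab_line[0]
    x_center = float(tab_line[1])
    y_center = float(tab_line[2])
    width = float(tab_line[3])
    height = float(tab_line[4])

    # Every image is assumed to be IMG_SIZE x IMG_SIZE pixels.
    x1 = str(round((x_center - width/2)*IMG_SIZE))
    y1 = str(round((y_center - height/2)*IMG_SIZE))
    x2 = str(round((x_center + width/2)*IMG_SIZE))
    y2 = str(round((y_center + height/2)*IMG_SIZE))

    return x1 + "," + y1 + "," + x2 + "," + y2 + "," + id_ + " "


# Read the image paths listed in the old file, one per line.
with open(FILE + "_old.txt", "r") as f:
    files = []
    for line in f:
        line = line.strip()
        if line:
            files.append(line)

# Build one string of boxes per image from its label file.
coords = []
for file_ in files:
    # Drop the first 7 characters of the path (presumably a directory prefix)
    # and swap the 3-letter image extension for "txt".
    txt_file = file_[7:-3] + "txt"
    with open("labels/" + txt_file, "r") as f:
        coord = ""
        for line in f:
            tab_line = line.split()  # split() also discards the trailing newline
            if len(tab_line) == 5:
                coord += switch_coordinates(tab_line)
    coords.append(coord)

# Write the combined annotation file, one image per line.
with open(FILE + ".txt", "w") as final_file:
    for name, coord in zip(files, coords):
        line = name + " " + coord.rstrip() + "\n"
        final_file.write(line)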
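
The script above scales every box by a single `IMG_SIZE`, which only gives correct pixel values for square images of exactly that size. A variant that scales by each image's own width and height could look like the sketch below. `yolo_to_corners` is a hypothetical helper, and whether the training code expects pixel coordinates relative to the original image is an assumption that this thread does not confirm.

```python
def yolo_to_corners(x_center, y_center, width, height, img_w, img_h):
    """Convert a normalized center/size box to integer pixel corners."""
    x1 = round((x_center - width / 2) * img_w)
    y1 = round((y_center - height / 2) * img_h)
    x2 = round((x_center + width / 2) * img_w)
    y2 = round((y_center + height / 2) * img_h)
    # Clamp to the image so rounding never pushes a corner outside it.
    x1, x2 = max(0, x1), min(img_w, x2)
    y1, y2 = max(0, y1), min(img_h, y2)
    return x1, y1, x2, y2
```

For the sample row quoted earlier (0.2 0.5 0.35 0.22) on a 608 x 608 image, `yolo_to_corners(0.2, 0.5, 0.35, 0.22, 608, 608)` gives `(15, 237, 228, 371)`. The image width and height would have to be read from each image file, for example with an image library of your choice.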

jackft2 commented 3 years ago

@ymaillet Thanks for your help. I will try it.

jackft2 commented 3 years ago

@ymaillet But I have another problem. I used the same method to make the val.txt file, but it still does not work.

[ 25. 378. 63. 415. 0.]]
Epoch 1/300: 92%|▉| 208/226 [01:11<00:06, 2.58im
other: /home/ubuntu/workspace/pytorch-YOLOv4-master/data/train_data/img_06_425389900_00054.jpg
bboxes: [[ 47. 444. 608. 489. 1.] [550. 559. 583. 596. 0.]]
Epoch 1/300: 92%|▉| 209/226 [01:12<00:06, 2.75im
other: /home/ubuntu/workspace/pytorch-YOLOv4-master/data/train_data/img_07_4404613900_01090.jpg
bboxes: [[ 1. 266. 600. 326. 1.]]
Epoch 1/300: 100%|█| 226/226 [01:17<00:00, 3.45im
convalution havn't activate linear
convalution havn't activate linear
convalution havn't activate linear
in function convert_to_coco_api...
Epoch 1/300: 100%|█| 226/226 [01:17<00:00, 2.92im
Traceback (most recent call last):
  File "train.py", line 629, in <module>
    device=device) # device=device,)
  File "train.py", line 428, in train
    evaluator = evaluate(eval_model, val_loader, config, device)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "train.py", line 474, in evaluate
    coco = convert_to_coco_api(data_loader.dataset, bbox_fmt='coco')
  File "/home/ubuntu/workspace/pytorch-YOLOv4-master/tool/tv_reference/coco_utils.py", line 158, in convert_to_coco_api
    img, targets = ds[img_idx]
  File "/home/ubuntu/workspace/pytorch-YOLOv4-master/dataset.py", line 273, in __getitem__
    return self._get_val_item(index)
  File "/home/ubuntu/workspace/pytorch-YOLOv4-master/dataset.py", line 410, in _get_val_item
    target['image_id'] = torch.tensor([get_image_id(img_path)])
  File "/home/ubuntu/workspace/pytorch-YOLOv4-master/dataset.py", line 430, in get_image_id
    raise NotImplementedError("Create your own 'get_image_id' function")
NotImplementedError: Create your own 'get_image_id' function

ymaillet commented 3 years ago

Yes, it's a function we need to implement ourselves. I opened an issue to understand what we have to do. Let's hope someone responds!
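
The traceback shows that `get_image_id` receives the image path and that its result is wrapped in `torch.tensor([...])`, so it has to return a number. One possible approach is to number the validation images by their sorted paths, as in the sketch below. This assumes that any unique integer per image is acceptable, which the thread does not confirm. The names `build_image_id_lookup` and `IMAGE_IDS` are hypothetical, and the two file names are only examples taken from the log above.

```python
def build_image_id_lookup(image_paths):
    """Map each distinct image path to a unique integer, in sorted order."""
    return {path: idx for idx, path in enumerate(sorted(set(image_paths)))}


# Hypothetical module-level table; in practice it would be filled once
# from the image paths listed in val.txt.
IMAGE_IDS = build_image_id_lookup([
    "img_06_425389900_00054.jpg",
    "img_07_4404613900_01090.jpg",
])


def get_image_id(filename):
    """Return the integer ID assigned to this image path."""
    return IMAGE_IDS[filename]
```

Sorting before numbering keeps the IDs stable between runs as long as the set of validation images does not change.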

jpf0429 commented 3 years ago

Hello, I also have this problem when training on the VOC dataset. How did you solve it? I used voc_label.py from https://github.com/AlexeyAB/darknet/scripts to process the dataset and generate train.txt and val.txt, then ran train.py. What am I doing wrong? I hope someone can answer. Thanks.

jpf0429 commented 3 years ago

Hello, I have the same problem with the Pascal VOC dataset. Can you help me analyze it? Thank you!

jpf0429 commented 3 years ago

This problem also occurs when I train on the COCO dataset.

jackft2 commented 3 years ago

@jpf0429 The format this repository expects is different from yours. You need to follow the new one.

aguptaneurala commented 3 years ago

@jackft2 any updates on https://github.com/Tianxiaomo/pytorch-YOLOv4/issues/179#issuecomment-663036492 ?


jackft2 commented 3 years ago

@aguptaneurala See #182. I solved this problem there.

vishnuvardhan58 commented 3 years ago

Hello @jackft2, I tried adding the line you mentioned in #182, but I am still getting the error below. Can you please help resolve this?


