fan23j / yolov7-pose-whole-body

Yolov7-pose with variable keypoint support. Trained models with COCO Wholebody.

Training errors #3

Closed haixiat closed 1 year ago

haixiat commented 1 year ago

Hi Jack,

python3 train.py --data data/coco_kpts.yaml --cfg cfg/yolov7-tiny-pose.yaml --batch-size 8 --img 256 --kpt-label --sync-bn --device 0 --hyp data/hyp.pose.yaml --nkpt 133 --weights yolov7-tiny-baseline.pt --epochs 500

I tried to use the command above to train a model, but got the following errors. Any comment would be appreciated.

 Epoch   gpu_mem       box       obj       cls       kpt      kptv     total    labels  img_size

0%| | 0/7075 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 564, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 290, in train
    for i, (imgs, targets, paths, _) in pbar:  # batch -------------------------------------------------------------
  File "/home/haixiatan/.local/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/haixiatan/yolov7-pose-whole-body/utils/datasets.py", line 108, in __iter__
    yield next(self.iterator)
  File "/home/haixiatan/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/haixiatan/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/home/haixiatan/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/home/haixiatan/.local/lib/python3.8/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/haixiatan/.local/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/haixiatan/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/haixiatan/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/haixiatan/yolov7-pose-whole-body/utils/datasets.py", line 578, in __getitem__
    img, labels = load_mosaic(self, index)
  File "/home/haixiatan/yolov7-pose-whole-body/utils/datasets.py", line 791, in load_mosaic
    img4, labels4 = random_perspective(img4, labels4, segments4,
  File "/home/haixiatan/yolov7-pose-whole-body/utils/datasets.py", line 1012, in random_perspective
    xy_kpts[:, :2] = targets[:,5:].reshape(n*num_kpt, 2)
ValueError: cannot reshape array of size 136 into shape (532,2)

fan23j commented 1 year ago

I will clone the repo and see if I run into the same issue later today.

fan23j commented 1 year ago

I'm pretty sure the issue stems from using the incorrect config file. You should use yolov7-tiny-pose-wb.yaml if you are using 133 keypoints.

haixiat commented 1 year ago

Thank you very much for your suggestion, Jack!

I tried python3 train.py --data data/coco_kpts.yaml --cfg cfg/yolov7-tiny-pose-wb.yaml --batch-size 8 --img 256 --kpt-label --sync-bn --device 0 --hyp data/hyp.pose.yaml --nkpt 133 --weights yolov7-tiny-baseline.pt --epochs 500 but still got errors. Attached please find my log.
log.txt

fan23j commented 1 year ago

Can you double-check that your dataset is set up correctly? I believe your targets shape is incorrect. targets should have shape (30, 271), which makes targets[:,5:] shape (30, 266).
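For reference, the numbers in the reshape error from the first traceback can be unpacked with a little arithmetic. This is a rough sketch, not code from the repo: it assumes `n = 4` label rows (inferred from 532 = 4 × 133) and two values (x, y) per keypoint, as the `reshape(n*num_kpt, 2)` call suggests. `kpts_per_row` is a hypothetical helper.

```python
def kpts_per_row(flat_size: int, n_rows: int, values_per_kpt: int = 2) -> int:
    """Return how many keypoints each row holds, given the flattened size."""
    per_row, remainder = divmod(flat_size, n_rows)
    if remainder or per_row % values_per_kpt:
        raise ValueError("size does not divide evenly into rows of keypoints")
    return per_row // values_per_kpt


n_rows = 532 // 133                 # assumption: the 532 comes from n * 133
needed = 532 * 2                    # elements that reshape(532, 2) requires
found = kpts_per_row(136, n_rows)   # keypoints per row actually present

print(f"reshape needs {needed} values, labels supply 136 ({found} keypoints per row)")
```

Under those assumptions the loaded labels carry 17 keypoints per row, which points at the original COCO keypoint labels rather than 133-keypoint whole-body labels.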

haixiat commented 1 year ago

Hi Jack,

This is my dataset setup. Could you share yours if mine doesn't look good? Thank you very much!

coco2017labels-keypoints
|-- train2017.txt
|-- val2017.txt
|-- labels
|   |-- train2017
|   |-- val2017
|-- images
    |-- train2017
    |-- val2017

fan23j commented 1 year ago

Check each label file in labels. Perhaps you can use validate_yolo_annot.ipynb in data to assist you. Each row should have 271 columns, iirc.
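A quick way to run that check outside the notebook is to count whitespace-separated columns in every label row. The sketch below is not taken from validate_yolo_annot.ipynb; `find_bad_label_rows` is a hypothetical helper, and it assumes the labels are plain `.txt` files with one object per line.

```python
from pathlib import Path


def find_bad_label_rows(label_dir, expected_cols):
    """Return (file name, line number, column count) for rows with the wrong width."""
    bad = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        with path.open() as fh:
            for lineno, line in enumerate(fh, start=1):
                cols = line.split()
                # Skip blank lines; flag any row whose width differs.
                if cols and len(cols) != expected_cols:
                    bad.append((path.name, lineno, len(cols)))
    return bad
```

For example, `find_bad_label_rows("labels/train2017", 271)` (with the path adjusted to your own layout) returns an empty list when every row has 271 columns.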

haixiat commented 1 year ago

Hi Jack,

I downloaded [Keypoints Labels of MS COCO 2017], as suggested in your "Dataset preparation" section. Each label file has 56 columns.

Could you share how you set up the dataset?

Thanks, Haixia

fan23j commented 1 year ago

The keypoint labels are for original COCO 17 keypoints. You will need to download COCO-Wholebody annotations and convert them using utils/coco2yolo.py to generate yolo labels from the annotation files. The images should stay the same.
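The column counts mentioned in this thread are consistent with a simple layout, sketched below. This is one reading of the numbers, not something confirmed by the repo: it assumes each label row starts with a class id and four box values, that the 17-keypoint labels store (x, y, visibility) per keypoint, and that the 133-keypoint labels store only (x, y). The group sizes are those of COCO-WholeBody; `label_columns` is a hypothetical helper.

```python
# COCO-WholeBody keypoint groups and their sizes.
WHOLEBODY_GROUPS = {
    "keypoints": 17,       # body
    "foot_kpts": 6,
    "face_kpts": 68,
    "lefthand_kpts": 21,
    "righthand_kpts": 21,
}


def label_columns(nkpt, values_per_kpt):
    """Columns in one label row: class id + 4 box values + keypoint values."""
    return 1 + 4 + nkpt * values_per_kpt


total_kpts = sum(WHOLEBODY_GROUPS.values())
print(total_kpts)                     # 133
print(label_columns(17, 3))           # 56
print(label_columns(total_kpts, 2))   # 271
```

Read this way, 56-column files are 17-keypoint labels and cannot feed a run started with `--nkpt 133`.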

chenscottus commented 1 year ago

We could successfully train a 17-keypoint pose model with both the official YOLOv7 repo and https://github.com/fan23j/yolov7-pose-whole-body/tree/main, but could not train on the 133-keypoint dataset, even with the 133-keypoint cfg file. The architecture defined for 133 keypoints appears to be exactly the same as the one for 17 keypoints, so it gives the reshape error.

Have you defined the correct YOLOv7 architecture for pose for 133 keypoints in order to train that dataset successfully?

Thanks!!

-Scott

chenscottus commented 1 year ago

The training script: python3 ./yolov7-pose-whole-body/train.py --data data/coco_kpts.yaml --cfg ./yolov7-pose-whole-body/cfg/yolov7-tiny-pose-wb.yaml --weights ./yolov7-pose-whole-body/weights/yolov7-tiny-baseline.pt --batch-size 8 --kpt-label --device 0,1 --name yolov7-w6-pose --hyp data/hyp.pose.yaml --nkpt 133 --sync-bn

The error message:

[attached image: MicrosoftTeams-image]

fan23j commented 1 year ago

I will do a deep dive tomorrow

chenscottus commented 1 year ago

Hi Fan,

The reshape error is resolved.

But now the dataloader fails when loading the 133-keypoint annotation files.

Here is the error:

[attached image: Screenshot from 2023-04-20 10-45-27]

fan23j commented 1 year ago

try now. I must have forgotten to update the dataloader checks.

fan23j commented 1 year ago

Are you still running into any issues?

chenscottus commented 1 year ago
We are able to run training now, but the model we trained shows keypoint mappings that are not right, just like in the pictures you provided.


haixiat commented 1 year ago

Hi Jack,

In your comment on April 12, you told me to "download COCO-Wholebody annotations and convert them using utils/coco2yolo.py to generate yolo labels from the annotation files". I downloaded the COCO data by running get_coco.sh and tried coco2yolo.py, but got the following error:

/yolov7-pose-whole-body/utils$ python3 coco2yolo.py
0%| | 0/5000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "coco2yolo.py", line 63, in <module>
    box, keypoints = convert((img_width, img_height), ann["bbox"], [ann["keypoints"], ann["foot_kpts"], ann["face_kpts"], ann["lefthand_kpts"], ann["righthand_kpts"]])
KeyError: 'keypoints'

Any suggestions? Thank you!

fan23j commented 1 year ago

did u pass in the --json_path flag? You need to provide --json_path, --save_path, and --split.

haixiat commented 1 year ago

Do my settings look good? How should I provide --split ?

# Change this to the location of your own JSON file.
parser.add_argument('--json_path', default='/home/haixia/yolov7-pose-whole-body/data/coco_kpts/annotations/instances_val2017.json', type=str, help="input: coco format(json)")

# Set where the .txt label files are saved.
parser.add_argument('--save_path', default='/home/haixia/yolov7-pose-whole-body/data/coco_kpts/labels/val2017', type=str, help="specify where to save the output dir of labels")
parser.add_argument('--split', default='train', type=str, help="specify train/val split")

fan23j commented 1 year ago

you need to provide the json path of the coco_wholebody annotations: https://github.com/jin-s13/COCO-WholeBody. then you would run: python3 coco2yolo.py --json_path /home/haixia/yolov7-pose-whole-body/data/coco_kpts/annotations/coco_wholebody_val_v1.0.json --save_path /home/haixia/yolov7-pose-whole-body/data/coco_kpts/labels/val2017 --split val

repeat with the train file and train split
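The `KeyError: 'keypoints'` reported earlier is what one would expect when the JSON passed in lacks the whole-body fields. A small check like the following can confirm which kind of file you have before running the conversion. It is only a sketch: the five key names are taken from the traceback, `missing_wholebody_keys` and `check_annotation_file` are hypothetical helpers, and it assumes the usual COCO layout with a top-level `annotations` list.

```python
import json

# Keys that the convert() call in the traceback reads from each annotation.
REQUIRED_KEYS = ("keypoints", "foot_kpts", "face_kpts", "lefthand_kpts", "righthand_kpts")


def missing_wholebody_keys(annotation):
    """Return the whole-body keys that a single annotation dict lacks."""
    return [key for key in REQUIRED_KEYS if key not in annotation]


def check_annotation_file(json_path):
    """Load a COCO-style JSON file and report keys missing from its first annotation."""
    with open(json_path) as fh:
        data = json.load(fh)
    annotations = data.get("annotations", [])
    if not annotations:
        # No annotations at all: treat every required key as missing.
        return list(REQUIRED_KEYS)
    return missing_wholebody_keys(annotations[0])
```

An empty list means the first annotation carries all five fields. A detection-only file such as instances_val2017.json would be expected to report all five as missing.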