Closed — chenscottus closed this issue 1 year ago
If I understand your question correctly, the output should be decoded as
cls, bbox = output[:6] # len 6
keypoints = output[6:] # len 399
where bbox includes the critical x, y values plus the score, and keypoints corresponds to the first 399 datapoints in data_format.md.
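As a minimal sketch of that split, assuming each detection row is 405 values laid out as described above (6 bbox/class values followed by 133 keypoints as (x, y, conf) triplets — the function name and exact field meanings here are my assumptions, not the repo's code):

```python
import numpy as np

def split_detection(row):
    """Split one 405-value detection row into bbox info and keypoints.

    Assumed layout (per the description above):
      row[:6] -> cx, cy, w, h, box confidence, class score
      row[6:] -> 133 keypoints flattened as (x, y, conf) triplets (399 values)
    """
    row = np.asarray(row, dtype=float)
    assert row.shape[-1] == 405, "expected 405 values per detection"
    bbox = row[:6]                       # cx, cy, w, h, conf, cls
    keypoints = row[6:].reshape(133, 3)  # one (x, y, conf) triplet per keypoint
    return bbox, keypoints

# Example with a dummy row of sequential values:
bbox, kpts = split_detection(np.arange(405, dtype=float))
print(bbox.shape, kpts.shape)  # (6,) (133, 3)
```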
Thanks!
I tried to run: python3 detect.py --weights yolov7_pose_whole_body_tiny_baseline.pt --source onnx_inference/img.png --save-crop --view-img
Here are the errors:
Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.25, device='', exist_ok=False, hide_conf=False, hide_labels=False, img_size=640, iou_thres=0.45, kpt_label=False, line_thickness=3, name='exp', nosave=False, project='runs/detect', save_bin=False, save_conf=False, save_crop=True, save_txt=False, save_txt_tidl=False, source='onnx_inference/img.png', update=False, view_img=True, weights=['yolov7_pose_whole_body_tiny_baseline.pt'])
YOLOv5 🚀 2023-4-4 torch 1.12.0+cu116 CUDA:0 (NVIDIA GeForce RTX 3090 Ti, 24563.375MB)
Fusing layers...
/home/dev/.local/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2894.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Model Summary: 314 layers, 8862259 parameters, 0 gradients, 19.8 GFLOPS
image 1/1 /mnt/d/Workspace2021/models/yolov7_pose_whole_body/yolov7-pose-whole-body-main/onnx_inference/img.png: tensor(0.83398, device='cuda:0')
Traceback (most recent call last):
File "detect.py", line 202, in
But if I run this, it works: python3 detect.py --weight yolov7_pose_whole_body_tiny_baseline.pt --kpt-label --hide-labels --hide-conf --source onnx_inference/img.png --view-img
But the resulting image looks like this:
I will try and recreate this issue later today.
Hi Scott,
So first, lowering the conf threshold improves the results a lot. I think the threshold of 0.5 is way too high. Changing it to 0.1 allows most of the keypoints to be drawn (this change has been pushed).
In terms of the mapping, there is still an issue. It probably stems from line 106, in the call to scale_coords that scales the keypoint predictions back to the original image. Also, it is important to specify the image size during inference to match the resolution used during training. I've uploaded a couple of inference results with varying input sizes into onnx_inference. I have some other tasks to take care of today, but I hope the inference results give you some insight into how to debug the scaling. If I have more time tomorrow, I can revisit this and try to debug the issue.
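For reference, the rescaling that scale_coords should perform for keypoints can be sketched as follows. This is a minimal sketch of the usual YOLO letterbox inverse (undo padding, then undo the resize gain); the function name and the symmetric-padding assumption are mine, not necessarily what line 106 does:

```python
import numpy as np

def scale_kpts(kpts, model_shape, orig_shape):
    """Rescale (x, y, conf) keypoints from the letterboxed model input
    back to the original image.

    model_shape, orig_shape: (height, width) tuples.
    Assumes symmetric letterbox padding, as in the standard YOLO pipeline.
    """
    kpts = np.array(kpts, dtype=float)          # (N, 3): x, y, conf
    gain = min(model_shape[0] / orig_shape[0],  # height ratio
               model_shape[1] / orig_shape[1])  # width ratio
    pad_w = (model_shape[1] - orig_shape[1] * gain) / 2
    pad_h = (model_shape[0] - orig_shape[0] * gain) / 2
    kpts[:, 0] = (kpts[:, 0] - pad_w) / gain    # undo pad, then gain (x)
    kpts[:, 1] = (kpts[:, 1] - pad_h) / gain    # same for y
    return kpts

# A 480x640 image letterboxed into 640x640 gets 80px of vertical padding,
# so a keypoint at (320, 400) in model space maps back to (320, 320):
print(scale_kpts([[320.0, 400.0, 0.9]], (640, 640), (480, 640)))
```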
Many thanks for your efforts! -S
Have you tried the plot_images function in test.py? It seems to perform the mapping correctly.
Hi,
There are 405 outputs per line. From https://github.com/fan23j/yolov7-pose-whole-body/blob/main/data/data_format.md, the COCO-WholeBody Annotation File Format shows:

```
// 405 values total
// 133 x 3 = 399 keypoint values:
// "keypoints"       : list([x, y, v] * 17),
// "foot_kpts"       : list([x, y, v] * 6),
// "face_kpts"       : list([x, y, v] * 68),
// "lefthand_kpts"   : list([x, y, v] * 21),
// "righthand_kpts"  : list([x, y, v] * 21),
//
// "score"           : float,  // 400
// "foot_score"      : float,  // 401
// "face_score"      : float,  // 402
// "lefthand_score"  : float,  // 403
// "righthand_score" : float,  // 404
// "wholebody_score" : float,  // 405
```
But when I try to decode the output like that, it doesn't work.
If I decode the output the way yolov7-w6-pose does (which I have been using since last year), taking either the first 57 values or the first 5 values (step_width_coco_pose = 57; // num_class + 5 + 2 * num_lines_coco_pose + 2 * kp_face_pose), it doesn't work either.
Could you please show me how to decode the output?
Many thanks!
-Scott