Closed · skeletonNN closed this 1 year ago

Thank you for your project! I ran the demo and the result is perfect. But when I run the opencv_onnx branch, I get a different result. Do you apply some hand post-processing?
For example, when I run a 10-second video, the demo gives a 20-second result, but opencv_onnx gives 10 seconds. Why is that?
If you run a video, the input is all of its frames, and these frames are then combined back into a video, so the total frame count should stay the same. Maybe the frame rate is different? For the same single-image input, do onnx and opencv_onnx give the same results?
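For example (a minimal OpenCV sketch, with a hypothetical input path): the written duration is frame_count / writer_fps, so a demo that writes the same frames at half the source fps turns a 10-second input into a 20-second output.

import cv2

cap = cv2.VideoCapture('input.mp4')                # hypothetical path
src_fps = cap.get(cv2.CAP_PROP_FPS)                # e.g. 30
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # e.g. 300
print('input duration :', n_frames / src_fps)      # 10.0 s

writer_fps = src_fps / 2                           # a writer configured at 15 fps...
print('output duration:', n_frames / writer_fps)   # ...plays the same 300 frames for 20.0 s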
I ran the same video. In your demo at https://openxlab.org.cn/apps/detail/mmpose/RTMPose, a 10-second input video produces a 20-second output video. I also ran opencv_onnx and the result is bad; maybe the .pth result is good, I will test it. I want to know whether the demo at that link does any pose post-processing? Its result is good.
The opencv_onnx result is bad, while the demo result is perfect (and 20 seconds long). So I want to know whether the demo does something extra. Or could it be because I run opencv_onnx on CPU rather than GPU?
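Here is how I would check whether my OpenCV build can use CUDA at all (a sketch): if the build has no CUDA support, setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA) falls back to the CPU path, so the device should only change the speed, not the numbers.

import cv2

# cv2.cuda exists in standard builds but reports 0 devices in a CPU-only wheel.
has_cuda = hasattr(cv2, 'cuda') and cv2.cuda.getCudaEnabledDeviceCount() > 0
print('OpenCV CUDA available:', has_cuda)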
How do you use opencv_onnx for inference?
I use the ControlNet script:
import cv2
import numpy as np
import torch

# inference_pose is the helper from DWPose's onnxpose.py script (assumed importable here).

class Wholebody:
    def __init__(self, onnx_pose, device):
        # Choose the OpenCV DNN backend/target for the requested device.
        backend = cv2.dnn.DNN_BACKEND_OPENCV if device == 'cpu' else cv2.dnn.DNN_BACKEND_CUDA
        target = cv2.dnn.DNN_TARGET_CPU if device == 'cpu' else cv2.dnn.DNN_TARGET_CUDA
        self.session_pose = cv2.dnn.readNetFromONNX(onnx_pose)
        self.session_pose.setPreferableBackend(backend)
        self.session_pose.setPreferableTarget(target)

    def __call__(self, oriImg, det_result):
        keypoints, scores = inference_pose(self.session_pose, det_result, oriImg)
        # Attach each keypoint's score as a third channel: (num_person, num_keypoint, 3).
        keypoints_info = np.concatenate((keypoints, scores[..., None]), axis=-1)
        return keypoints_info

class DWposeDetector:
    def __init__(self, onnx_pose, device):
        self.pose_estimation = Wholebody(onnx_pose, device)

    def __call__(self, oriImg, det_results):
        oriImg = oriImg.copy()
        with torch.no_grad():  # not needed for cv2.dnn; kept from the original code
            keypoints_info = self.pose_estimation(oriImg, det_results)
        return keypoints_info

class HandPoseModel(object):
    def __init__(self, onnx_pose, device):
        self.dwprocessor = DWposeDetector(onnx_pose, device)

    def predict_hand_pose(self, image, det_results):
        # Scores are already concatenated inside Wholebody, so pass the result through.
        dwpose_results = self.dwprocessor(image, det_results)
        return dwpose_results

if __name__ == "__main__":
    cpm = HandPoseModel('dwpose/dw-ll_ucoco_384.onnx', 'cpu')
    detector = Detection()  # ViTDet-based detector, defined below
    cap = cv2.VideoCapture("./test.mp4")
    fps = cap.get(cv2.CAP_PROP_FPS)
    num_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    for frame_id in range(num_frames):
        ret, frame = cap.read()
        if not ret:
            break
        pred_bbox = detector(frame)
        joint2d = cpm.predict_hand_pose(frame, pred_bbox)
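To rule out a backend difference, the same tensor can be fed to both OpenCV-DNN and ONNXRuntime and the raw outputs compared directly (a sketch; the 1x3x384x288 shape assumes the dw-ll_ucoco_384 input size, a random blob stands in for a real preprocessed crop, and the two output orders are assumed to match):

import cv2
import numpy as np
import onnxruntime as ort

onnx_path = 'dwpose/dw-ll_ucoco_384.onnx'
blob = np.random.rand(1, 3, 384, 288).astype(np.float32)  # stand-in input

net = cv2.dnn.readNetFromONNX(onnx_path)
net.setInput(blob)
cv_out = net.forward(net.getUnconnectedOutLayersNames())

sess = ort.InferenceSession(onnx_path, providers=['CPUExecutionProvider'])
ort_out = sess.run(None, {sess.get_inputs()[0].name: blob})

for a, b in zip(cv_out, ort_out):
    print(np.abs(np.asarray(a) - np.asarray(b)).max())  # expect roughly 1e-5 or less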
You need to detect and crop the person with YOLOX first.
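Something like this (a rough sketch; the helper name and padding factor are illustrative):

def crop_person(img, bbox, scale=1.25):
    # Expand the person box slightly so hands are not cut off at the edge.
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * scale, (y2 - y1) * scale
    x1, y1 = int(max(cx - w / 2, 0)), int(max(cy - h / 2, 0))
    x2, y2 = int(min(cx + w / 2, img.shape[1])), int(min(cy + h / 2, img.shape[0]))
    return img[y1:y2, x1:x2]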
I use ViTDet for detection:
from pathlib import Path

class Detection(object):
    def __init__(self):
        print("Loading Detection model...")
        from detectron2.config import LazyConfig
        cfg_path = Path('configs') / 'cascade_mask_rcnn_vitdet_h_75ep.py'
        self.detectron2_cfg = LazyConfig.load(str(cfg_path))
        self.detectron2_cfg.train.init_checkpoint = 'https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_vitdet_h/f328730692/model_final_f05665.pkl'
        # Raise the test-time score threshold for all three cascade stages.
        for i in range(3):
            self.detectron2_cfg.model.roi_heads.box_predictors[i].test_score_thresh = 0.5
        # DefaultPredictor_Lazy is a LazyConfig-based predictor helper (assumed
        # importable here, e.g. from the 4D-Humans utilities).
        self.detector = DefaultPredictor_Lazy(self.detectron2_cfg)

    def __call__(self, image):
        # Named __call__ so it matches `detector(frame)` in the script above.
        outputs = self.detector(image)
        instances = outputs['instances']
        # Keep only confident person detections (COCO class 0).
        instances = instances[instances.pred_classes == 0]
        instances = instances[instances.scores > 0.5]
        pred_bbox = instances.pred_boxes.tensor.cpu().numpy()
        pred_masks = instances.pred_masks.cpu().numpy()
        pred_scores = instances.scores.cpu().numpy()
        pred_classes = instances.pred_classes.cpu().numpy()
        return pred_bbox
That looks strange, because opencv_onnx has been used by others and their results are good. You can try using yolox_onnx with our code to debug first. I guess the problem may be the detection bbox.
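A quick way to check the boxes (a sketch): draw the detections that are fed to the pose model and inspect them.

import cv2

def draw_bboxes(frame, bboxes, path='bbox_debug.jpg'):
    vis = frame.copy()
    for x1, y1, x2, y2 in bboxes.astype(int).tolist():
        cv2.rectangle(vis, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imwrite(path, vis)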
Hello, is the ONNX model fp16, fp32, or int8?
fp32
Wuwuwu (sob), I don't know why my result is different from yours. I switched to the YOLOX detector. Could it be that my video handling is wrong? Here is my code:
import cv2
import numpy as np
import onnxruntime as ort
import torch

# inference_detector / inference_pose are the helpers from DWPose's onnxdet.py /
# onnxpose.py, and vis_keypoints is my own visualization helper (all assumed importable).

class Wholebody:
    def __init__(self):
        device = 'cuda:0'
        providers = ['CPUExecutionProvider'] if device == 'cpu' else ['CUDAExecutionProvider']
        onnx_det = 'dwpose/yolox_l.onnx'
        onnx_pose = 'dwpose/dw-ll_ucoco_384.onnx'
        self.session_det = ort.InferenceSession(path_or_bytes=onnx_det, providers=providers)
        self.session_pose = ort.InferenceSession(path_or_bytes=onnx_pose, providers=providers)

    def __call__(self, oriImg):
        det_result = inference_detector(self.session_det, oriImg)
        keypoints, scores = inference_pose(self.session_pose, det_result, oriImg)
        # (num_person, 133, 3): x, y, score for each COCO-WholeBody keypoint.
        keypoints_info = np.concatenate((keypoints, scores[..., None]), axis=-1)
        return keypoints_info

class DWposeDetector:
    def __init__(self):
        self.pose_estimation = Wholebody()

    def __call__(self, oriImg):
        oriImg = oriImg.copy()
        with torch.no_grad():
            keypoints_info = self.pose_estimation(oriImg)
        return keypoints_info

class HandPoseModel(object):
    def __init__(self):
        self.dwprocessor = DWposeDetector()

    def predict_hand_pose(self, image):
        keypoints = self.dwprocessor(image)
        return keypoints

cap = cv2.VideoCapture("./test_hand3.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
num_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cpm = HandPoseModel()
for frame_id in range(num_frames):
    ret, frame = cap.read()
    if not ret:
        break
    joint2d = cpm.predict_hand_pose(frame)
    for dwpose_out in joint2d[:1, :]:  # first detected person only
        dwpose_2d = np.zeros([67, 3])
        # Remap the first 23 body/foot keypoints to an OpenPose-style index order.
        dwpose_2d[
            [0, 16, 15, 18, 17, 5, 2, 6, 3, 7, 4, 12, 9, 13, 10, 14, 11, 19, 20, 21, 22, 23, 24]
        ] = dwpose_out[:23]
        # Indices 91:133 are the 42 hand keypoints (21 per hand).
        dwpose_2d[25:] = dwpose_out[91:133]
        img = vis_keypoints(dwpose_2d, (frame.shape[1], frame.shape[0]), image=frame, dataset='openpose')
        cv2.imwrite(f"./box_visual/{frame_id+1}.jpg", img[:, :, :3])
OK, I found the reason. Wuwuwu (sob), I'm such a noob.
So what is the reason? ( ゚皿゚)
I had to convert the frame with frame[:, :, ::-1] (reverse the channel order, BGR vs. RGB).
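In the loop above, that is (a sketch; cap and cpm as defined in my earlier script):

# OpenCV decodes frames as BGR; the pose pipeline here expected the reversed
# channel order. cv2.cvtColor does the same swap as frame[:, :, ::-1] and
# returns a contiguous array (the slice is only a negative-stride view).
ret, frame = cap.read()                       # BGR as decoded by OpenCV
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # same as frame[:, :, ::-1].copy()
joint2d = cpm.predict_hand_pose(rgb)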