Closed SuhelNaryal closed 3 years ago
Maybe you want some simple demo code? Sorry, but I'm busy at the moment :( I'll make some demo code after a conference deadline (mid-November). Thanks!
Hi, I figured some things out from the paper and ran the code on my own image. I just wanted to ask how much difference abs root depth makes to the results. I got the following results. Thanks for your help. You have done great work.
Not much. abs root depth is the absolute depth value from the camera to the wrist. It may affect the 3D coordinates, but not image coordinates like the ones in your result.
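To illustrate the point above, here is a minimal sketch assuming the standard pinhole camera model. The `pixel2cam` below is a stand-alone version written for this example (the repo has a function of the same name in `utils/transforms.py`), and the focal length, principal point, joint position and depths are made-up numbers. It shows that a different absolute root depth changes the metric 3D coordinates while the pixel coordinates stay the same.

```python
import numpy as np

def pixel2cam(pixel_coord, f, c):
    """Back-project rows of (u, v, depth) to camera-space (X, Y, Z) with a pinhole model."""
    x = (pixel_coord[:, 0] - c[0]) / f[0] * pixel_coord[:, 2]
    y = (pixel_coord[:, 1] - c[1]) / f[1] * pixel_coord[:, 2]
    z = pixel_coord[:, 2]
    return np.stack((x, y, z), axis=1)

focal = (1500.0, 1500.0)   # assumed focal lengths in pixels
princpt = (128.0, 128.0)   # assumed principal point (centre of a 256x256 crop)

# One joint at pixel (160, 96) whose depth relative to the wrist is +20 mm.
joint_rel = np.array([[160.0, 96.0, 20.0]])

for abs_root_depth in (400.0, 800.0):   # hypothetical camera-to-wrist distances in mm
    joint_abs = joint_rel.copy()
    joint_abs[:, 2] += abs_root_depth   # only the depth column changes
    cam = pixel2cam(joint_abs, focal, princpt)
    print(abs_root_depth, joint_abs[0, :2], cam[0])
```

The pixel position printed in the middle is identical for both depths; only the camera-space X, Y and Z scale with the absolute depth.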
Okay, Thank you so much.
@SuhelNaryal Did you use any extra parameters other than the RGB image when running inference on a custom image?
@saqib22 Hi. No, I did not use the extra parameters. I modified the call function in model.py to use just the image.
@SuhelNaryal Thank you! I will try to modify the code myself too, because I want inference results from a single camera.
One more thing: are the coordinates you got on your custom image 3D or 2D?
Feel free to ask in case you need any help. Happy to help.
Thanks
These are 3D coordinates.
Hi, I want to use my own image dataset like you did. Can you please tell me where the call function that you changed in model.py is located? I want to input and output only a single image.
Do I need annotation (.json) as input for my own dataset for visualization?
@SuhelNaryal Thanks! So are these 3D coordinates in image space or in the actual 3D space?
Hi. In main -> model.py -> class Model -> function forward, you can make the following changes.

    def forward(self, inputs, targets=None, meta_info=None, mode=None):
        input_img = inputs['img']
        # Targets and meta info are only needed in 'train' and 'test' mode.
        # (The mode name must match the 'train' check further down.)
        if mode in ['train', 'test']:
            target_joint_coord, target_rel_root_depth, target_hand_type = targets['joint_coord'], targets['rel_root_depth'], targets['hand_type']
            joint_valid, root_valid, hand_type_valid, inv_trans = meta_info['joint_valid'], meta_info['root_valid'], meta_info['hand_type_valid'], meta_info['inv_trans']
        batch_size = input_img.shape[0]
        img_feat = self.backbone_net(input_img)
        joint_heatmap_out, rel_root_depth_out, hand_type = self.pose_net(img_feat)

        if mode == 'train':
            target_joint_heatmap = self.render_gaussian_heatmap(target_joint_coord)

            loss = {}
            loss['joint_heatmap'] = self.joint_heatmap_loss(joint_heatmap_out, target_joint_heatmap, joint_valid)
            loss['rel_root_depth'] = self.rel_root_depth_loss(rel_root_depth_out, target_rel_root_depth, root_valid)
            loss['hand_type'] = self.hand_type_loss(hand_type, target_hand_type, hand_type_valid)
            return loss
        elif mode == 'test':
            out = {}
            # Argmax over the 3D heatmap of each joint gives (x, y, z) indices.
            val_z, idx_z = torch.max(joint_heatmap_out,2)
            val_zy, idx_zy = torch.max(val_z,2)
            val_zyx, joint_x = torch.max(val_zy,2)
            joint_x = joint_x[:,:,None]
            joint_y = torch.gather(idx_zy, 2, joint_x)
            joint_z = torch.gather(idx_z, 2, joint_y[:,:,:,None].repeat(1,1,1,cfg.output_hm_shape[1]))[:,:,0,:]
            joint_z = torch.gather(joint_z, 2, joint_x)
            joint_coord_out = torch.cat((joint_x, joint_y, joint_z),2).float()
            out['joint_coord'] = joint_coord_out
            out['rel_root_depth'] = rel_root_depth_out
            out['hand_type'] = hand_type
            out['inv_trans'] = inv_trans
            out['target_joint'] = target_joint_coord
            out['joint_valid'] = joint_valid
            out['hand_type_valid'] = hand_type_valid
            return out
        else:
            # Plain inference: only the image is required, no targets or meta info.
            out = {}
            val_z, idx_z = torch.max(joint_heatmap_out,2)
            val_zy, idx_zy = torch.max(val_z,2)
            val_zyx, joint_x = torch.max(val_zy,2)
            joint_x = joint_x[:,:,None]
            joint_y = torch.gather(idx_zy, 2, joint_x)
            joint_z = torch.gather(idx_z, 2, joint_y[:,:,:,None].repeat(1,1,1,cfg.output_hm_shape[1]))[:,:,0,:]
            joint_z = torch.gather(joint_z, 2, joint_x)
            joint_coord_out = torch.cat((joint_x, joint_y, joint_z),2).float()
            out['joint_coord'] = joint_coord_out
            out['rel_root_depth'] = rel_root_depth_out
            out['hand_type'] = hand_type
            return out

The idea is to just get the outputs from the model. You will get 3D joint coordinates in a 64x64x64 space. You will then have to map these coordinates into your image space.
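For readers wondering what the `torch.max` / `torch.gather` chain in the `else` branch computes: it is an argmax over each joint's 3D heatmap. Below is a minimal NumPy sketch of the same idea, assuming a heatmap for a single image laid out as (joints, depth, height, width). `decode_heatmap` is a name made up for this example, not a function from the repo.

```python
import numpy as np

def decode_heatmap(heatmap):
    """Return the (x, y, z) voxel indices of the maximum of each joint's 3D heatmap.

    heatmap: array of shape (num_joints, D, H, W).
    """
    num_joints = heatmap.shape[0]
    # Flatten each joint's volume, find the position of its maximum,
    # then convert the flat index back to (z, y, x) indices.
    flat_idx = heatmap.reshape(num_joints, -1).argmax(axis=1)
    z, y, x = np.unravel_index(flat_idx, heatmap.shape[1:])
    return np.stack((x, y, z), axis=1).astype(np.float32)

# Toy example: 2 joints in a 64x64x64 volume with one known peak each.
hm = np.zeros((2, 64, 64, 64), dtype=np.float32)
hm[0, 10, 20, 30] = 1.0   # peak at z=10, y=20, x=30
hm[1, 5, 6, 7] = 1.0      # peak at z=5,  y=6,  x=7
print(decode_heatmap(hm))
```

The returned rows are ordered (x, y, z), matching the order in which `joint_coord_out` is concatenated in the snippet above.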
No, you don't need the JSON annotations. Just get the model, load the pretrained weights, and pass the image to the model, and you will get the coordinates.
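As a rough illustration of "pass the image to the model", here is a minimal sketch of turning an image array into the `inputs['img']` batch. It assumes the image is already a hand crop at the network's input resolution (256x256 here) and that the model expects RGB values scaled to [0, 1] in channel-first order; both should be checked against the repo's data loading code. `to_model_input` is a hypothetical helper, and the commented-out model call only shows the intended usage.

```python
import numpy as np

def to_model_input(img_bgr):
    """Convert an HxWx3 BGR uint8 image (as returned by cv2.imread)
    to a 1x3xHxW float32 batch with values in [0, 1]."""
    img_rgb = img_bgr[:, :, ::-1]                  # BGR -> RGB
    chw = img_rgb.transpose(2, 0, 1)               # HWC -> CHW
    batch = chw[None].astype(np.float32) / 255.0   # add batch dimension and scale
    return np.ascontiguousarray(batch)

# Stand-in for cv2.imread(...) followed by cropping/resizing to 256x256.
dummy = np.zeros((256, 256, 3), dtype=np.uint8)
dummy[..., 0] = 255                                # pure blue in BGR order
batch = to_model_input(dummy)
print(batch.shape, batch[0, :, 0, 0])

# Intended usage (not runnable here; `model` is the network with the
# pretrained weights loaded and the modified forward shown earlier):
# import torch
# with torch.no_grad():
#     out = model({'img': torch.from_numpy(batch)})
```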
@saqib22 The coordinates in my result have been mapped to image space. You can map the 64x64x64 output to any space you need.
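A minimal sketch of such a mapping, using a 64x64x64 heatmap, a 256x256 input crop and a 400 mm depth range. These values are hard-coded here as assumptions so the snippet runs on its own; check them against your `cfg`. `heatmap_to_image_space` is a name made up for this example.

```python
import numpy as np

OUTPUT_HM_SHAPE = (64, 64, 64)   # (depth, height, width); assumed
INPUT_IMG_SHAPE = (256, 256)     # (height, width); assumed
BBOX_3D_SIZE = 400.0             # depth range in mm; assumed

def heatmap_to_image_space(joint_coord):
    """Map (x, y, z) heatmap coordinates to pixel x/y and root-relative depth in mm."""
    out = np.asarray(joint_coord, dtype=np.float32).copy()
    out[:, 0] = out[:, 0] / OUTPUT_HM_SHAPE[2] * INPUT_IMG_SHAPE[1]              # x in pixels
    out[:, 1] = out[:, 1] / OUTPUT_HM_SHAPE[1] * INPUT_IMG_SHAPE[0]              # y in pixels
    out[:, 2] = (out[:, 2] / OUTPUT_HM_SHAPE[0] * 2 - 1) * (BBOX_3D_SIZE / 2)    # z in mm
    return out

print(heatmap_to_image_space([[32, 16, 32], [0, 0, 0]]))
```

With these assumed values, the centre of the depth axis (z = 32) maps to a relative depth of 0 mm and z = 0 maps to -200 mm.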
@SuhelNaryal Thank You So much for your help, I am going to try this out now !
@SuhelNaryal Thanks for your reply. What do you mean by passing the image to the model? If I want to use my own img_path, where do I need to change it? In dataset.py, if I only change img_path, it causes a lot of problems because the code below it also depends on annot_path, annot_subset and so on. But my folder only contains my own images, not a JSON file.
@SuhelNaryal I am done changing the code in model.py as you suggested, and I wrote a custom test data loader for my own images, so I am now able to run this on them. Can you comment on how you visualized the results on your own image? Do you have a snippet for that too?
I mean, how do you get the meta_info?
@saqib22 Hi, you can try the following code.

    import cv2
    import numpy as np
    from config import cfg  # the repo's config object, as imported in main/model.py
    from utils.preprocessing import load_skeleton
    from utils.vis import vis_keypoints, vis_3d_keypoints
    from utils.transforms import world2cam, cam2pixel, pixel2cam

    focal = [1500, 1500]  # x-axis, y-axis
    princpt = [256/2, 256/2]
    root_joint_idx = {'right': 20, 'left': 41}
    skeleton = load_skeleton('path_to_skeleton.txt', 42)  # skeleton.txt is in the annotations zip

    # `out` is the dict returned by the modified Model.forward
    joint_coord_out = out['joint_coord'].cpu().numpy()
    rel_root_depth_out = out['rel_root_depth'].cpu().numpy()
    hand_type_out = out['hand_type'].cpu().numpy()

    preds = {'joint_coord': [], 'rel_root_depth': [], 'hand_type': []}
    for i in range(joint_coord_out.shape[0]):
        preds['joint_coord'].append(joint_coord_out[i])
        preds['rel_root_depth'].append(rel_root_depth_out[i])
        preds['hand_type'].append(hand_type_out[i])

    # Stack (not concatenate) the per-sample arrays so the batch axis is kept.
    preds = {k: np.stack(v) for k,v in preds.items()}
    preds_joint_coord, preds_rel_root_depth, preds_hand_type = preds['joint_coord'], preds['rel_root_depth'], preds['hand_type']

    # Map the first sample from heatmap space to input-image pixels (x, y) and depth (z).
    pred_joint_coord_img = preds_joint_coord[0].copy()
    pred_joint_coord_img[:,0] = pred_joint_coord_img[:,0]/cfg.output_hm_shape[2]*cfg.input_img_shape[1]
    pred_joint_coord_img[:,1] = pred_joint_coord_img[:,1]/cfg.output_hm_shape[1]*cfg.input_img_shape[0]
    pred_joint_coord_img[:,2] = (pred_joint_coord_img[:,2]/cfg.output_hm_shape[0] * 2 - 1) * (cfg.bbox_3d_size/2)

    # Run this part only if both hands are present; adjust the threshold as needed.
    if preds_hand_type[0][0] >= 0.9 and preds_hand_type[0][1] >= 0.9:
        pred_rel_root_depth = (preds_rel_root_depth[0]/cfg.output_root_hm_shape * 2 - 1) * (cfg.bbox_3d_size_root/2)

        pred_left_root_img = pred_joint_coord_img[root_joint_idx['left']].copy()
        pred_left_root_img[2] += pred_rel_root_depth
        pred_left_root_cam = pixel2cam(pred_left_root_img[None,:], focal, princpt)[0]

        pred_right_root_img = pred_joint_coord_img[root_joint_idx['right']].copy()
        pred_right_root_cam = pixel2cam(pred_right_root_img[None,:], focal, princpt)[0]

        pred_rel_root = pred_left_root_cam - pred_right_root_cam

    # Back-project to camera space and make each hand root-relative.
    pred_joint_coord_cam = pixel2cam(pred_joint_coord_img, focal, princpt)
    joint_type = {'right': np.arange(0,21), 'left': np.arange(21,21*2)}
    for h in ('right', 'left'):
        pred_joint_coord_cam[joint_type[h]] = pred_joint_coord_cam[joint_type[h]] - pred_joint_coord_cam[root_joint_idx[h],None,:]

    # Change 1.0 to 0 for a hand that is not present; the right hand comes first in the output.
    joint_valid = [1.0]*21 + [1.0]*21

    # The 2D keypoints are in the network-input space, so this should be the same crop that was fed to the model.
    img_path = 'path to image'
    cvimg = cv2.imread(img_path, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
    _img = cvimg[:,:,::-1].transpose(2,0,1)  # BGR -> RGB, HWC -> CHW
    vis_kps = pred_joint_coord_img.copy()
    vis_valid = joint_valid.copy()
    filename = 'out____2d.jpg'
    vis_keypoints(_img, vis_kps, vis_valid, skeleton, filename)
    filename = 'out____3d.jpg'
    vis_3d_keypoints(pred_joint_coord_cam, joint_valid, skeleton, filename)

This is an example of how you can use the outputs on your image. I believe it is pretty much correct. You may add the root depth to the z coordinates if needed. Please let me know if something goes wrong with the code.
Hi, where did you write the custom test data loader for your own images? Is it a separate Python file, or did you change something in the author's code? Can you please comment on how you loaded your custom images or what you changed, or share your custom test data loader code? Thank you!
@kingsman0000 I have not done any of that. I modified the code to get outputs on my own image. I have not used any meta info or data loader.
@SuhelNaryal So what about the code you posted about 3 days ago? Is that a separate Python file, or does it need to be added somewhere in the author's code? I tried adding my own path, but it is still missing things like "out[]". Can I use that code to predict on and visualize my own image?
@kingsman0000 out is the output from the model.
Hello, may I know where you use this code? Did you put it in a separate file, or add it to an existing file? I'm trying to use my own dataset. Can you please help me with this? Thank you so much.
@aswa123 Hi, create a new file for this code.
@SuhelNaryal Hi, I have used the code for visualization but my results are not good
@saqib22 I am having this issue as well. Do tell me if you find something on this.
@SuhelNaryal Didn't you get the right results in the earlier comments?
Sure Thing, Thanks
Only on this particular image. I have not done enough testing yet.
Then I think some other parameters must be needed along with the RGB image. Can you reopen this issue?
@SuhelNaryal Can you please post the code showing how you passed the image to the model to get the output? Did you create a custom dataset.py to pass in your own image? Also, did you create any .json annotations for the image you passed?
@ravitejageeda I have shared the code above. No, I have not created any dataset.py or .json. The metadata is used for testing and training. You can just load the model and pass your image to get the results.
@mks0601 I get results like this when I don't use inv_trans, but with the inv_trans matrix I get the correct results. So how can I calculate this matrix?
I see a function in preprocessing.augmentation() that requires the bounding box along with some other parameters. Is it possible to get this matrix at test time for my own image? Thanks
@saqib22 @SuhelNaryal I get the following error when I run the above code.

         10 skeleton = load_skeleton('/content/InterHand2.6M/data/InterHand2.6M/annotations/skeleton.txt', 42) #skeleton.txt is in the annotations zip
    ---> 11 joint_coord_out = out['joint_coord'].cpu().numpy()
         12 rel_root_depth_out = out['rel_root_depth'].cpu().numpy()
         13 hand_type_out = out['hand_type'].cpu().numpy()

    NameError: name 'out' is not defined

I can see that out probably comes from the model, but the prediction step seems to be missing from the code.
Can you please let me know how you obtained out for prediction?
@saqib22 Yes. You need to feed in the bbox coordinates, and you can set the other parameters to their testing-mode values (https://github.com/facebookresearch/InterHand2.6M/blob/4e950b6465cc4eb4b26811cd0966997a7ab7b5a6/common/utils/preprocessing.py#L78)
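A minimal sketch of how a bounding box can yield the crop transform and its inverse, assuming an (x_min, y_min, width, height) box in original-image pixels, a 256x256 network input, and no rotation or scale augmentation. This is a simplified stand-in, not the repo's own preprocessing code; `bbox_to_trans` and `apply_trans` are hypothetical helpers.

```python
import numpy as np

def bbox_to_trans(bbox, out_shape=(256, 256)):
    """Return 2x3 affine matrices (trans, inv_trans) between the original image and the crop.

    bbox: (x_min, y_min, width, height) in original-image pixels (assumed format).
    out_shape: (height, width) of the network input.
    """
    x, y, w, h = bbox
    sx, sy = out_shape[1] / w, out_shape[0] / h
    # Original image -> crop: shift the box corner to the origin, then scale.
    trans = np.array([[sx, 0.0, -x * sx],
                      [0.0, sy, -y * sy]], dtype=np.float32)
    # Crop -> original image: undo the scale, then shift back.
    inv_trans = np.array([[1.0 / sx, 0.0, x],
                          [0.0, 1.0 / sy, y]], dtype=np.float32)
    return trans, inv_trans

def apply_trans(points_xy, trans):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    points_xy = np.asarray(points_xy, dtype=np.float32)
    ones = np.ones((points_xy.shape[0], 1), dtype=np.float32)
    return np.concatenate((points_xy, ones), axis=1) @ trans.T

trans, inv_trans = bbox_to_trans((100, 50, 128, 128))
crop_pts = np.array([[128.0, 128.0], [0.0, 0.0]])   # keypoints predicted in the crop
print(apply_trans(crop_pts, inv_trans))              # same keypoints in the original image
```

The forward matrix could be passed to `cv2.warpAffine(img, trans, (256, 256))` to produce the crop. A non-square box would be stretched by this sketch, so it is probably best to make the box square first.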
@ravitejageeda Yes, out is the output from the model. You need to load the model and get predictions from it. The changes to model.py are in the comments above.
@mks0601 I have modified the code for inv_trans, but I am not sure which bbox format is required. Is it (x,y,w,h)? And at what resolution should I detect the hand: (334 x 512) or (256, 256)? Thanks
@mks0601 Also, why do some bboxes look like this?
Could you tell me which bbox is wrong? The annotation id would be helpful.
@mks0601 I was able to run the code on my custom images. For a live demo I want to train my own 2D hand detector, but my first attempt at using InterHand as training data did not give good results. Can you suggest some datasets for training a robust hand detector?
Thanks
Hi, the images in our dataset were captured in a special multi-view environment, so their appearance is very different from everyday images. You should use in-the-wild datasets, such as COCO WholeBody. I'm working on a new project for in-the-wild hand pose estimation and will upload it to arXiv soon. Please stay tuned.
@mks0601 Hi, thanks, I will definitely check that out. But I have tested this repo on my own webcam images, and it works fine as far as I have seen. Shouldn't I use this repo?
Of course you can use this repo, but I think my new work will be definitely better.
Do you have inference code for our own images?
Use the demo code.
Hi, congratulations on such great work. I am building an application where I need a robust hand pose estimation model like yours. I tried to figure out how to run the code on my own images but couldn't manage it. Parameters like the focal length, principal point and abs depth are confusing me. Can you give me some directions on this, and share any potential date for releasing code that works on custom images? Thank you.