Arthur151 / ROMP

Monocular, One-stage, Regression of Multiple 3D People and their 3D positions & trajectories in camera & global coordinates. ROMP[ICCV21], BEV[CVPR22], TRACE[CVPR2023]
https://www.yusun.work/
Apache License 2.0

How to create a new dataloader for your own dataset. #88

Open jiaqigeng opened 2 years ago

jiaqigeng commented 2 years ago

Hi,

I am wondering if there is a way to use the results as training data for this model itself.

Thank you!

Arthur151 commented 2 years ago

You can use the results to create a new dataset. To add a new dataset for training:

  1. Follow the dataloader of an existing dataset, like h36m, to write a new one. After inheriting the Image_base class, you just need to properly implement the get_image_info function, which prepares an img_info dictionary.
  2. Register the dataloader in the dataset_dict of mixed_dataset.py, like dataset_dict={'new_data':New_data}.
  3. Then you can use the new dataset for training by adding its dict key to the dataset list in the configs.

You can test the new dataset via python -m romp.lib.dataset.new_data if the new dataset is in ROMP/romp/lib/dataset/new_data.py
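To make the steps above concrete, here is a minimal, self-contained sketch of the pattern. Note the class body and the example path are illustrative, not ROMP's actual code: in ROMP, New_data would inherit Image_base and dataset_dict lives in mixed_dataset.py.

```python
class New_data:  # in ROMP: class New_data(Image_base)
    def __init__(self, image_paths):
        self.image_paths = image_paths

    def __len__(self):
        return len(self.image_paths)

    def get_image_info(self, index):
        # Fill in the fields expected by the trainer; None marks
        # annotations your dataset does not provide.
        img_info = {
            'imgpath': self.image_paths[index],
            'image': None,        # load with cv2.imread, convert BGR -> RGB
            'kp2ds': None, 'track_ids': None,
            'vmask_2d': None, 'vmask_3d': None,
            'kp3ds': None, 'params': None,
            'camMats': None, 'camDists': None,
            'img_size': None, 'ds': 'new_data'}
        return img_info

# Step 2: register the loader under a key (hypothetical example path).
dataset_dict = {'new_data': New_data}
loader = dataset_dict['new_data'](['frame_0001.jpg'])
print(loader.get_image_info(0)['ds'])  # -> new_data
```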

jiaqigeng commented 2 years ago

Hi, thanks for the response. I am not sure how I should generate the img_info dictionary.

The keys in img_info should be: 'imgpath', 'image', 'kp2ds', 'track_ids', 'vmask_2d', 'vmask_3d', 'kp3ds', 'params', 'camMats', 'camDists', 'img_size', 'ds'

The keys in the results are: cam, poses, betas, j3d_all54, j3d_smpl24, j3d_spin24, j3d_op25, verts, pj2d, pj2d_org, trans, center_conf

I am not sure how to convert the keys from results to keys in the img_info.

Thank you!

Arthur151 commented 2 years ago

Here is the mapping:

  * imgpath: path to the input image.
  * image: the image loaded via cv2.imread, converted to RGB order.
  * kp2ds: can use pj2d_org directly.
  * track_ids: can be None; any missing information can be set to None.
  * kp3ds: can use j3d_all54 directly.
  * params: np.concatenate([poses, betas], 1)
  * camMats: set to None
  * camDists: set to None
  * ds: 'new_data' or any name you like
  * img_size: image.shape[:2]

About vmask_2d and vmask_3d, please refer to their definitions in the other dataloaders.

 # vmask_2d | 0: kp2d/bbox | 1: track ids | 2: detect all people in image
 # vmask_3d | 0: kp3d | 1: smpl global orient | 2: smpl body pose | 3: smpl body shape

In this case, you can set them to

import numpy as np

person_number = len(poses)
vmask_2d = np.array([[True, False, True] for _ in range(person_number)])
vmask_3d = np.array([[True, True, True, True] for _ in range(person_number)])
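Putting the mapping and the visibility masks together, a full img_info for one frame could be assembled like this. The 'results' values and the image path below are toy placeholders standing in for ROMP's saved outputs; array shapes are assumptions for illustration.

```python
import numpy as np

person_number = 2
# Placeholder arrays with plausible shapes; real values come from
# ROMP's saved result dict for this frame.
results = {
    'poses': np.zeros((person_number, 72)),
    'betas': np.zeros((person_number, 10)),
    'pj2d_org': np.zeros((person_number, 54, 2)),
    'j3d_all54': np.zeros((person_number, 54, 3)),
}
# Stand-in for cv2.imread(imgpath)[:, :, ::-1] (BGR -> RGB).
image = np.zeros((512, 512, 3), dtype=np.uint8)

img_info = {
    'imgpath': 'frames/0001.jpg',       # hypothetical path
    'image': image,
    'kp2ds': results['pj2d_org'],
    'track_ids': None,                   # missing info stays None
    'kp3ds': results['j3d_all54'],
    'params': np.concatenate([results['poses'], results['betas']], 1),
    'camMats': None,
    'camDists': None,
    'img_size': image.shape[:2],
    'ds': 'new_data',
    # All people have 2D keypoints and full SMPL annotations,
    # but no tracking ids, per the masks above.
    'vmask_2d': np.array([[True, False, True]] * person_number),
    'vmask_3d': np.array([[True, True, True, True]] * person_number),
}
print(img_info['params'].shape)  # -> (2, 82)
```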