Fsoft-AIC / Language-Conditioned-Affordance-Pose-Detection-in-3D-Point-Clouds

[ICRA 2024] Language-Conditioned Affordance-Pose Detection in 3D Point Clouds
https://3dapnet.github.io/
MIT License
17 stars · 2 forks

dataset unpack #1

Closed qqianfeng closed 2 days ago

qqianfeng commented 1 month ago

Hi all,

I have downloaded the dataset (the full_shape_release.pkl file). When I load it, I get this error:

[Errno 2] No such file or directory: 'train/pose_united'

from function:

    # Note: opj = os.path.join, pkl = pickle, R = scipy.spatial.transform.Rotation
    def load_data(self):
        self.all_data = []

        shape_ids, pose_data = [], []
        pose_dir = opj(self.data_dir, 'pose_united')
        for file in os.listdir(pose_dir):
            file_dir = opj(pose_dir, file)
            if os.path.isfile(file_dir):
                shape_ids.append(os.path.splitext(file)[0])
                with open(file_dir, 'rb') as f:
                    pose_data.append(pkl.load(f))
        id_poses_dict = dict(zip(shape_ids, pose_data))

        with open(opj(self.data_dir, 'full_shape.pkl'), 'rb') as f:
            shape_data = pkl.load(f)
        id_shape_dict = {shape['shape_id']: shape for shape in shape_data}

        for id in id_poses_dict.keys():
            for affordance in id_poses_dict[id].keys():
                for pose in id_poses_dict[id][affordance]:
                    new_data_dict = {}
                    new_data_dict['shape_id'] = id
                    new_data_dict['semantic class'] = id_shape_dict[id]['semantic class']
                    new_data_dict['coordinate'] = id_shape_dict[id]['full_shape']['coordinate']
                    new_data_dict['affordance'] = affordance
                    new_data_dict['affordance_label'] = id_shape_dict[id]['full_shape']['label'][affordance]
                    new_data_dict['rotation'] = R.from_matrix(pose[1][:3, :3]).as_quat()
                    new_data_dict['translation'] = pose[1][:3, 3]
                    self.all_data.append(new_data_dict)

How should I structure the dataset from the pkl file into the required folder layout? Or is something missing?
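For reference, the on-disk layout that load_data() expects can be read off from the code: a pose_united/ folder with one pickle per shape ID, next to a full_shape.pkl. Below is a toy sketch that creates this structure with dummy data; the file names, dict keys, and pose-tuple layout are assumptions inferred from the loader, and the real release format may differ:

```python
import os
import pickle as pkl
import tempfile

import numpy as np

# Toy sketch of the layout load_data() expects; all contents are dummy data.
data_dir = os.path.join(tempfile.mkdtemp(), 'train')
os.makedirs(os.path.join(data_dir, 'pose_united'))

# pose_united/<shape_id>.pkl: {affordance: [pose, ...]} where pose[1] is a
# 4x4 homogeneous matrix (the loader reads pose[1][:3, :3] and pose[1][:3, 3]).
pose = ('grasp_0', np.eye(4))  # first element is a hypothetical identifier
with open(os.path.join(data_dir, 'pose_united', 'shape_001.pkl'), 'wb') as f:
    pkl.dump({'grasp': [pose]}, f)

# full_shape.pkl: a list of per-shape dicts with the keys the loader reads.
shape = {
    'shape_id': 'shape_001',
    'semantic class': 'Mug',
    'full_shape': {
        'coordinate': np.zeros((2048, 3)),   # point cloud
        'label': {'grasp': np.zeros(2048)},  # per-point affordance labels
    },
}
with open(os.path.join(data_dir, 'full_shape.pkl'), 'wb') as f:
    pkl.dump([shape], f)
```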

toannguyen1904 commented 1 month ago

Hi @qqianfeng, the dataset has been re-formatted recently. For that reason, the code for dataset processing may not work properly. We will check the code and update it asap.

Best regards, Toan.

qqianfeng commented 1 month ago

Hi, @toannguyen1904 , thanks a lot for your answer. Another minor error I got is:

  File "~/workspace/Language-Conditioned-Affordance-Pose-Detection-in-3D-Point-Clouds/dataset/../models/components.py", line 22, in forward
    embeddings = math.log(10000) / (half_dim - 1)
ZeroDivisionError: float division by zero

The reason is that in PoseNet the embedding dimension is defined as 2:

  self.time_net1 = SinusoidalPositionEmbeddings(dim=2)

so here half_dim is 1, and the expression divides by zero. What is the correct dim here?

Thanks in advance for your answer!

toannguyen1904 commented 1 month ago

Hi @qqianfeng, sorry for the mistake. The uploaded SinusoidalPositionEmbeddings class is an old version. In fact, we add a small epsilon of 1e-5 to half_dim - 1 so that the division is well-defined when dim=2. We have updated this point in the new commit, and you can now modify your code accordingly.
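A minimal sketch of the class with that epsilon fix applied (the actual implementation in the repo may differ in details):

```python
import math

import torch
import torch.nn as nn

class SinusoidalPositionEmbeddings(nn.Module):
    """Sketch of sinusoidal timestep embeddings with the epsilon fix described above."""

    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, time):
        half_dim = self.dim // 2
        # The 1e-5 keeps the denominator non-zero when dim == 2 (half_dim == 1)
        freq = math.log(10000) / (half_dim - 1 + 1e-5)
        freqs = torch.exp(torch.arange(half_dim, device=time.device) * -freq)
        args = time[:, None] * freqs[None, :]
        return torch.cat((args.sin(), args.cos()), dim=-1)
```

With dim=2 the forward pass now returns a (batch, 2) tensor instead of raising ZeroDivisionError.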

Thanks, Toan.

qqianfeng commented 1 month ago

@toannguyen1904 thanks for your quick response. Yes, the new SinusoidalPositionEmbeddings works. To make the training work, I only needed to squeeze the output. This may be specific to my dataset setup, though; I'm not sure it's the same for you.

_t1 = self.time_net1(_t0).squeeze()

qqianfeng commented 1 month ago

Hi @toannguyen1904 , another small question

I found some hyperparameter mismatches between the paper and the code. In the paper:

The unconditional training probability is set to p_uncond = 0.05. The Adam optimizer [62] with learning rate 1e-3 and weight decay 1e-4 is used. When sampling poses, we set the guidance scale to w = 0.2.

In code:

model = dict(
    type='detectiondiffusion',
    device=torch.device('cuda'),
    background_text='none',
    betas=[1e-4, 0.02],
    n_T=1000,
    drop_prob=0.1,
    weights_init='default_init',
)

optimizer = dict(
    type='adam',
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-08,
    weight_decay=1e-5, 
)
GUIDE_W = 0.5

Which parameter should I use?

toannguyen1904 commented 1 month ago

Hi @qqianfeng, the hyper-parameters mentioned in our paper should be used. However, you can also use the ones in the current code; I believe the results will not change much.
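Concretely, substituting the paper's values into the repo config would look like this (a sketch; only the differing values change, and the device line is kept as a comment to stay self-contained):

```python
# Sketch: the repo config with the paper's hyper-parameters substituted in.
model = dict(
    type='detectiondiffusion',
    # device=torch.device('cuda'),  # as in the repo config
    background_text='none',
    betas=[1e-4, 0.02],
    n_T=1000,
    drop_prob=0.05,        # paper: p_uncond = 0.05 (code had 0.1)
    weights_init='default_init',
)

optimizer = dict(
    type='adam',
    lr=1e-3,               # paper and code agree
    betas=(0.9, 0.999),
    eps=1e-08,
    weight_decay=1e-4,     # paper: 1e-4 (code had 1e-5)
)

GUIDE_W = 0.2              # paper: w = 0.2 (code had 0.5)
```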

> Yes, the new SinusoidalPositionEmbeddings works. To make the training work, I only needed to squeeze the output: `_t1 = self.time_net1(_t0).squeeze()`

I am not sure about your self-configured dataset loading. However, our latest code does not require the squeeze call. That said, if it works for you, that is fine. As mentioned, we will update the dataset-loading code asap.

toannguyen1904 commented 2 days ago

Code updated.