Fsoft-AIC / Language-Conditioned-Affordance-Pose-Detection-in-3D-Point-Clouds

[ICRA 2024] Language-Conditioned Affordance-Pose Detection in 3D Point Clouds
https://3dapnet.github.io/
MIT License

dataset unpack #1

Closed qqianfeng closed 2 months ago

qqianfeng commented 4 months ago

Hi all,

I have downloaded the dataset (the full_shape_release.pkl file). When I load it, I get the error:

[Errno 2] No such file or directory: 'train/pose_united'

from function:

   # assumed imports for this snippet: import os; import pickle as pkl;
   # opj = os.path.join; from scipy.spatial.transform import Rotation as R
   def load_data(self):
        self.all_data = []

        shape_ids, pose_data = [], []
        pose_dir = opj(self.data_dir, 'pose_united')
        for file in os.listdir(pose_dir):
            file_dir = opj(pose_dir, file)
            if os.path.isfile(file_dir):
                shape_ids.append(os.path.splitext(file)[0])
                with open(file_dir, 'rb') as f:
                    pose_data.append(pkl.load(f))
        id_poses_dict = dict(zip(shape_ids, pose_data))

        with open(opj(self.data_dir, 'full_shape.pkl'), 'rb') as f:
            shape_data = pkl.load(f)
        id_shape_dict = {shape['shape_id']: shape for shape in shape_data}

        for id in id_poses_dict.keys():
            for affordance in id_poses_dict[id].keys():
                for pose in id_poses_dict[id][affordance]:
                    new_data_dict = {}
                    new_data_dict['shape_id'] = id
                    new_data_dict['semantic class'] = id_shape_dict[id]['semantic class']
                    new_data_dict['coordinate'] = id_shape_dict[id]['full_shape']['coordinate']
                    new_data_dict['affordance'] = affordance
                    new_data_dict['affordance_label'] = id_shape_dict[id]['full_shape']['label'][affordance]
                    new_data_dict['rotation'] = R.from_matrix(pose[1][:3, :3]).as_quat()
                    new_data_dict['translation'] = pose[1][:3, 3]
                    self.all_data.append(new_data_dict)

How should I restructure the dataset from the pkl file into the required folder structure? Or is something missing?
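Until the loader is updated, one way forward is to inspect the pickle's actual layout before reshaping it into the train/pose_united structure that load_data() expects. A minimal, hypothetical helper (the key names printed are guesses based on the loader above, not confirmed structure):

```python
import pickle

def inspect_pickle(f):
    """Print a quick structural summary of a pickled dataset object."""
    data = pickle.load(f)
    summary = {'type': type(data).__name__, 'len': len(data)}
    # if it is a list of dicts (as the loader above suggests), show the keys
    if isinstance(data, list) and data and isinstance(data[0], dict):
        summary['item_keys'] = sorted(data[0].keys())
    return summary

# usage with the downloaded file:
# with open('full_shape_release.pkl', 'rb') as f:
#     print(inspect_pickle(f))
```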

toannguyen1904 commented 3 months ago

Hi @qqianfeng, the dataset has been re-formatted recently. For that reason, the code for dataset processing may not work properly. We will check the code and update it asap.

Best regards, Toan.

qqianfeng commented 3 months ago

Hi, @toannguyen1904 , thanks a lot for your answer. Another minor error I got is:

  File "~/workspace/Language-Conditioned-Affordance-Pose-Detection-in-3D-Point-Clouds/dataset/../models/components.py", line 22, in forward
    embeddings = math.log(10000) / (half_dim - 1)
ZeroDivisionError: float division by zero

The reason is that in PoseNet the dimension is defined as 2:

  self.time_net1 = SinusoidalPositionEmbeddings(dim=2)

so half_dim is 1 here, which leads to the division by zero. What is the correct dim here?

Thanks in advance for your answer!

toannguyen1904 commented 3 months ago

Hi @qqianfeng, sorry for the mistake. The uploaded SinusoidalPositionEmbeddings class is an old version. In fact, we add a small epsilon value of 1e-5 to half_dim - 1 so that the division is well-defined in the case dim=2. We have updated this in the new commit, and you can now modify your code accordingly.
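The epsilon fix can be sketched in isolation. This is a minimal, hypothetical reconstruction of the frequency computation inside SinusoidalPositionEmbeddings (not the repository's exact code), showing that the 1e-5 term keeps dim=2 from dividing by zero:

```python
import math

def sinusoidal_frequencies(dim, eps=1e-5):
    # frequencies used by sinusoidal position embeddings;
    # eps keeps the denominator non-zero when dim == 2 (half_dim == 1)
    half_dim = dim // 2
    scale = math.log(10000) / (half_dim - 1 + eps)
    return [math.exp(-scale * i) for i in range(half_dim)]

print(len(sinusoidal_frequencies(2)))  # no ZeroDivisionError; prints 1
```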

Thanks, Toan.

qqianfeng commented 3 months ago

@toannguyen1904 thanks for your quick response. Yes, the new SinusoidalPositionEmbeddings works. To make training work, I only needed to squeeze the output. This may be specific to my dataset, though; I am not sure it is the same for you.

_t1 = self.time_net1(_t0).squeeze()
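As a side note, the squeeze call just drops singleton axes. A minimal sketch with NumPy standing in for the actual torch tensor (shapes here are illustrative, not the model's real ones):

```python
import numpy as np

# a (batch, 1, dim) embedding output collapses to (batch, dim) after squeeze
t = np.ones((4, 1, 2))
print(t.squeeze().shape)  # (4, 2)
```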
qqianfeng commented 3 months ago

Hi @toannguyen1904 , another small question

I found some hyperparameter mismatches between the paper and the code. In the paper:

The unconditional training probability is set to p_uncond = 0.05.
The Adam optimizer [62] with the learning rate 10^-3 and the weight decay 10^-4 is used. When sampling poses, we set the guidance scale to w = 0.2.

In code:

model = dict(
    type='detectiondiffusion',
    device=torch.device('cuda'),
    background_text='none',
    betas=[1e-4, 0.02],
    n_T=1000,
    drop_prob=0.1,
    weights_init='default_init',
)

optimizer = dict(
    type='adam',
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-08,
    weight_decay=1e-5, 
)
GUIDE_W = 0.5

Which parameter should I use?

toannguyen1904 commented 3 months ago

> I found some hyperparam. mismatch between paper and code. [...] Which parameter should I use?

Hi @qqianfeng, the hyper-parameters mentioned in our paper should be used. However, you can also use the ones in the current code; I believe the results will not change much.
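For convenience, a hedged sketch of the config with the paper's values substituted in. Field names mirror the snippet quoted above; the torch.device entry is omitted so the fragment stays self-contained:

```python
# config sketch with the paper's stated hyper-parameters
model = dict(
    type='detectiondiffusion',
    background_text='none',
    betas=[1e-4, 0.02],
    n_T=1000,
    drop_prob=0.05,        # paper: p_uncond = 0.05 (code had 0.1)
    weights_init='default_init',
)

optimizer = dict(
    type='adam',
    lr=1e-3,               # paper and code agree
    betas=(0.9, 0.999),
    eps=1e-08,
    weight_decay=1e-4,     # paper: 1e-4 (code had 1e-5)
)

GUIDE_W = 0.2              # paper: w = 0.2 (code had 0.5)
```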

> To make the training work, I only need to squeeze the outcome. [...]

I am not sure about your self-configured dataset loading. However, our latest code does not require the squeeze call. That said, if it works for yours, that is okay. As mentioned, we will update the dataset-loading code asap.

toannguyen1904 commented 2 months ago

Code updated.