Question about PointTransformerV3

SomnusKKK commented 5 months ago

I have point clouds with shape torch.Size([8, 16, 512, 3]). How can I use PTv3 to process them?

DuinoDu commented 5 months ago

What do 8, 16, and 512 mean?

Gofinge commented 5 months ago

Hi, I have the same question. Could you explain the meaning of 8, 16, and 512? I assume the 3 is the xyz coordinates of each point.

SomnusKKK commented 5 months ago

Hi, thank you for your reply. 8 is the batch size, 16 is the length of the sequence, and 512 is the number of points in each frame.

SomnusKKK commented 5 months ago

Now I use PTv3 as my encoder:

    x = data["point_clouds"]  # (B, T, N, 3) for test
    B, T, N, _ = x.shape
    x = x.reshape(-1, N, 3)  # (B * T, N, 3)
    pc_feature = self.encoder({'feat': None, 'offset': torch.tensor([1, 128, 512]).cuda(), "grid_coord": x[0]})

Gofinge commented 5 months ago

Hi, could you explain what "the length of the sequence" means? The explanation of batched data here (https://github.com/Pointcept/Pointcept?tab=readme-ov-file#offset) might also be helpful.

SomnusKKK commented 5 months ago

Hi, I merge the sequence length into the batch dimension, and I set

'offset':torch.tensor([1,128,512]).cuda()

It raised a RuntimeError during the 'hilbert' encoding:

 File "/data/temp/f_15/model.py", line 983, in forward
    point.serialization(order=self.order, shuffle_orders=self.shuffle_orders)
  File "/data/temp/f_15/model.py", line 118, in serialization
    code = [
  File "/data/temp/f_15/model.py", line 119, in <listcomp>
    encode(self.grid_coord, self.batch, depth, order=order_) for order_ in order
  File "/data/anaconda3/envs/bishe3/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data//temp/f_15/serialization/default.py", line 25, in encode
    code = batch << depth * 3 | code
RuntimeError: The size of tensor a (512) must match the size of tensor b (64) at non-singleton dimension 0

Gofinge commented 5 months ago

Hi, the error was caused by an incorrect offset definition. From your description, you can try the following pseudocode:

B, T, N, _ = x.shape
x = x.reshape(-1, N, 3)  # (B * T, N, 3)
data_dict = dict(
    feat = x,
    coord = x,
    grid_coord = $Grid_Coord_You_Want,
    offset = torch.arange(1, B*T + 1, device=x.device) * N
)
pred = model(data_dict)
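
For reference, offset stores the cumulative point count at the end of each sample, so after merging B and T every frame becomes one sample of N points. The sketch below only illustrates that idea: `offset_to_batch` is a hypothetical helper written for this comment, not Pointcept's own function, and the sizes are the ones from this thread.

```python
import torch


def offset_to_batch(offset: torch.Tensor) -> torch.Tensor:
    """Expand cumulative point counts into one sample index per point."""
    # Points per sample = difference between consecutive cumulative counts.
    counts = torch.diff(offset, prepend=offset.new_zeros(1))
    sample_ids = torch.arange(len(offset), device=offset.device)
    return sample_ids.repeat_interleave(counts)


B, T, N = 8, 16, 512
offset = torch.arange(1, B * T + 1) * N  # one entry per (batch, frame) pair
batch = offset_to_batch(offset)

print(offset[:3].tolist(), offset[-1].item())  # [512, 1024, 1536] 65536
print(batch.shape)                             # torch.Size([65536])
```

The last offset entry has to equal the total number of points, which is why an offset such as [1, 128, 512] cannot describe 65536 points.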

SomnusKKK commented 5 months ago

Hi, thank you for your help, but when I tried your code it still fails during the 'z' encoding:

File "/data/temp/f_15/serialization/default.py", line 25, in encode
    code = batch << depth * 3 | code
RuntimeError: The size of tensor a (65536) must match the size of tensor b (3) at non-singleton dimension 2

Gofinge commented 5 months ago

Change

x = x.reshape(-1, N, 3)

to

x = x.reshape(-1, 3)
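
A small sketch of the difference between the two reshapes, using random data with the sizes from this thread. It also checks that the flat layout keeps each frame's points contiguous, which is what the offset relies on.

```python
import torch

B, T, N = 8, 16, 512
x = torch.rand(B, T, N, 3)

per_frame = x.reshape(-1, N, 3)  # still carries a separate frame dimension
flat = x.reshape(-1, 3)          # one row per point

print(per_frame.shape)  # torch.Size([128, 512, 3])
print(flat.shape)       # torch.Size([65536, 3])

# Frame i occupies rows [i * N, (i + 1) * N) of the flat tensor.
frame_idx = 5
rows = flat[frame_idx * N:(frame_idx + 1) * N]
frames_are_contiguous = torch.equal(rows, per_frame[frame_idx])
print(frames_are_contiguous)  # True
```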

SomnusKKK commented 5 months ago

Hi, after I changed the code, it still fails during the 'hilbert' encoding:

File "/data/temp/f_15/serialization/default.py", line 24, in encode
    code = batch << depth * 3 | code
RuntimeError: The size of tensor a (65536) must match the size of tensor b (8192) at non-singleton dimension 0

I found that depth is 1, and after I set depth to 16 it reports the same error.

Gofinge commented 5 months ago

Hi, if the original shape of x is torch.Size([8, 16, 512, 3]) as you described, the shape of x after reshaping (x.reshape(-1, 3)) should be torch.Size([65536, 3]), so code should have shape torch.Size([65536]), one serialization code per point. Likewise, assuming the offset is defined as I suggested, torch.arange(1, B*T + 1, device=x.device) * N, batch should also have shape torch.Size([65536]). Please check the shape of each related tensor.
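
The failing line, `code = batch << depth * 3 | code`, combines batch and code element by element, so both need the same number of entries. The toy example below is a sketch with made-up values, not the repository's encode function. It shows how the sample index ends up in the bits above the 3 * depth bits of the spatial code, and what happens when the lengths differ.

```python
import torch

depth = 16
code = torch.tensor([5, 7, 5, 7], dtype=torch.int64)   # toy spatial codes
batch = torch.tensor([0, 0, 1, 1], dtype=torch.int64)  # toy sample index per point

# `*` binds tighter than `<<`, which binds tighter than `|`:
# (batch << (depth * 3)) | code
packed = batch << depth * 3 | code

# The two parts can be recovered again from the packed value.
mask = (1 << depth * 3) - 1
assert torch.equal(packed >> depth * 3, batch)
assert torch.equal(packed & mask, code)

# With different lengths the elementwise operation cannot broadcast.
mismatch_error = None
try:
    batch[:3] << depth * 3 | code
except RuntimeError as err:
    mismatch_error = str(err)
print("shape mismatch:", mismatch_error)
```

Because the sample index sits in the high bits, sorting by the packed code keeps the points of each sample together.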

SomnusKKK commented 5 months ago

Hi, I find that the shape of batch is torch.Size([65536]) and the shape of code is also torch.Size([65536]), while the shape of grid_coord is torch.Size([65536, 3]), for the 'z' and 'z-trans' orders. For 'hilbert', however, the shape of code is torch.Size([8192]).

Gofinge commented 5 months ago

So, I assume 8192 comes from 16 (seq_len) * 512 (num_points). Since we already flattened the point cloud by merging batch_size and seq_len into a single batch dimension, how can the model still distinguish them? The expected shapes are:

coord: [65536, 3]
grid_coord: [65536, 3]
feat: [65536, 3]
offset: [512, 1024, ..., 65536] (len = 8 * 16 = 128)

Maybe we can check the input data again for this issue.
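
One way to run that check before calling the model is a small helper like the one below. This is a sketch only: `find_input_problems` is a hypothetical function, and the key names coord, grid_coord, feat and offset are the ones used earlier in this thread.

```python
import torch


def find_input_problems(data_dict):
    """Return a list of problems; an empty list means the shapes agree."""
    problems = []
    coord, grid_coord = data_dict["coord"], data_dict["grid_coord"]
    feat, offset = data_dict["feat"], data_dict["offset"]
    n = coord.shape[0]  # total number of points if coord is flat

    if coord.dim() != 2 or coord.shape[1] != 3:
        problems.append(f"coord should be [N, 3], got {list(coord.shape)}")
    if tuple(grid_coord.shape) != (n, 3):
        problems.append(f"grid_coord should be [{n}, 3], got {list(grid_coord.shape)}")
    if grid_coord.is_floating_point():
        problems.append(f"grid_coord should be an integer tensor, got {grid_coord.dtype}")
    if feat.dim() != 2 or feat.shape[0] != n:
        problems.append(f"feat should be [{n}, C], got {list(feat.shape)}")
    if offset.dim() != 1 or offset.numel() == 0 or int(offset[-1]) != n:
        problems.append(f"offset should be 1-D and end at {n}")
    return problems


B, T, N = 2, 2, 8
coord = torch.rand(B * T * N, 3)
good = dict(
    coord=coord,
    grid_coord=(coord * 100).int(),  # toy integer grid, for the check only
    feat=coord,
    offset=torch.arange(1, B * T + 1) * N,
)
bad = dict(good, offset=torch.tensor([1, 128, 512]))

print(find_input_problems(good))  # []
print(find_input_problems(bad))
```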

SomnusKKK commented 5 months ago

Hi, the input data is right. Sorry, the problem above was my fault. Now the shape of code is also torch.Size([65536]) for 'hilbert', but a new problem appears:

 File "/data/anaconda3/envs/bishe3/lib/python3.8/site-packages/spconv/pytorch/conv.py", line 755, in forward
    return self._conv_forward(self.training,
  File "/data/anaconda3/envs/bishe3/lib/python3.8/site-packages/spconv/pytorch/conv.py", line 169, in _conv_forward
    assert input.features.shape[
AssertionError: channel size mismatch

Gofinge commented 5 months ago

Change the model's in_channels to 3, so that it matches the 3 channels of feat.

SomnusKKK commented 5 months ago

Hi, thank you for your patient help. The next problem comes from grid_coord=point.grid_coord[head_indices] >> pooling_depth, which reports "rshift_cuda" not implemented for 'Float'. Is something wrong with my torch version?

Gofinge commented 5 months ago

Check the dtype of grid_coord. It should be an integer type, not float.
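
A minimal sketch of producing an integer grid_coord from float coordinates. The grid_size value and the shared origin across all frames are assumptions chosen for illustration; pick whatever quantization suits your data. Note that casting raw float coordinates straight to int would only truncate them, so the coordinates are divided by the grid size first.

```python
import torch

grid_size = 0.02  # assumed voxel size, not a value from this thread
coord = torch.rand(1024, 3)  # float xyz coordinates

# Shift to a non-negative range, quantize, then cast to an integer dtype.
origin = coord.min(dim=0).values
grid_coord = torch.floor((coord - origin) / grid_size).to(torch.int32)

# Bit shifts such as `grid_coord >> pooling_depth` are defined for integer tensors.
pooled = grid_coord >> 1
print(grid_coord.dtype, pooled.dtype)  # torch.int32 torch.int32
```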

SomnusKKK commented 5 months ago

The dtype is float, so I need to convert it to int.

occlete commented 2 months ago

@SomnusKKK Hello, can I refer to your code? I am currently exploring the possibility of applying PTv3 to point cloud sequences. The point cloud sequence shape is (batchsize,Length,N,3).

Gofinge commented 1 month ago

Check here: https://github.com/Pointcept/Pointcept?tab=readme-ov-file#offset

Batch size and the number of points are fused in Pointcept, and in other repos as well, to support samples of varying size. BTW, I don't know what "length" means here.

occlete commented 1 month ago

Sometimes we need to learn from dynamic point clouds. A dynamic point cloud is a series of static point clouds arranged in chronological order. For example, in action classification the input is a point cloud sequence: "length" is the number of frames in the sequence and each frame is one point cloud, so a single sequence has shape (length, N, 3).
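
If the frames are flattened into the batch dimension as discussed above, the sequence structure can be restored afterwards from the frame index. The sketch below assumes you have one feature vector per point plus the per-point frame index. `pool_per_frame` is a hypothetical helper and the random features stand in for real encoder output, which may have a different number of points.

```python
import torch


def pool_per_frame(point_feat, batch, num_frames):
    """Average point features within each frame; returns [num_frames, C]."""
    sums = torch.zeros(num_frames, point_feat.shape[1],
                       dtype=point_feat.dtype, device=point_feat.device)
    sums.index_add_(0, batch, point_feat)  # accumulate features per frame
    counts = torch.bincount(batch, minlength=num_frames).clamp(min=1)
    return sums / counts.unsqueeze(1)


B, T, N, C = 2, 3, 4, 5
point_feat = torch.randn(B * T * N, C)            # stand-in for per-point features
batch = torch.arange(B * T).repeat_interleave(N)  # frame index of each point

frame_feat = pool_per_frame(point_feat, batch, B * T)  # [B * T, C]
seq_feat = frame_feat.view(B, T, C)                    # frame b * T + t -> [b, t]
print(seq_feat.shape)  # torch.Size([2, 3, 5])
```

A temporal model can then consume seq_feat along the T dimension.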

Question about PointTransformerV3 #192