About the decoder process

drprojects / superpoint_transformer

Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"

MIT License


About the decoder process #86

Closed jing-zhao9 closed 7 months ago

jing-zhao9 commented 7 months ago

I ran into some confusion while debugging your code. The decoding phase runs the following code:

up_outputs = []
if self.up_stages is not None:
    for i_stage, stage in enumerate(self.up_stages):
        i_level = self.num_down_stages - i_stage - 1
        x_skip = down_outputs[-(2 + i_stage)]
        x, _ = self._forward_up_stage(stage, nag, i_level, x, x_skip)
        up_outputs.append(x)
if self.output_stage_wise:
    out = [x] + up_outputs[::-1][1:] + [down_outputs[-1]]
    return out

In my run, down_outputs is a list of 2 tensors with shapes (32538, 64) and (10227, 64).

How is the interpolation done to obtain up_outputs, a list of 1 tensor with shape (32538, 64), and ultimately out, a list of 2 tensors with shapes (32538, 64) and (10227, 64)? Could you please provide a detailed explanation of the downsampling and upsampling process? Thank you very much!

drprojects commented 7 months ago

To be honest, I do not understand your question.

This code implements the classic behavior of a UNet model:

- a series of downsampling stages is applied
- after each downsampling stage, you keep track of the output, which is passed to the next stage and also reused in the skip connections
- for each upsampling stage, you take the latest upsampled cloud, concatenate it with the value from the matching skip connection, and run the result through your decoder stage
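The three steps above can be sketched on a toy example. This is only an illustration under stated assumptions, not the repository's implementation: `super_index`, `up_stage`, and `weight` are hypothetical names, the unpooling is assumed to copy each parent feature to its children, and the decoder stage is reduced to a single linear map. NumPy arrays stand in for tensors, with 6 fine nodes and 3 coarse nodes instead of 32538 and 10227.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hierarchy: 6 fine nodes grouped into 3 coarse nodes.
# super_index[i] is the (hypothetical) parent of fine node i.
super_index = np.array([0, 0, 1, 1, 1, 2])
dim = 4

# Encoder outputs, finest level first, mirroring down_outputs.
down_outputs = [rng.normal(size=(6, dim)), rng.normal(size=(3, dim))]

# Hypothetical decoder stage: one linear map from 2 * dim back to dim.
weight = rng.normal(size=(2 * dim, dim))


def up_stage(x_coarse, x_skip, parent):
    """Unpool coarse features to the fine level and fuse with the skip."""
    x_up = x_coarse[parent]  # each child copies its parent's feature
    x_cat = np.concatenate([x_up, x_skip], axis=1)  # skip connection
    return x_cat @ weight


x = down_outputs[-1]  # start decoding from the coarsest level
up_outputs = []
num_up_stages = 1
for i_stage in range(num_up_stages):
    x_skip = down_outputs[-(2 + i_stage)]
    x = up_stage(x, x_skip, super_index)
    up_outputs.append(x)

# Same assembly as in the snippet quoted in the question.
out = [x] + up_outputs[::-1][1:] + [down_outputs[-1]]
shapes = [o.shape for o in out]
print(shapes)
```

With a single up stage, `up_outputs[::-1][1:]` is empty, so `out` holds the decoded finest level followed by the coarsest encoder output. That is the same pattern as the two shapes reported in the question.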

> How to interpolate...

> Could you please provide a detailed explanation of the process of downsampling and upsampling?

If you are asking how the "sampling" is done (i.e. you are looking for a point sampling, grid voxelization, or farthest point sampling operation somewhere), then the answer is: the levels come from a hierarchical superpoint partition that we build at preprocessing time. This is the core idea of the whole project; please check our paper: https://arxiv.org/abs/2306.08045.
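To make the idea concrete, here is a minimal sketch of downsampling through a precomputed partition. It assumes each fine node stores the index of its parent superpoint (called `super_index` here, a hypothetical name) and that child features are averaged. The actual pooling operator in the project may differ (for instance max or attention-based pooling), so treat the mean as an illustrative choice.

```python
import numpy as np


def pool_mean(x_fine, super_index, num_super):
    """Average child features into their parent superpoint.

    Assumes every superpoint has at least one child.
    """
    sums = np.zeros((num_super, x_fine.shape[1]))
    np.add.at(sums, super_index, x_fine)  # scatter-add children to parents
    counts = np.bincount(super_index, minlength=num_super)
    return sums / counts[:, None]


# 6 fine nodes with 2 features each, grouped into 3 superpoints.
super_index = np.array([0, 0, 1, 1, 1, 2])
x_fine = np.arange(12, dtype=float).reshape(6, 2)

x_coarse = pool_mean(x_fine, super_index, 3)
print(x_coarse.shape)  # (3, 2)
```

No sampling happens at run time in this sketch: the number of rows at each level is fixed entirely by the partition, which is why the level sizes are whatever the preprocessing produced.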

jing-zhao9 commented 7 months ago

Thank you very much for your patient answer. I have another question about the code. I want to train semantic segmentation on my first GPU and panoptic segmentation on my second GPU. How should I configure GPU usage for this?

drprojects commented 7 months ago

Look into CUDA_VISIBLE_DEVICES.
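As a rough sketch of how this can work, the example below launches two child processes, each with its own `CUDA_VISIBLE_DEVICES` value. The helper `env_for_gpu` and the `probe` command are hypothetical. The probe only prints the variable it sees and stands in for a real training command, which you would substitute yourself.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_id):
    """Copy the current environment and expose a single GPU to the child."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


# Stand-in for a training command: the child reports the variable it sees.
probe = [
    sys.executable,
    "-c",
    "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])",
]

seen = [
    subprocess.run(
        probe, env=env_for_gpu(gpu), capture_output=True, text=True, check=True
    ).stdout.strip()
    for gpu in (0, 1)
]
print(seen)
```

The variable must be set before the process initializes CUDA. Inside each process, the single visible GPU is then addressed as device 0, regardless of its physical index.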

Your history of issues in this repo suggests that you are not too familiar with deep learning and that you have not read the paper in detail. I have to warn you: this project involves some fairly advanced deep learning concepts. Besides, I cannot provide tutoring here, only support for genuine issues with the code or for aspects of the project that lack clarity.