drprojects / superpoint_transformer

Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"

Inference results appear to have induced grid artifacts #72

Closed: gvoysey closed this issue 6 months ago

gvoysey commented 6 months ago

hi again!

Running a lightly-modified DALES yaml, I'm observing some grid artifacts in the inference output. Below are two screenshots of the same region of a LAS file, pre- and post-inference. I used CloudCompare to show an orthographic projection, in which the banding in the x-direction is pretty visible.

I'm not sure where to look for where this might be introduced -- is this expected/normal behavior?

Pre-inference, source LAS file from the field: [screenshot 2024_23_Feb_180755]

Post-inference, LAS file generated by superpoint-transformer inference: [screenshot 2024_23_Feb_180732]

gvoysey commented 6 months ago

... having now done the responsible thing and read the rest of the open issues ( :sweat_smile: ) this may be a dupe of #9; if so, please close!

drprojects commented 6 months ago

Hi @gvoysey, I am not certain of what I am looking at in the images, to be honest. If you are seeing some grid-like patterns across all dimensions, this may be linked to the fact that our first preprocessing step is a voxelization.

But if, as your second image seems to suggest, there is a grid-like pattern along only one dimension (typically X or Y), this usually happens when your input point positions are expressed in an absolute coordinate system with float64 precision (e.g. a global position on Earth at millimetric precision). We manipulate data as float32 to save memory and I/O overhead. If you need to keep your absolute positions, it is common practice to center your cloud tiles around 0, store your float64 absolute position offset somewhere, and cast your point coordinates to float32. This is typically what the .LAS format does, for instance.
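
In code, the offset trick looks something like this (a minimal NumPy sketch, not from this repository; center_and_cast is a hypothetical helper):

    import numpy as np

    def center_and_cast(xyz_f64):
        # Center a float64 cloud around 0 and cast to float32.
        # Keep the float64 offset to recover absolute coordinates later.
        offset = xyz_f64.mean(axis=0)  # any fixed reference point works
        xyz_f32 = (xyz_f64 - offset).astype(np.float32)
        return xyz_f32, offset

    # absolute positions are recovered with:
    # xyz_abs = xyz_f32.astype(np.float64) + offset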

gvoysey commented 6 months ago

Hi, sorry for the unclear report -- the perils of writing something up when you're in the middle of something!

I updated the figure captions in my first comment. In both cases, they are screenshots from CloudCompare displaying a LAS file in an orthographic projection.

The first image is a small portion of the LAS file on which inference was performed; what's relevant is that the points in the point cloud are fairly uniformly distributed in space.

The second image is the same region, but visualizing the LAS file produced after inference with superpoint-transformer; what's relevant is that there's "banding" or "gridding" (I'm sure there's a correct term of art here, but I'm a newcomer to the field) in the x-direction. We don't see the banding/gridding in the y-direction, though. The desired behavior is to see points distributed in the same manner as in the input LAS file (and, w/r/t #9, the same number of points too, eventually).

I will double-check our input LAS files for centering and absolute position offsets. If that's not being done, it's straightforward to add to our preprocessing with PDAL (or by shelling out to LAStools, which I am coming to understand is the ffmpeg of lidar...).

If this result persists after accounting for floating point precision, then perhaps this is still #9 -- and even if not, I'm now subscribed to that issue and look forward to your upcoming work! Your project's been very eye-opening, thanks for your hard work!

drprojects commented 6 months ago

Hi again, thanks for the clarifications! Yes, I do think that your input data is stored in float64 and you lose a little bit of precision when running our pipeline, which casts it to float32. To check whether this assumption is correct, I would recommend updating your data reading function (i.e. the equivalent of read_dales_tile() for DALES, which I imagine you adapted for reading your input LAS files) to read the coordinates in float64 rather than float32.

Then visualize the output of your current and new read_dales_tile(). If you see a difference, then you have found the source of the "banding" behavior 😉
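
You can also check the assumption numerically before editing anything. For float32, the rounding error at coordinates around 10^6 (typical for projected global coordinates) reaches a few centimeters, which is plenty to show up as banding (a sketch; pos stands in for your float64 LAS coordinates):

    import torch

    pos = torch.rand(1000, 3, dtype=torch.float64) * 1e6  # stand-in for absolute coordinates
    err = (pos - pos.float().double()).abs().max()
    print(err)  # up to ~0.03 here: centimeter-scale quantization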

gvoysey commented 6 months ago
modified   src/data/data.py
@@ -27,6 +27,7 @@ class Data(PyGData):

     def __init__(self, **kwargs):
         super().__init__(**kwargs)
+        self.pos_offset = None
         if src.is_debug_enabled():
             self.debug()

modified   src/datasets/custom_dataset.py
@@ -48,9 +48,17 @@ def read_dales_tile(

         if xyz:
             try:
-                data.pos = torch.stack([
-                    torch.FloatTensor(tile[key][axis].copy())
+                raw_pos = torch.stack([
+                    torch.DoubleTensor(tile[key][axis].copy())
                     for axis in ["x", "y", "z"]], dim=-1)
+                offset = raw_pos[0, :]  # first point as the float64 xyz offset
+                converted_pos = (raw_pos - offset).to(torch.float32)
+                data.pos = converted_pos
+                data.pos_offset = offset

Looks like it was a floating point precision error, indeed! Thanks for the spot. I don't see gridding artifacts in the results with this diff. I have yet to put the points back into their proper coordinates, though.

I'm picking the first point in the point cloud to compute the offset against (arbitrarily), and putting the offset in a new attribute on the Data class -- how do you feel about that sort of solution?
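
For reference, putting points back into absolute coordinates afterwards should just be the inverse (a sketch, assuming pos_offset survives the pipeline):

    # recover absolute float64 coordinates from the centered float32 cloud
    abs_pos = data.pos.to(torch.float64) + data.pos_offset  # pos_offset stored as float64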

gvoysey commented 6 months ago

Update: pos_offset doesn't feel like a good approach -- the attribute doesn't seem to persist through inference -- but I'm not sure how else I'd preserve that information except on the class.

drprojects commented 6 months ago

Hi @gvoysey, indeed, by default we only save and load the strictly necessary attributes, to limit I/O overhead at training and inference time.

Simpler, hacky fix

Just save the offset elsewhere in a file of your choosing and load it when need be.
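
For instance (a sketch; the path and filename are arbitrary):

    import torch

    # at preprocessing time, next to the processed tile
    torch.save(offset, 'my_tile_pos_offset.pt')

    # at inference time, to convert predictions back to absolute coordinates
    offset = torch.load('my_tile_pos_offset.pt')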

Cleaner, harder fix

To add new attributes to your Data object that should be saved at the end of preprocessing and loaded at train/inference time, you must look into the point_save_keys and point_load_keys attributes of our BaseDataset class. I assume you are saving your new pos_offset in the Data object holding your points.

By default, we set point_save_keys=None, which will save to disk all the attributes present in your NAG and Data structure at the end of preprocessing. So normally, your pos_offset attribute should be saved to disk when NAG.save() or Data.save() is called.

But, unless specified in BaseDataset.point_load_keys, your new attribute will not be loaded during training and inference. The simplest way to add it is via point_basic_load_keys or point_load_keys in the configs/datamodule/_features.yaml file:

point_basic_load_keys: ['pos', 'y', 'super_index', 'pos_offset']  # needed point attributes, other than features

There remains an issue which we do not support and that may cause some minor problems in your case. If you look at the docstring of Data.save(), you will see that it takes pos_dtype=torch.float and fp_dtype=torch.float arguments. These specify the dtypes under which we store pos and all other floating-point attributes, respectively. Since storing float64 is overkill, we store everything as float32. Even in your case, there would be no reason to store float64; the offset trick is a much better solution.

The catch is that your newly-created pos_offset attribute will be saved with fp_dtype=torch.float too. Because of this, you will no longer see the "banding" effect you had earlier, but your point cloud tiles may have a slight offset between them due to the loss of precision in the global offset. If this is a major problem for you, the simplest trick I can recommend is to edit Data.save to explicitly save to float64 any pos_offset attribute it encounters:

        for k, val in self.items():
            if k == 'pos_offset':
                save_tensor(val, f, k, fp_dtype=torch.double)  # this should hopefully do the trick
            elif k == 'pos':
                save_tensor(val, f, k, fp_dtype=pos_dtype)
            elif k == 'y' and val.dim() > 1 and y_to_csr:
                sg = f.create_group(f"{f.name}/_csr_/{k}")
                save_dense_to_csr(val, sg, fp_dtype=fp_dtype)
            elif isinstance(val, Cluster):
                sg = f.create_group(f"{f.name}/_cluster_/sub")
                val.save(sg, fp_dtype=fp_dtype)
            elif k in ['rgb', 'mean_rgb']:
                if val.is_floating_point():
                    save_tensor((val * 255).byte(), f, k, fp_dtype=fp_dtype)
                else:
                    save_tensor(val.byte(), f, k, fp_dtype=fp_dtype)
            elif isinstance(val, torch.Tensor):
                save_tensor(val, f, k, fp_dtype=fp_dtype)
            else:
                raise NotImplementedError(f'Unsupported type={type(val)}')

Personal comment

After giving this some thought, I think this little hack would be a desired feature for everyone. So I will code it myself and it will be available in the big code update I am about to release today.

drprojects commented 6 months ago

I just released the v2 of this project, which should implement the pos_offset behavior I described above :partying_face:

Closing this issue for now, feel free to re-open if need be.

gvoysey commented 6 months ago

[screenshot 2024_29_Feb_094729]

:pray: