NVIDIA / MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
https://nvidia.github.io/MinkowskiEngine

Question about slice(). #513


PPjmchen commented 1 year ago

Hello! Thank you for your innovative work.

I was doing experiments with the slice() function and ran into a confusing problem. I can't understand how slice() works, especially when the voxels have a tensor stride > 1 after convolution: in my experiment it seems to copy a voxel's feature to raw points that do not belong to that voxel.

Here is my example:

1. First, a set of 2D coordinates is randomly initialized:

```python
import MinkowskiEngine as ME
import torch
import torch.nn as nn
import numpy as np

raw_points = torch.tensor([[0.0217, 0.2458], [0.1758, 0.8729], [0.8036, 0.4286],
                           [0.8696, 0.3787], [0.9445, 0.7728], [0.0560, 0.4679],
                           [0.3228, 0.0698], [0.2060, 0.6885], [0.7349, 0.6514],
                           [0.1624, 0.3675]], dtype=torch.float64)
```


They look like:
<img width="400" alt="image" src="https://user-images.githubusercontent.com/44498651/210928403-54c2e82f-f43f-4552-bfff-a3404b559f21.png">

2. Then those points are converted to a TensorField and a SparseTensor; their features are all simply set to 1:

```python
raw_points = raw_points.squeeze().unsqueeze(0)

coords, feats = ME.utils.batch_sparse_collate(
    [(p / 0.1, np.ones([raw_points.shape[1], 1])) for p in raw_points],
    dtype=torch.float32)

tf_points = ME.TensorField(coordinates=coords, features=feats)

sparse_points = tf_points.sparse()
```


Here I print the sparse tensor and visualize its coordinates:

```
SparseTensor(
  coordinates=tensor([[0, 0, 2], [0, 1, 8], [0, 8, 4], [0, 8, 3], [0, 9, 7],
                      [0, 0, 4], [0, 3, 0], [0, 2, 6], [0, 7, 6], [0, 1, 3]], dtype=torch.int32)
  features=tensor([[1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.]], dtype=torch.float64)
  coordinate_map_key=coordinate map key:[1, 1]
  coordinate_manager=CoordinateMapManagerCPU(
    TensorField [1, 1, ]:   CoordinateFieldMapCPU:10x3
    algorithm=MinkowskiAlgorithm.DEFAULT
  )
  spatial dimension=2)
```


<img width="400" alt="image" src="https://user-images.githubusercontent.com/44498651/210975734-89593dfc-a099-4eef-b0d5-e9ec8be9a95d.png">

3. Then I use a 2D strided sparse convolution layer (kernel weights all set to 1.) to map the sparse points above:

```python
conv_layer = ME.MinkowskiConvolution(1, 1, kernel_size=3, stride=4, dimension=2)
nn.init.constant_(conv_layer.kernel, 1.)

x1 = conv_layer(sparse_points.float())

x1
```

```
SparseTensor(
  coordinates=tensor([[0, 0, 0], [0, 0, 4], [0, 4, 4], [0, 8, 4], [0, 8, 0], [0, 0, 8]], dtype=torch.int32)
  features=tensor([[ 0.0000], [ 0.2293], [ 0.0000], [ 0.6493], [ 0.0000], [-0.0731]], grad_fn=)
  coordinate_map_key=coordinate map key:[4, 4]
  coordinate_manager=CoordinateMapManagerCPU(
    [4, 4, ]:   CoordinateMapCPU:6x3
    TensorField [1, 1, ]:   CoordinateFieldMapCPU:10x3
    [1, 1, ]->[4, 4, ]: cpu_kernel_map: number of unique maps:1, kernel map size:10
    [1, 1, ]->[4, 4, ]: cpu_kernel_map: number of unique maps:9, kernel map size:5
    algorithm=MinkowskiAlgorithm.DEFAULT
  )
  spatial dimension=2)
```

I visualize `x1` with the coordinates and the corresponding features after convolution:

<img width="400" alt="image" src="https://user-images.githubusercontent.com/44498651/210982101-293ba3cb-4496-4a54-b094-ac9a5569a5a9.png">

**4. I use `x1.slice(tf_points)` and get the confusing output:**

```python
x1.slice(tf_points)
```

```
TensorField(
  coordinates=tensor([[0.0000, 0.2170, 2.4580], [0.0000, 1.7580, 8.7290], [0.0000, 8.0360, 4.2860],
                      [0.0000, 8.6960, 3.7870], [0.0000, 9.4450, 7.7280], [0.0000, 0.5600, 4.6790],
                      [0.0000, 3.2280, 0.6980], [0.0000, 2.0600, 6.8850], [0.0000, 7.3490, 6.5140],
                      [0.0000, 1.6240, 3.6750]])
  features=tensor([[ 0.0000], [ 0.2293], [ 0.0000], [ 0.0000], [ 0.2293], [ 0.6493], [ 0.0000],
                   [ 0.0000], [-0.0731], [ 0.6493]], grad_fn=)
  coordinate_field_map_key=coordinate map key:[1, 1]
  coordinate_manager=CoordinateMapManagerCPU(
    [4, 4, ]:   CoordinateMapCPU:6x3
    TensorField [1, 1, ]:   CoordinateFieldMapCPU:10x3
    [1, 1, ]->[4, 4, ]: cpu_kernel_map: number of unique maps:1, kernel map size:10
    [1, 1, ]->[4, 4, ]: cpu_kernel_map: number of unique maps:9, kernel map size:5
    algorithm=MinkowskiAlgorithm.DEFAULT
  )
  spatial dimension=2)
```



You can see that the raw point at [1.7580, 8.7290] and the raw point at [9.4450, 7.7280] are both assigned the feature 0.2293, which is copied from voxel [0, 4]. Similarly, the raw points at [0.5600, 4.6790] and [1.6240, 3.6750] are both assigned the feature 0.6493, which is copied from voxel [8, 4]. I don't understand these results: why are raw points belonging to different voxels assigned the feature of some other voxel?
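For comparison, here is what I would have expected slice() to do at tensor stride 4 (a sketch of my reading of the intended semantics, not the library's actual implementation): each raw point receives the feature of the stride-4 voxel that contains it.

```python
# Hypothetical expected behavior: look up each point's containing stride-4 voxel
# (batch index ignored here, since the example uses a single batch).
voxels = (torch.div(tf_points.C[:, 1:].floor(), 4, rounding_mode="floor") * 4).int()
expected = torch.stack([x1.F[(x1.C[:, 1:] == v).all(1)].squeeze(0) for v in voxels])
```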

Any help is greatly appreciated.
PenguinPhysicist commented 1 year ago

I also have issues with the .slice function. The coordinates are recovered correctly, but the feature assignments seem random:

```python
coords = torch.zeros(10, 4)
coords[:, 3] = torch.arange(0, 10, 1)
features = torch.arange(0, 20, 1).reshape(10, 2)

myTensorField = ME.TensorField(features.float(), coords.int())
myTensor = myTensorField.sparse()  # yields a sparse tensor on a 3D line along z, with features from [0, 1] to [18, 19]

pool = ME.MinkowskiAvgPooling(kernel_size=3, stride=2, dimension=3)
myTensorPooled = pool(myTensor)  # order of the coordinates is not maintained, but the coordinate-feature association is

myTensorPooled.slice(myTensorField).sparse()  # yields coordinates in the original order, but the features do not correspond
```
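To make the orderings visible, one can sort the pooled tensor by its z coordinate before printing (a small inspection sketch, not part of the original report):

```python
# Sorting by the z column shows that pooled features still travel with
# their coordinates, even though the row order changed.
order = myTensorPooled.C[:, 3].argsort()
print(torch.cat([myTensorPooled.C[order].float(), myTensorPooled.F[order]], dim=1))
```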

From looking into the SparseTensor implementation (https://github.com/NVIDIA/MinkowskiEngine/blob/master/MinkowskiEngine/MinkowskiSparseTensor.py#L577, and line 611), when slice() is given a tensor field X, this is returned:

```python
ME.TensorField(
    self.F[X.inverse_mapping(self.coordinate_map_key).long()],  # self refers to the sparse tensor
    coordinate_field_map_key=X.coordinate_field_map_key,
    coordinate_manager=X.coordinate_manager,
    quantization_mode=X.quantization_mode,
)
```

So it does recover the correct coordinates through the TensorField's coordinate_manager and coordinate_field_map_key, but the inverse mapping, or at least its use in this context, seems to be the issue.

Would it be possible to keep track of the correspondence between features and coordinates in a similar way to how the coordinates are recovered?
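In the meantime, a workaround sketch (my own construction, not a MinkowskiEngine API): rebuild the point-to-voxel correspondence by quantizing the field coordinates with the sparse tensor's stride and looking the voxels up directly.

```python
import torch

def slice_by_quantization(sparse_t, field, stride):
    """Assign each field point the feature of its containing stride-`stride` voxel.

    Assumes every field point falls into a voxel present in `sparse_t`.
    """
    # Quantize the spatial coordinates down to the stride grid; keep the batch index.
    vox = field.C.clone()
    vox[:, 1:] = torch.div(vox[:, 1:].floor(), stride, rounding_mode="floor") * stride
    vox = vox.int()
    # Lookup table from voxel coordinate (including batch index) to feature row.
    table = {tuple(c.tolist()): i for i, c in enumerate(sparse_t.C)}
    idx = torch.tensor([table[tuple(v.tolist())] for v in vox])
    return sparse_t.F[idx]
```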

ZiliangMiao commented 1 month ago

Same problem: the coordinates are recovered, but the features keep the original order. @chrischoy