IGNF / myria3d

Myria3D: Aerial Lidar HD Semantic Segmentation with Deep Learning
https://ignf.github.io/myria3d/
BSD 3-Clause "New" or "Revised" License

Number of input points and changing the size of the input point cloud. #58

Closed JinyuanShao closed 1 year ago

JinyuanShao commented 1 year ago

Hi, great project!

I have some questions about this repo.

I saw you suggest a tile size of 50m by 50m for the input point cloud. But in configs/datamodule/transforms/preparations/default.yaml, you specify a fixed number of points (12500). Does this mean you downsample the cloud in a 50m by 50m grid down to 12500 points?

And another question: if we want to run inference on a 100m by 100m area, how should we normalize the data? You convert all points in a 50 by 50 grid to (-1, 1), and if we do the same thing to a 100 by 100 grid, the spatial scale becomes different. Any suggestions or answers for this?

Really impressive project!

CharlesGaydon commented 1 year ago

Hi! I am glad our work got onto your radar!

The file you are referring to does not exist anymore, and there are now two approaches to sampling the points in each 50 x 50m area:

[image: the two sampling approaches]

Indeed, we process tiles 50m by 50m, and you are right to point out that to process larger areas the data normalization must be adapted. The normalization factor is defined in the NormalizePos transform. By setting parameter datamodule.subtile_width=100 when running inference, you can change this setting globally! :)

Additionally, you will need to increase the aforementioned "point budget" to 4 * 40000 points to have a comparable density. I think the most convenient way to change those parameters for you would be to use a configuration file (i.e. edit the default config) and either rebuild the docker image, or pass the config to the image with keywords --config-path=XXX and --config-name=XXX (hydra style).
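As a quick sanity check on that factor of 4 (the variable names below are illustrative, not from the repo's code): going from a 50m tile to a 100m tile quadruples the covered area, so keeping the point density comparable means quadrupling the point budget.

```python
# Illustrative arithmetic only: how the point budget scales with tile width.
train_width = 50      # metres, training subtile width
infer_width = 100     # metres, desired inference tile width
base_budget = 40_000  # point budget per 50m x 50m tile

# Area scales with the square of the tile width.
area_factor = (infer_width / train_width) ** 2  # -> 4.0

# Scale the budget by the same factor to keep density constant.
new_budget = int(base_budget * area_factor)     # -> 160000, i.e. 4 * 40000
print(area_factor, new_budget)
```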

EDIT: you also need to set datamodule.transforms.normalizations.NormalizePos.subtile_width=50 (see my comment below)

Hope it makes sense!

CharlesGaydon commented 1 year ago

If at some point you end up using our work, I would love to hear more about your use case (Discussions might be a great place to Show and Tell!) 😃

CharlesGaydon commented 1 year ago

Closing this :) Please reopen if further questions.

CharlesGaydon commented 1 year ago

I forgot something important - I just remembered it by looking at older experiments:

> Indeed, we process tiles 50m by 50m, and you are right to point that to process larger areas the data normalization must be adapted. The normalization factor is defined in the NormalizePos transform. By setting parameter datamodule.subtile_width=100 when running inference, you can change this setting globally ! :)

While we must indeed set datamodule.subtile_width=100 to process larger tiles, we need to retain the original normalization value! Normalization should preserve the notion of distance that was used during training. To do so, simply set the following: datamodule.transforms.normalizations.NormalizePos.subtile_width=50
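To see why the normalization width must stay at its training value, here is a minimal sketch of a NormalizePos-style transform (illustrative only, not the actual myria3d implementation): dividing by half the subtile width maps a tile to roughly (-1, 1), so changing that divisor at inference time would shrink all distances the model sees.

```python
import numpy as np

def normalize_pos(pos, subtile_width):
    """Center the cloud and scale by half the subtile width, so a tile
    of `subtile_width` metres maps roughly to (-1, 1).
    Illustrative sketch, not the actual myria3d transform."""
    pos = pos - pos.mean(axis=0)        # center on the tile
    return pos / (subtile_width / 2.0)  # scale by the half-width

# Two points 10 m apart, as they might appear in a 100 m tile:
pts = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])

# Normalizing with the training width (50 m) keeps the same
# normalized distance the model saw during training...
d_train_scale = np.linalg.norm(np.diff(normalize_pos(pts, 50), axis=0))

# ...whereas normalizing with the inference width (100 m) halves it,
# distorting the spatial scale the model expects.
d_infer_scale = np.linalg.norm(np.diff(normalize_pos(pts, 100), axis=0))

print(d_train_scale, d_infer_scale)  # 0.4 vs 0.2
```

This is exactly the mismatch the override avoids: the tiling width can grow, but the normalization divisor must match training.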

@JinyuanShao