CCInc opened 2 years ago
This adds complexity because of how the datasets are collated. OpenPoints datasets are batched "densely": for example, 16 samples, each of shape [2048, 3], are stacked into a single tensor of shape [16, 2048, 3] (implemented based on the original TP3D code here https://github.com/CCInc/3d-ml/blob/70de73291e5507b7ce75250f2f61fad12f049d8b/src/utils/batch.py#L17). Some models/backends, such as sparse convolutions, require the data to be batched differently: the samples are concatenated into a tensor of shape [2048*16, 4], where the 4th column is a "batch index" identifying which sample each point belongs to. This can be done with the PyTorch Geometric collation functions (https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html#torch_geometric.data.Batch.from_data_list), which by default concatenate samples along the point dimension and record each point's batch index.
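A minimal sketch of the two layouts in plain PyTorch, assuming each sample is a float tensor of point coordinates. `collate_dense` and `collate_sparse` are hypothetical names, not OpenPoints or PyTorch Geometric APIs. PyTorch Geometric's `Batch.from_data_list` keeps the batch index in a separate `batch` vector; here it is appended as a 4th column to match the shape described above.

```python
import torch


def collate_dense(samples):
    # Dense layout: every sample must have the same shape [N, 3];
    # stacking adds a leading batch dimension -> [B, N, 3].
    return torch.stack(samples, dim=0)


def collate_sparse(samples):
    # Sparse layout: concatenate all points into one [sum(N_i), 3] tensor...
    coords = torch.cat(samples, dim=0)
    # ...and build a [sum(N_i), 1] column holding the index of the sample
    # each point came from.
    batch_idx = torch.cat(
        [torch.full((s.shape[0], 1), i, dtype=s.dtype) for i, s in enumerate(samples)]
    )
    # Append the batch index as the 4th column -> [sum(N_i), 4].
    return torch.cat([coords, batch_idx], dim=1)


samples = [torch.rand(2048, 3) for _ in range(16)]
print(collate_dense(samples).shape)   # torch.Size([16, 2048, 3])
print(collate_sparse(samples).shape)  # torch.Size([32768, 4])
```

Unlike the dense layout, the sparse layout does not require every sample to have the same number of points, since the batch index column is what separates the samples.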
Torch-Points3D handles this with a configuration option on the model that defines whether it consumes "dense" or "sparse" data. We would likely need to do the same and have the dataloader choose its collation function according to this option. Ref: https://github.com/torch-points3d/torch-points3d/blob/66e8bf22b2d98adca804c753ac3f0013ff4ec731/torch_points3d/datasets/base_dataset.py#L160-L174
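One way the dataloader could follow a model-level option, sketched under assumptions: the `conv_type` values `"dense"`/`"sparse"`, the `COLLATE_FNS` table and `build_dataloader` are all hypothetical and only loosely modeled on the Torch-Points3D approach linked above. The only real API relied on is the `collate_fn` argument of `torch.utils.data.DataLoader`.

```python
import torch
from torch.utils.data import DataLoader


def collate_dense(samples):
    # Stack equally sized [N, 3] samples into [B, N, 3].
    return torch.stack(samples, dim=0)


def collate_sparse(samples):
    # Concatenate points and append a per-point batch index column -> [sum(N_i), 4].
    coords = torch.cat(samples, dim=0)
    batch_idx = torch.cat(
        [torch.full((s.shape[0], 1), i, dtype=s.dtype) for i, s in enumerate(samples)]
    )
    return torch.cat([coords, batch_idx], dim=1)


# Hypothetical mapping from the model's config value to a collate function.
COLLATE_FNS = {"dense": collate_dense, "sparse": collate_sparse}


def build_dataloader(dataset, conv_type, batch_size, shuffle=False):
    # Pick the collation that matches the data format the model expects.
    try:
        collate_fn = COLLATE_FNS[conv_type]
    except KeyError:
        raise ValueError(
            f"unknown conv_type {conv_type!r}; expected one of {sorted(COLLATE_FNS)}"
        ) from None
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, collate_fn=collate_fn)


# A plain list works as a map-style dataset (it has __len__ and __getitem__).
dataset = [torch.rand(2048, 3) for _ in range(32)]
print(next(iter(build_dataloader(dataset, "dense", 16))).shape)   # torch.Size([16, 2048, 3])
print(next(iter(build_dataloader(dataset, "sparse", 16))).shape)  # torch.Size([32768, 4])
```

Keeping the choice in a single lookup table means the model config stays the single source of truth, and an unsupported value fails early instead of producing a wrongly shaped batch.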