CCInc / 3d-ml

A versatile framework for 3D machine learning built on Pytorch Lightning and Hydra [looking for contributors!]

Add ScanNet Dataset #35

Open jaswanthbjk opened 1 year ago

jaswanthbjk commented 1 year ago

http://www.scan-net.org/

ScanNet is one of the most widely used 3D datasets for semantic and instance segmentation.

The dataloader should support both tasks. Maybe the semantic segmentation loader can be subclassed to produce an instance segmentation loader.

@CCInc or @leo-stan any suggestions?

CCInc commented 1 year ago

Sounds great! I'm not sure what the best way to share the dataset between the two tasks is. For now, I think it can be implemented for the semantic segmentation task, with a parameter for using the dataset in "instance" mode. Once we have instance segmentation models, we can refactor it as we see fit.
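As a rough illustration of the "instance" mode parameter idea, here is a minimal sketch in plain Python. Everything in it (`Scene`, `SegmentationDataset`, the `mode` argument, the sample keys) is a hypothetical name chosen for illustration and is not part of 3d-ml or any existing loader. It assumes each preprocessed scene already carries both per-point semantic labels and per-point instance ids.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Scene:
    """Hypothetical container for one preprocessed scene."""
    points: List[Point]
    semantic_labels: List[int]  # one class id per point
    instance_labels: List[int]  # one object id per point


class SegmentationDataset:
    """Hypothetical dataset with a `mode` switch (not the 3d-ml API)."""

    MODES = ("semantic", "instance")

    def __init__(self, scenes: Sequence[Scene], mode: str = "semantic") -> None:
        if mode not in self.MODES:
            raise ValueError(f"mode must be one of {self.MODES}, got {mode!r}")
        self.scenes = list(scenes)
        self.mode = mode

    def __len__(self) -> int:
        return len(self.scenes)

    def __getitem__(self, idx: int) -> Dict[str, list]:
        scene = self.scenes[idx]
        # Semantic labels are always returned.
        sample = {"points": scene.points, "semantic": scene.semantic_labels}
        if self.mode == "instance":
            # Instance mode additionally returns the per-point object ids.
            sample["instance"] = scene.instance_labels
        return sample


# Tiny made-up scene: three points, two objects.
scene = Scene(
    points=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    semantic_labels=[2, 2, 5],
    instance_labels=[0, 0, 1],
)
semantic_ds = SegmentationDataset([scene], mode="semantic")
instance_ds = SegmentationDataset([scene], mode="instance")
```

With a single class and a flag, the semantic task keeps working unchanged, and a later refactor into a separate instance segmentation dataset only has to move the `mode == "instance"` branch.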

IIRC ScanNet is a complicated dataset to preprocess, which has led me to avoid it in the past.

I know torch-points3d had a ScanNet dataset: https://github.com/torch-points3d/torch-points3d/blob/master/torch_points3d/datasets/segmentation/scannet.py but I'm not sure what state it's in.

Open3D-ML also seems to have an implementation, along with a preprocessing script: https://github.com/isl-org/Open3D-ML/blob/master/ml3d/datasets/scannet.py https://github.com/isl-org/Open3D-ML/blob/8ddb67206e4fef55b39eea691ff00d49cef18be5/scripts/preprocess_scannet.py

I think this would be a good candidate for preprocessing once and storing in PyTorch Geometric format (see https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html#creating-larger-datasets), just like torch-points3d does. If you're interested in implementing it, I think using the torch-points3d dataset as a base would be a good start.
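The general pattern behind that suggestion is "process each scene once, cache it on disk, and load it on demand". The sketch below shows that pattern using only the standard library. It is not the PyTorch Geometric API: the class `CachedSceneDataset`, the in-memory `raw_scenes` dict standing in for raw ScanNet files, and the pickle storage are all assumptions made for illustration. A real implementation would more likely subclass the PyTorch Geometric dataset class described in the linked guide.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Dict, List


class CachedSceneDataset:
    """Hypothetical process-once dataset; mimics the idea, not the PyG API."""

    def __init__(self, root: Path, raw_scenes: Dict[str, dict]) -> None:
        self.processed_dir = Path(root) / "processed"
        self.raw_scenes = raw_scenes  # stand-in for raw files on disk
        self.scene_ids: List[str] = sorted(raw_scenes)
        self.process_calls = 0  # number of scenes processed by this instance
        self._process_missing()

    def _processed_path(self, scene_id: str) -> Path:
        return self.processed_dir / f"{scene_id}.pkl"

    def _process_missing(self) -> None:
        """Process only the scenes that have no cached file yet."""
        self.processed_dir.mkdir(parents=True, exist_ok=True)
        for scene_id in self.scene_ids:
            path = self._processed_path(scene_id)
            if path.exists():
                continue  # already preprocessed in an earlier run
            raw = self.raw_scenes[scene_id]
            # Placeholder for the expensive preprocessing step.
            processed = {"pos": raw["points"], "y": raw["labels"]}
            with path.open("wb") as f:
                pickle.dump(processed, f)
            self.process_calls += 1

    def __len__(self) -> int:
        return len(self.scene_ids)

    def __getitem__(self, idx: int) -> dict:
        # Load one cached scene from disk on demand.
        with self._processed_path(self.scene_ids[idx]).open("rb") as f:
            return pickle.load(f)


# Made-up raw data for a single scene id.
raw = {"scene0000_00": {"points": [(0.0, 0.0, 0.0)], "labels": [3]}}
with tempfile.TemporaryDirectory() as tmp:
    first = CachedSceneDataset(Path(tmp), raw)   # processes and caches
    second = CachedSceneDataset(Path(tmp), raw)  # reuses the cache
    sample = second[0]
    calls = (first.process_calls, second.process_calls)
```

Storing one file per scene keeps memory use flat, since only the requested scene is read, and a rerun skips every scene that was already processed.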

jaswanthbjk commented 1 year ago

Thanks @CCInc. I'll start implementing the ScanNet data loader. Please assign this issue to me.

leo-stan commented 1 year ago

Awesome @jaswanthbjk!

jaswanthbjk commented 1 year ago

@CCInc or @leo-stan, like torch-points3d, would you prefer to store the processed files in some format?

If yes, which storage format is preferred: json, pt, or npy?