Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"
Preprocessing s3dis and multiprocessing freezes after the first area #22

Closed: kenomo closed this issue 1 year ago

kenomo commented 1 year ago

First, thanks for all the effort in making your project usable for the community 💪.

Issue: Preprocessing of the S3DIS dataset hangs directly after the first batch (_Area1). The process does not crash, and no error messages are printed. Debugging shows that many subprocesses are spawned and all rooms are processed, but the workers never join: all subprocesses stay alive, even though the _read_s3disroom function returns data. However, this line is never reached.

Environment: I use the Docker image nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04 and ran install.sh.

kenomo commented 1 year ago

It was a Docker and multiprocessing-related issue regarding IPC and/or shared memory 🤷‍♂️. Adding the --ipc=host flag to the Docker run command fixed it.
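
For anyone who wants to check whether their container is likely affected, a minimal sketch follows. It assumes a Linux container where shared memory is mounted at /dev/shm; Docker gives a container a 64 MB /dev/shm by default unless --ipc=host or --shm-size is passed. The helper name `shm_report` and the 1 GiB threshold are illustrative choices, not part of this project.

```python
import os
import shutil


def shm_report(path="/dev/shm", min_bytes=1 << 30):
    """Inspect the shared-memory mount used for inter-process data exchange.

    Returns (total_bytes, free_bytes, looks_small), or None if `path` is not
    a directory (for example on a non-Linux host).
    `min_bytes` is an arbitrary threshold chosen for illustration only.
    """
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    return usage.total, usage.free, usage.total < min_bytes


if __name__ == "__main__":
    report = shm_report()
    if report is None:
        print("No /dev/shm mount found on this system.")
    else:
        total, free, looks_small = report
        print(f"/dev/shm total: {total / 2**20:.0f} MiB, free: {free / 2**20:.0f} MiB")
        if looks_small:
            # A small mount is only a hint, not proof, that this is the cause.
            print("Shared memory looks small; consider --ipc=host or a larger --shm-size.")
```

A total of roughly 64 MiB suggests the container is running with Docker's default shared-memory size, which matches the situation described above.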

drprojects commented 1 year ago

Hi @kenomo, I was going to suggest looking into the multiprocessing but you found the solution in no time :wink:

Thanks for sharing your solution!

Codeei commented 9 months ago

Did you also see it constantly showing "processing", with the progress bar stuck at 0%?

kenomo commented 9 months ago

All workers showed load but never joined after finishing. I think the progress bar was stuck at 0% during processing. Are you running everything inside a Docker container?

Codeei commented 9 months ago

Yes, I am running on a GPU rental platform, and I believe a similar issue occurred because of Docker. However, I am not familiar with Docker, and I am unable to modify the Docker startup command. Could you describe a detailed solution?