kenomo closed this issue 1 year ago.
It was a Docker and multiprocessing-related issue regarding IPC and/or shared memory 🤷♂️. Adding the --ipc=host flag to the docker run command fixed it.
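You can check whether shared memory is the culprit before touching the run command. By default Docker mounts only 64 MB at /dev/shm inside a container. With --ipc=host, the container shares the host's IPC namespace, including its /dev/shm. The Python sketch below reports the size of that mount. It is an illustration that assumes a Linux container with shared memory mounted at /dev/shm, and shm_size_mib is a made-up helper name.

```python
import os
import shutil
from typing import Optional


def shm_size_mib(path: str = "/dev/shm") -> Optional[float]:
    """Return the total size of the shared-memory mount in MiB, or None if it is missing."""
    if not os.path.isdir(path):
        # Not Linux, or nothing mounted at this path.
        return None
    # disk_usage() works on any mounted filesystem, including the tmpfs behind /dev/shm.
    return shutil.disk_usage(path).total / (1024 * 1024)


if __name__ == "__main__":
    size = shm_size_mib()
    if size is None:
        print("No /dev/shm mount found")
    else:
        # Roughly 64 MiB usually means Docker's default shm size is in effect.
        print(f"/dev/shm: {size:.0f} MiB")
```

A result of about 64 MiB suggests the Docker default is still in effect. docker run also accepts --shm-size (for example --shm-size=8g) to enlarge /dev/shm without sharing the host's IPC namespace. Nobody in this thread has tested whether that alone is enough for this preprocessing step.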
Hi @kenomo, I was going to suggest looking into the multiprocessing, but you found the solution in no time :wink:
Thanks for sharing your solution!
Did you also see it constantly showing "processing", with the progress bar stuck at 0%?
All workers showed load, but they never joined after finishing. I think the progress bar stayed at 0% during processing. Are you running everything inside a Docker container?
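You may not be sure whether your environment is a Docker container, which is common on rental platforms. Here is a rough heuristic in Python, offered as an assumption rather than something from this thread. Docker usually creates a /.dockerenv file in the container root, and on cgroup v1 hosts the cgroup entries of PID 1 usually mention docker. Neither marker is guaranteed, and other container runtimes may leave neither. probably_in_docker is a hypothetical helper name.

```python
import os


def probably_in_docker() -> bool:
    """Heuristic (not guaranteed) check for running inside a Docker container on Linux."""
    # Docker normally creates this marker file at the container root.
    if os.path.exists("/.dockerenv"):
        return True
    # On cgroup v1 hosts, PID 1's cgroup paths usually contain "docker".
    # On cgroup v2 the file is typically just "0::/", so this check finds nothing.
    try:
        with open("/proc/1/cgroup", encoding="utf-8") as f:
            return "docker" in f.read()
    except OSError:
        # Not Linux, or /proc is not readable.
        return False


if __name__ == "__main__":
    print("inside Docker (heuristic):", probably_in_docker())
```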
Yes, I am running on a GPU rental platform, and I believe I am hitting the same Docker-related issue. However, I am not familiar with Docker, and I cannot modify the container's startup command. Could you describe the solution in more detail?
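If the startup command cannot be changed, one workaround to try is to skip the worker pool and process the rooms in the main process. This has not been tested against this project. The sketch below is hypothetical: process_all, read_room and num_workers are illustrative names, not the project's API, and read_room stands in for the project's per-room reader. With num_workers <= 1 it never starts a multiprocessing.Pool, so no worker joins or inter-process shared memory are involved.

```python
import multiprocessing
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_all(
    read_room: Callable[[T], R], rooms: Iterable[T], num_workers: int = 1
) -> List[R]:
    """Apply read_room to every room, serially or with a process pool.

    Hypothetical helper. With num_workers <= 1 everything runs in the
    current process; otherwise read_room must be a module-level function
    so that it can be pickled and sent to the worker processes.
    """
    room_list = list(rooms)
    if num_workers <= 1:
        # Serial fallback: slower, but there are no workers that could fail to join.
        return [read_room(room) for room in room_list]
    with multiprocessing.Pool(processes=num_workers) as pool:
        return pool.map(read_room, room_list)
```

process_all(read_room, rooms) runs serially, and process_all(read_room, rooms, num_workers=4) uses four worker processes. Serial processing is slower. A smaller worker count is a middle ground, but it may still hang if shared memory is the limiting factor.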
First, thanks for all the effort in making your project usable for the community 💪.
Issue
Preprocessing of the s3dis dataset hangs directly after the first batch (_Area1). The process does not crash, and no error messages are printed. Debugging shows that many subprocesses are spawned and all rooms are processed, but the workers never join: all subprocesses are still alive, even though the _read_s3disroom function returns data. Yet this line is never reached.
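While debugging a hang like this, you can turn the silent wait into a visible error. A pool's map_async(...).get(timeout=...) raises multiprocessing.TimeoutError instead of waiting forever. This minimal sketch assumes the preprocessing uses a multiprocessing.Pool, since the project's code is not shown here. map_with_timeout and its defaults are hypothetical. The pool_factory parameter exists only so the helper can also run with multiprocessing.dummy.Pool, a thread pool with the same interface.

```python
import multiprocessing
from typing import Any, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_with_timeout(
    fn: Callable[[T], R],
    items: Iterable[T],
    timeout_s: float = 600.0,
    processes: int = 4,
    pool_factory: Callable[..., Any] = multiprocessing.Pool,
) -> List[R]:
    """Like Pool.map, but raise multiprocessing.TimeoutError after timeout_s seconds."""
    with pool_factory(processes) as pool:
        async_result = pool.map_async(fn, list(items))
        # Blocks for at most timeout_s seconds, then raises TimeoutError
        # instead of waiting forever for results that never arrive.
        # Leaving the with-block calls pool.terminate(), which stops any
        # worker processes that are still running.
        return async_result.get(timeout=timeout_s)
```

For example, map_with_timeout(read_room, rooms, timeout_s=1800) would fail after 30 minutes instead of hanging (read_room being a stand-in for the per-room function). Suppose the timeout fires even though every worker has already processed its rooms. That would suggest the problem lies in returning results to the parent process rather than in the per-room work, which is consistent with the IPC/shared-memory explanation in this thread.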
Environment
I use the Docker container nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04 and ran install.sh.