drprojects / superpoint_transformer

Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"
MIT License

Preprocessing s3dis and multiprocessing freezes after the first area #22

Closed by kenomo 1 year ago

kenomo commented 1 year ago

First, thanks for all the effort in making your project usable for the community 💪.

Issue: Preprocessing of the s3dis dataset hangs directly after the first batch (_Area1). The process does not crash, and no error messages are printed. Debugging shows that many subprocesses are spawned and all rooms are processed, but the workers never join: all subprocesses stay alive, even though the _read_s3disroom function returns data. However, this line is never reached.

Environment: I use the Docker container nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04 and ran install.sh.

Package                   Version
------------------------- ----------------
absl-py                   1.4.0
aiohttp                   3.8.5
aiosignal                 1.3.1
ansi2html                 1.8.0
antlr4-python3-runtime    4.9.3
anyio                     3.5.0
appdirs                   1.4.4
argon2-cffi               21.3.0
argon2-cffi-bindings      21.2.0
arrow                     1.2.3
asttokens                 2.0.5
async-lru                 2.0.4
async-timeout             4.0.3
attrs                     23.1.0
Babel                     2.12.1
backcall                  0.2.0
beautifulsoup4            4.12.2
bleach                    4.1.0
cachetools                5.3.1
certifi                   2023.7.22
cffi                      1.15.1
charset-normalizer        3.2.0
click                     8.1.6
cloudpickle               2.2.1
cmake                     3.27.2
colorhash                 1.2.1
colorlog                  6.7.0
comm                      0.1.4
contourpy                 1.1.0
cycler                    0.11.0
dash                      2.11.1
dash-core-components      2.0.0
dash-html-components      2.0.0
dash-table                5.0.0
debugpy                   1.6.7
decorator                 5.1.1
defusedxml                0.7.1
docker-pycreds            0.4.0
entrypoints               0.4
executing                 0.8.3
fastjsonschema            2.16.2
filelock                  3.12.2
Flask                     2.2.5
fonttools                 4.42.0
fqdn                      1.5.1
frnn                      0.0.0
frozenlist                1.4.0
fsspec                    2023.6.0
gdown                     4.7.1
gitdb                     4.0.10
GitPython                 3.1.32
google-auth               2.22.0
google-auth-oauthlib      1.0.0
grpcio                    1.57.0
h5py                      3.9.0
hydra-colorlog            1.2.0
hydra-core                1.3.2
hydra-submitit-launcher   1.2.0
idna                      3.4
importlib-metadata        6.8.0
importlib-resources       5.2.0
ipykernel                 6.25.0
ipython                   8.12.2
ipython-genutils          0.2.0
ipywidgets                8.1.0
isoduration               20.11.0
itsdangerous              2.1.2
jedi                      0.18.1
Jinja2                    3.1.2
joblib                    1.3.2
json5                     0.9.14
jsonpointer               2.4
jsonschema                4.19.0
jsonschema-specifications 2023.7.1
jupyter_client            7.4.9
jupyter_core              5.3.0
jupyter-dash              0.4.2
jupyter-events            0.7.0
jupyter-lsp               2.2.0
jupyter_server            2.7.0
jupyter_server_terminals  0.4.4
jupyterlab                4.0.5
jupyterlab-pygments       0.1.2
jupyterlab_server         2.24.0
jupyterlab-widgets        3.0.8
kiwisolver                1.4.4
lightning-utilities       0.9.0
lit                       16.0.6
llvmlite                  0.40.1
lxml                      4.9.2
Markdown                  3.4.4
markdown-it-py            3.0.0
MarkupSafe                2.1.1
matplotlib                3.7.2
matplotlib-inline         0.1.6
mdurl                     0.1.2
mistune                   0.8.4
mpmath                    1.3.0
multidict                 6.0.4
nb-conda-kernels          2.3.1
nbclassic                 0.5.5
nbclient                  0.5.13
nbconvert                 6.5.4
nbformat                  5.7.0
nest-asyncio              1.5.6
networkx                  3.1
notebook                  6.5.4
notebook_shim             0.2.2
numba                     0.57.1
numpy                     1.24.4
nvidia-cublas-cu11        11.10.3.66
nvidia-cuda-cupti-cu11    11.7.101
nvidia-cuda-nvrtc-cu11    11.7.99
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cudnn-cu11         8.5.0.96
nvidia-cufft-cu11         10.9.0.58
nvidia-curand-cu11        10.2.10.91
nvidia-cusolver-cu11      11.4.0.1
nvidia-cusparse-cu11      11.7.4.91
nvidia-nccl-cu11          2.14.3
nvidia-nvtx-cu11          11.7.91
oauthlib                  3.2.2
omegaconf                 2.3.0
overrides                 7.4.0
packaging                 23.0
pandas                    2.0.3
pandocfilters             1.5.0
parso                     0.8.3
pathtools                 0.1.2
pexpect                   4.8.0
pickleshare               0.7.5
Pillow                    10.0.0
pip                       23.2.1
pkgutil_resolve_name      1.3.10
platformdirs              2.5.2
plotly                    5.9.0
plyfile                   1.0.1
prefix-sum                0.0.0
prometheus-client         0.14.1
prompt-toolkit            3.0.36
protobuf                  4.24.0
psutil                    5.9.0
ptyprocess                0.7.0
pure-eval                 0.2.2
pyasn1                    0.5.0
pyasn1-modules            0.3.0
pycocotools               2.0.7
pycparser                 2.21
pyg-lib                   0.2.0+pt20cu118
Pygments                  2.15.1
pyparsing                 3.0.9
pyrootutils               1.0.4
pyrsistent                0.18.0
PySocks                   1.7.1
python-dateutil           2.8.2
python-dotenv             1.0.0
python-json-logger        2.0.7
pytorch-lightning         2.0.6
pytz                      2023.3
PyYAML                    6.0.1
pyzmq                     25.1.1
referencing               0.30.2
requests                  2.31.0
requests-oauthlib         1.3.1
retrying                  1.3.4
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rich                      13.5.2
rpds-py                   0.9.2
rsa                       4.9
scikit-learn              1.3.0
scipy                     1.10.1
seaborn                   0.12.2
Send2Trash                1.8.0
sentry-sdk                1.29.2
setproctitle              1.3.2
setuptools                68.0.0
six                       1.16.0
smmap                     5.0.0
sniffio                   1.2.0
soupsieve                 2.4
stack-data                0.2.0
submitit                  1.4.5
sympy                     1.12
tenacity                  8.2.3
tensorboard               2.14.0
tensorboard-data-server   0.7.1
terminado                 0.17.1
threadpoolctl             3.2.0
tinycss2                  1.2.1
tomli                     2.0.1
torch                     2.0.1
torch-cluster             1.6.1+pt20cu118
torch-geometric           2.3.1
torch-scatter             2.1.1+pt20cu118
torch-sparse              0.6.17+pt20cu118
torch-spline-conv         1.2.2+pt20cu118
torch-tb-profiler         0.4.1
torchmetrics              1.0.3
torchvision               0.15.2
tornado                   6.3.2
tqdm                      4.66.1
traitlets                 5.7.1
triton                    2.0.0
typing_extensions         4.7.1
tzdata                    2023.3
uri-template              1.3.0
urllib3                   1.26.16
wandb                     0.15.8
wcwidth                   0.2.5
webcolors                 1.13
webencodings              0.5.1
websocket-client          0.58.0
Werkzeug                  2.2.3
wheel                     0.38.4
widgetsnbextension        4.0.8
yarl                      1.9.2
zipp                      3.11.0

kenomo commented 1 year ago

It was a Docker and multiprocessing-related issue regarding IPC and/or shared memory 🤷‍♂️. Adding the --ipc=host flag to the Docker run command fixed it.
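
For readers who want to check this from inside the container: by default Docker gives each container a private IPC namespace with a small /dev/shm (64 MB unless changed), while `--ipc=host` makes the container use the host's shared memory instead; `--shm-size` is another `docker run` flag that enlarges it. The thread does not confirm that the size of /dev/shm was the actual cause, so the following is only a minimal diagnostic sketch. The helper name `shm_report` is made up for illustration and is not part of the repository.

```python
import os
import shutil

MIB = 1024 * 1024


def shm_report(path="/dev/shm"):
    """Return (total_mib, free_mib) for the mount at `path`, or None if it is missing."""
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    return usage.total / MIB, usage.free / MIB


if __name__ == "__main__":
    report = shm_report()
    if report is None:
        print("/dev/shm not found (probably not a Linux container)")
    else:
        total, free = report
        # A total of about 64 MiB suggests the container runs with Docker's
        # default shared-memory settings, i.e. without --ipc=host or --shm-size.
        print(f"/dev/shm: {total:.0f} MiB total, {free:.0f} MiB free")
```

Running this once inside the container before preprocessing shows whether the shared-memory mount is at its default size.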

drprojects commented 1 year ago

Hi @kenomo, I was going to suggest looking into the multiprocessing, but you found the solution in no time 😉

Thanks for sharing your solution!

Codeei commented 9 months ago

Did you also see it constantly showing "processing", with the progress bar stuck at 0%?

kenomo commented 9 months ago

All workers showed load but never joined after finishing. I think the progress bar was stuck at 0% during processing. Are you running everything inside a Docker container?

Codeei commented 9 months ago

Yes, I am running on a GPU rental platform, and I believe a similar problem occurred because of Docker. However, I am not familiar with Docker, and I cannot modify the container's startup command. Could you describe the solution in more detail?