ToruOwO / mimex

PyTorch implementation for all methods and environments in the paper "MIMEx: Intrinsic Rewards from Masked Input Modeling"

Segmentation fault (core dumped) #3

Open 0uroboro5 opened 2 hours ago

0uroboro5 commented 2 hours ago

Hi, when I try to reproduce the mimex-pixmc experiment, I run into the following problem:

Wrote config to: /home/exp/KukaPickSparse_2024-10-15_20:05:53_l5_k0.5_mr0.7_s0/config.yaml
Setting seed: 0
Setting sim options
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
Segmentation fault (core dumped)

I tried reinstalling PyTorch 1.10 and the matching packages as follows:

pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

My CUDA environment is as follows:

(mimex) root@autodl-container-34fb1182ae-77221761:~/autodl-tmp/mimex/mimex-pixmc# nvidia-smi
Tue Oct 15 20:19:56 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:2A:00.0 Off |                  Off |
| N/A   24C    P0              60W / 400W |      2MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Beyond that, I tried to make sure that the environment variables are set correctly:

echo 'export CUDA_HOME=/usr/local/cuda-11.3' >> ~/.bashrc 
echo 'export PATH=$CUDA_HOME/bin:$PATH' >> ~/.bashrc 
echo 'export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
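Since lines appended to ~/.bashrc only take effect in shells started afterwards, it may be worth confirming what the running process actually sees. The following is a minimal sketch, not part of the mimex repository; the helper name `check_cuda_env` is hypothetical, and it only inspects the three variables set above.

```python
import os


def check_cuda_env(env=None):
    """Report whether CUDA_HOME's bin/ and lib64/ appear in the search paths.

    `env` defaults to the current process environment; passing a dict makes
    the check easy to try on made-up values.
    """
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME", "")
    path_entries = env.get("PATH", "").split(os.pathsep)
    lib_entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return {
        "cuda_home": cuda_home or None,
        # True only if CUDA_HOME is set and its bin/ directory is on PATH.
        "bin_on_path": bool(cuda_home)
        and os.path.join(cuda_home, "bin") in path_entries,
        # True only if CUDA_HOME is set and its lib64/ directory is on LD_LIBRARY_PATH.
        "lib64_on_ld_path": bool(cuda_home)
        and os.path.join(cuda_home, "lib64") in lib_entries,
    }
```

Calling `check_cuda_env()` from the same shell that launches training shows whether the exports were picked up there.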

As a rule of thumb, segmentation faults usually occur when pointers are misused. However, I haven't made any changes to the source code yet, which leaves me clueless.
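One way to narrow down where the crash happens is Python's standard `faulthandler` module, which dumps the Python-level stack when the process receives a fatal signal such as SIGSEGV. The following is a minimal sketch; the wrapper name `enable_crash_traceback` is hypothetical, and where to call it in the training entry point is an assumption, since I have not checked which script that is.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Dump the Python stack of all threads to `stream` on a fatal signal.

    `stream` must be a real file with a file descriptor and must stay open
    while the handler is enabled. It defaults to the process's original stderr.
    """
    if stream is None:
        stream = sys.__stderr__
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()
```

Calling this once before the simulator is created should show which Python call triggers the native crash. Running the script with `python -X faulthandler` or setting `PYTHONFAULTHANDLER=1` has the same effect without touching the code.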

If you have any ideas, please let me know; I'd really appreciate it. Alternatively, would you be so kind as to provide a Docker container that I can run?

0uroboro5 commented 2 hours ago

Sorry, I forgot to mention that I have confirmed the nvcc version is 11.3:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
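To make the comparison between the toolkit and the PyTorch wheel explicit, here is a minimal sketch that extracts the release from `nvcc --version` output and the CUDA tag from a wheel version string such as `1.10.0+cu113`. The helper names are hypothetical, and the tag parsing assumes the last digit is the minor version, which holds for tags like `cu113` but is not a documented guarantee.

```python
import re


def nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def wheel_cuda_version(torch_version):
    """Return (major, minor) from a '+cuXYZ' local version tag, or None.

    Assumption: the final digit of the tag is the minor version,
    so 'cu113' is read as CUDA 11.3.
    """
    match = re.search(r"\+cu(\d{2,})$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    return int(digits[:-1]), int(digits[-1])


def versions_match(nvcc_output, torch_version):
    """True when both versions parse and agree on major and minor."""
    toolkit = nvcc_release(nvcc_output)
    wheel = wheel_cuda_version(torch_version)
    return toolkit is not None and toolkit == wheel
```

With the nvcc output above and the `1.10.0+cu113` wheel, both parse to 11.3, so the toolkit and wheel agree. The driver itself reports CUDA 12.2 in nvidia-smi, which is the highest version the driver supports, not the installed toolkit.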

Let me know if you need any other information to determine the problem.