Ubuntu 20.04 with CUDA 11.8: kernel failing to install and crashing after the installation is done. #201

Closed MayurHulke closed 5 months ago

MayurHulke commented 5 months ago

Your talk at Nvidia GTC 2024 was great, @balakumar-s. I am trying to get cuRobo set up on my system with Isaac Sim to play around and potentially do some testing, but I am having some issues with performance. Any input on the points below would be super helpful.

The installation failed quite a few times. I have managed to get it installed now, but cuRobo still crashes sometimes, even after following the instructions in the documentation. Also, is Isaac Sim supposed to take about 20 minutes just to warm up, and still crash after that?

Is it possible to get the exact system specs and kernel specs on which cuRobo has been tested? That would be super helpful.

Here are my system specs:

  1. cuRobo installation mode: Isaac Sim
  2. Python version: 3.7
  3. Isaac Sim version: 2022.2.1
  4. CUDA: 11.8
  5. nvidia-smi: 535.161.07
  6. PyTorch: 2.2.1+cu121
  7. GPU: NVIDIA GeForce RTX 3080
  8. RAM: 32 GB
(base) mayur@mayur:~$ nvidia-smi
Tue Mar 26 16:32:57 2024       
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3080 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   50C    P5              24W /  35W |    388MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |

| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|    0   N/A  N/A      1898      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A      2437      G   /usr/lib/xorg/Xorg                           28MiB |
|    0   N/A  N/A      4649    C+G   ...e/ov/pkg/isaac_sim-2022.2.1/kit/kit      332MiB |
(base) mayur@mayur:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
(base) mayur@mayur:~$ python
Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
(base) mayur@mayur:~$ lscpu
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      39 bits physical, 48 bits virtual
CPU(s):                             16
On-line CPU(s) list:                0-15
Thread(s) per core:                 2
Core(s) per socket:                 8
Socket(s):                          1
NUMA node(s):                       1
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              141
Model name:                         11th Gen Intel(R) Core(TM) i9-11950H @ 2.60G
Stepping:                           1
CPU MHz:                            2600.000
CPU max MHz:                        5000.0000
CPU min MHz:                        800.0000
BogoMIPS:                           5222.40
Virtualisation:                     VT-x
L1d cache:                          384 KiB
L1i cache:                          256 KiB
L2 cache:                           10 MiB
L3 cache:                           24 MiB
NUMA node0 CPU(s):                  0-15
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disable
                                    d via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __u
                                    ser pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced IBRS, IBPB conditional,
                                     RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep 
                                    mtrr pge mca cmov pat pse36 clflush dts acpi
                                     mmx fxsr sse sse2 ss ht tm pbe syscall nx p
                                    dpe1gb rdtscp lm constant_tsc art arch_perfm
                                    on pebs bts rep_good nopl xtopology nonstop_
                                    tsc cpuid aperfmperf tsc_known_freq pni pclm
                                    ulqdq dtes64 monitor ds_cpl vmx smx est tm2 
                                    ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 ss
                                    e4_2 x2apic movbe popcnt tsc_deadline_timer 
                                    aes xsave avx f16c rdrand lahf_lm abm 3dnowp
                                    refetch cpuid_fault epb cat_l2 invpcid_singl
                                    e cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced 
                                    tpr_shadow vnmi flexpriority ept vpid ept_ad
                                     fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erm
                                    s invpcid rdt_a avx512f avx512dq rdseed adx 
                                    smap avx512ifma clflushopt clwb intel_pt avx
                                    512cd sha_ni avx512bw avx512vl xsaveopt xsav
                                    ec xgetbv1 xsaves split_lock_detect dtherm i
                                    da arat pln pts hwp hwp_notify hwp_act_windo
                                    w hwp_epp hwp_pkg_req avx512vbmi umip pku os
                                    pke avx512_vbmi2 gfni vaes vpclmulqdq avx512
                                    _vnni avx512_bitalg tme avx512_vpopcntdq rdp
                                    id movdiri movdir64b fsrm avx512_vp2intersec
                                    t md_clear flush_l1d arch_capabilities

balakumar-s commented 5 months ago

  1. Isaac Sim compiles shaders on first launch, which can take 2-3 minutes.
  2. How are you using Python 3.11 with Isaac Sim 2022.2.1? Isaac Sim 2022.2.1 supports Python 3.7. Did you follow the instructions here to set up the conda env: https://docs.omniverse.nvidia.com/isaacsim/latest/installation/install_python.html#advanced-running-with-anaconda

MayurHulke commented 5 months ago

Oh sorry, that was from a different env. I am using 3.7 within the conda base environment. (I attached the list below)
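
Since the versions pasted above came from two different environments, a quick check run inside the environment that actually launches Isaac Sim can show which interpreter and PyTorch build are active. This is only a minimal sketch: `report_environment` is a hypothetical helper name and not part of cuRobo or Isaac Sim, and it uses only `sys`, `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`.

```python
import importlib
import sys


def report_environment():
    """Collect the active interpreter and, if installed, PyTorch/CUDA versions.

    Hypothetical helper for illustration; not part of cuRobo or Isaac Sim.
    """
    info = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,  # shows which env the interpreter belongs to
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this environment.
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    info["torch_cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built against
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in report_environment().items():
        print("{}: {}".format(key, value))
```

Running this once in each conda environment should make it clear whether the Python 3.7 interpreter expected by Isaac Sim 2022.2.1 is the one in use, and whether the PyTorch CUDA build (for example `cu121`) differs from the CUDA toolkit reported by `nvcc`.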

I could not find the environment.yml in the cuRobo repo, so I created my own custom conda env. Do you have the environment.yml for this repo?

balakumar-s commented 5 months ago

The environment.yml file is located in the Isaac Sim path: ~/.local/share/ov/pkg/isaac_sim-2022.2.1/

MayurHulke commented 5 months ago

This is great, it is working way smoother than before 👍🏼. Thanks a lot, @balakumar-s. Just one last question: may I know what GPU and RAM you were using for ideal performance for hardcore dev work (testing, experimentation, etc.)?

balakumar-s commented 5 months ago

Isaac Sim requirements should also be more than enough for cuRobo: https://docs.omniverse.nvidia.com/isaacsim/latest/installation/requirements.html#system-requirements