Error when using NVBlox: RuntimeError: CUDA error: an illegal memory access was encountered

NVlabs / curobo

CUDA Accelerated Robot Library

https://curobo.org

Error when using NVBlox: RuntimeError: CUDA error: an illegal memory access was encountered #39

Closed cedricgoubard closed 7 months ago

cedricgoubard commented 9 months ago

Hello,

I am getting an error when trying to use NVBlox with cuRobo. This happens both when I run the motion_gen_reacher_nvblox.py script with IsaacSim, and when I simply instantiate and warm up the MotionGen class in Python. Everything works properly with motion_gen_reacher.py, but not with motion_gen_reacher_nvblox.py.

The error is RuntimeError: CUDA error: an illegal memory access was encountered.

My setup is:

cuRobo installation mode: docker python and docker isaac
python version: 3.8.10
Isaac Sim version (if using): 2022.2.1
GPU: NVIDIA GeForce RTX 3060
CUDA 11.7

Stacktrace (isaacsim)

```
/isaac-sim/python.sh curobo/examples/isaac_sim/motion_gen_reacher_nvblox.py --robot /ros_ws/src/curobo_ros/cfg/amiga.yaml --visualize_spheres
[14.954s] app ready
[15.212s] RTX ready
[15.356s] Simulation App Startup Complete
torchtyping could not be imported, falling back to basic types
2023-11-14 09:34:46 [19,734ms] [Warning] [omni.isaac.urdf] The path amiga_arm_base_link-base_link_inertia is not a valid usd path, modifying to amiga_arm_base_link_base_link_inertia
2023-11-14 09:34:46 [19,734ms] [Warning] [omni.isaac.urdf] The path amiga_arm_base_link-base_fixed_joint is not a valid usd path, modifying to amiga_arm_base_link_base_fixed_joint
2023-11-14 09:34:46 [19,734ms] [Warning] [omni.isaac.urdf] The path amiga_arm_wrist_3-flange is not a valid usd path, modifying to amiga_arm_wrist_3_flange
2023-11-14 09:34:46 [19,734ms] [Warning] [omni.isaac.urdf] The path amiga_arm_flange-tool0 is not a valid usd path, modifying to amiga_arm_flange_tool0
2023-11-14 09:34:46 [19,734ms] [Warning] [omni.isaac.urdf] The path amig_arm_wrist_3_link-amiga_grip_camera_mount_link is not a valid usd path, modifying to amig_arm_wrist_3_link_amiga_grip_camera_mount_link
/World/amiga/Looks/material_Material_001_1
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1114 09:35:34.738322 29 layer_cake_impl.h:32] Adding Layer with type: N6nvblox15VoxelBlockLayerINS_9TsdfVoxelEEE, voxel_size: 0.03, and memory_type: kDevice to LayerCake.
I1114 09:35:34.738354 29 layer_cake_impl.h:32] Adding Layer with type: N6nvblox15VoxelBlockLayerINS_10ColorVoxelEEE, voxel_size: 0.03, and memory_type: kDevice to LayerCake.
I1114 09:35:34.738360 29 layer_cake_impl.h:32] Adding Layer with type: N6nvblox15VoxelBlockLayerINS_14OccupancyVoxelEEE, voxel_size: 0.03, and memory_type: kDevice to LayerCake.
I1114 09:35:34.738365 29 layer_cake_impl.h:32] Adding Layer with type: N6nvblox15VoxelBlockLayerINS_9EsdfVoxelEEE, voxel_size: 0.03, and memory_type: kDevice to LayerCake.
I1114 09:35:34.738370 29 layer_cake_impl.h:32] Adding Layer with type: N6nvblox10BlockLayerINS_9MeshBlockEEE, voxel_size: 0.03, and memory_type: kDevice to LayerCake.
E1114 09:35:34.744175 29 sqlite_database.cpp:192] Preparing query failed: file is not a database
E1114 09:35:34.744215 29 mapper.cpp:238] Failed to load map from file: /curobo/curobo/src/curobo/content/assets/scene/nvblox/srl_ur10_bins.nvblx
warming up...
/curobo/curobo/src/curobo/opt/particle/parallel_mppi.py:224: UserWarning: FALLBACK path has been taken inside: runCudaFusionGroup. This is an indication that codegen Failed for some reason. To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback` (Triggered internally at ../torch/csrc/jit/codegen/cuda/manager.cpp:331.)
  cov_update = jit_diag_a_cov_update(w, actions, self.mean_action)
/curobo/curobo/src/curobo/opt/particle/parallel_mppi.py:297: UserWarning: FALLBACK path has been taken inside: runCudaFusionGroup. This is an indication that codegen Failed for some reason. To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback` (Triggered internally at ../torch/csrc/jit/codegen/cuda/manager.cpp:331.)
  new_cov = jit_blend_cov(self.cov_action, cov_update, self.step_size_cov, self.kappa)
/curobo/curobo/src/curobo/opt/particle/parallel_mppi.py:301: UserWarning: FALLBACK path has been taken inside: runCudaFusionGroup. This is an indication that codegen Failed for some reason. To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback` (Triggered internally at ../torch/csrc/jit/codegen/cuda/manager.cpp:331.)
  new_mean = jit_blend_mean(self.mean_action, new_mean, self.step_size_mean)
Traceback (most recent call last):
  File "curobo/examples/isaac_sim/motion_gen_reacher_nvblox.py", line 300, in <module>
    main()
  File "curobo/examples/isaac_sim/motion_gen_reacher_nvblox.py", line 163, in main
    motion_gen.warmup(enable_graph=True, warmup_js_trajopt=False)
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1858, in warmup
    link_poses=link_poses,
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1215, in plan_single
    link_poses=link_poses,
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1037, in _plan_attempts
    link_poses,
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1513, in _plan_from_solve_state
    newton_iters=trajopt_newton_iters,
  File "/isaac-sim/kit/python/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 909, in _solve_trajopt_from_solve_state
    newton_iters=newton_iters,
  File "/curobo/curobo/src/curobo/wrap/reacher/trajopt.py", line 496, in solve_any
    newton_iters=newton_iters,
  File "/curobo/curobo/src/curobo/wrap/reacher/trajopt.py", line 618, in solve_single
    newton_iters=newton_iters,
  File "/curobo/curobo/src/curobo/wrap/reacher/trajopt.py", line 577, in solve_from_solve_state
    result = self.solver.solve(goal_buffer, seed_traj)
  File "/curobo/curobo/src/curobo/wrap/wrap_base.py", line 138, in solve
    act_seq = self.optimize(seed, shift_steps=0)
  File "/curobo/curobo/src/curobo/wrap/wrap_base.py", line 70, in optimize
    act_seq = opt.optimize(act_seq, shift_steps)
  File "/curobo/curobo/src/curobo/opt/opt_base.py", line 93, in optimize
    out = self._optimize(opt_tensor, shift_steps, n_iters)
  File "/curobo/curobo/src/curobo/opt/particle/particle_opt_base.py", line 216, in _optimize
    self._initialize_cuda_graph(init_act, shift_steps=shift_steps)
  File "/curobo/curobo/src/curobo/opt/particle/particle_opt_base.py", line 235, in _initialize_cuda_graph
    self._cu_act_seq = self._run_opt_iters(self._cu_act_in, shift_steps=shift_steps)
  File "/curobo/curobo/src/curobo/opt/particle/particle_opt_base.py", line 260, in _run_opt_iters
    trajectory = self.generate_rollouts()
  File "/isaac-sim/extscache/omni.pip.torch-1_13_1-0.1.4+104.2.lx64/torch-1-13-1/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/curobo/curobo/src/curobo/opt/particle/parallel_mppi.py", line 578, in generate_rollouts
    return super().generate_rollouts(init_act)
  File "/curobo/curobo/src/curobo/opt/particle/particle_opt_base.py", line 178, in generate_rollouts
    act_seq = self.sample_actions(init_act)
  File "/isaac-sim/extscache/omni.pip.torch-1_13_1-0.1.4+104.2.lx64/torch-1-13-1/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/curobo/curobo/src/curobo/opt/particle/parallel_mppi.py", line 306, in sample_actions
    delta = torch.index_select(self._sample_set, 0, self._sample_iter).squeeze(0)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception ignored in:
Traceback (most recent call last):
  File "/isaac-sim/kit/exts/omni.kit.stage_templates/omni/kit/stage_templates/templates/default_stage.py", line 13, in __del__
  File "/isaac-sim/kit/exts/omni.kit.stage_templates/omni/kit/stage_templates/new_stage.py", line 280, in unregister_template
TypeError: 'NoneType' object is not callable
Exception ignored in:
Traceback (most recent call last):
  File "/isaac-sim/kit/exts/omni.kit.stage_templates/omni/kit/stage_templates/templates/sunlight.py", line 13, in __del__
  File "/isaac-sim/kit/exts/omni.kit.stage_templates/omni/kit/stage_templates/new_stage.py", line 280, in unregister_template
TypeError: 'NoneType' object is not callable
/isaac-sim/python.sh: line 41: 29 Segmentation fault (core dumped) $python_exe "$@" $args
There was an error running python
```
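The error message in the trace points out that CUDA kernel errors can be reported asynchronously, so the Python frame it names (`torch.index_select` in `parallel_mppi.py`) is not necessarily where the illegal access happened. A minimal sketch of rerunning a command with `CUDA_LAUNCH_BLOCKING=1` set, as the message suggests; the helper names `blocking_env` and `run_blocking` are hypothetical and not part of cuRobo or PyTorch.

```python
import os
import subprocess


def blocking_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # With this variable set, each kernel launch blocks until it finishes,
    # so an error surfaces at the call that actually triggered it.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(cmd):
    """Run `cmd` (a list of arguments) with CUDA_LAUNCH_BLOCKING=1; return its exit code."""
    return subprocess.run(cmd, env=blocking_env()).returncode
```

For the report above, this could be called as `run_blocking(["/isaac-sim/python.sh", "curobo/examples/isaac_sim/motion_gen_reacher_nvblox.py", "--robot", "/ros_ws/src/curobo_ros/cfg/amiga.yaml", "--visualize_spheres"])`. Expect the run to be slower, since kernels no longer overlap with host code.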

Here is the code I tried when using Python only:

motion_gen_config = MotionGenConfig.load_from_robot_config(
    robot_cfg,
    world_cfg,
    tensor_args,
    trajopt_tsteps=32,
    collision_checker_type=CollisionCheckerType.BLOX,
    use_cuda_graph=True,
    num_trajopt_seeds=12,
    num_graph_seeds=12,
    interpolation_dt=0.03,
    collision_activation_distance=0.025,
    acceleration_scale=1.0,
    self_collision_check=True,
    maximum_trajectory_dt=0.25,
    finetune_dt_scale=1.05,
    fixed_iters_trajopt=True,
    finetune_trajopt_iters=300,
    minimize_jerk=True,
)
motion_gen = MotionGen(motion_gen_config)
motion_gen.warmup(warmup_js_trajopt=False)

I added some debug prints in curobo/wrap/reacher/motion_gen.py; here is what plan_single is getting as inputs:

start_state = JointState(
    position=tensor([[0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000]], device='cuda:0'),
    velocity=tensor([[0., 0., 0., 0., 0., 0.]], device='cuda:0'),
    acceleration=tensor([[0., 0., 0., 0., 0., 0.]], device='cuda:0'),
    joint_names=['amiga_arm_shoulder_pan_joint', 'amiga_arm_shoulder_lift_joint', 'amiga_arm_elbow_joint', 'amiga_arm_wrist_1_joint', 'amiga_arm_wrist_2_joint', 'amiga_arm_wrist_3_joint'],
    jerk=tensor([[0., 0., 0., 0., 0., 0.]], device='cuda:0'),
    tensor_args=TensorDeviceType(device=device(type='cuda', index=0), dtype=torch.float32)
)

retract_pose = Pose(
    position=tensor([[ 0.3911, -1.1841,  0.3609]], device='cuda:0'),
    quaternion=tensor([[ 0.2708,  0.6531,  0.6533, -0.2707]], device='cuda:0'),
    rotation=None,
    batch=1, n_goalset=1, name='ee_link', normalize_rotation=True
)

link_poses = {'amiga_gripper_palm': Pose(
    position=tensor([[ 0.3911, -1.1841,  0.3609]], device='cuda:0'),
    quaternion=tensor([[ 0.2708,  0.6531,  0.6533, -0.2707]], device='cuda:0'),
    rotation=None, batch=1, n_goalset=1, name='ee_link', normalize_rotation=False
)}

Here are my installation steps, in case they help:

```dockerfile
###################################################################################################
######################################### CMake version ###########################################
###################################################################################################
RUN wget https://cmake.org/files/v3.19/cmake-3.19.5.tar.gz && tar -xvzf cmake-3.19.5.tar.gz
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y build-essential checkinstall zlib1g-dev libssl-dev
RUN cd cmake-3.19.5 && ./bootstrap
RUN cd cmake-3.19.5 && make -j `nproc` && make install
RUN rm -rf cmake-3.19.5.tar.gz

####################################################################################################
############################################## nvblox ##############################################
####################################################################################################
RUN git clone https://github.com/valtsblukis/nvblox.git
RUN apt-get install -y libgoogle-glog-dev libgtest-dev libgflags-dev python3-dev libsqlite3-dev
RUN cd /usr/src/googletest && sudo cmake . && sudo cmake --build . --target install
RUN cd nvblox && cd nvblox && mkdir build && cd build && \
    cmake .. -DPRE_CXX11_ABI_LINKABLE=ON -DCMAKE_CUDA_FLAGS="-gencode=arch=compute_86,code=sm_86" -DCMAKE_CUDA_ARCHITECTURES="86" -DSTDGPU_CUDA_ARCHITECTURE_FLAGS_USER="86" -DBUILD_FOR_ALL_ARCHS=ON &&\
    make -j `nproc` &&\
    make install
RUN git clone https://github.com/nvlabs/nvblox_torch.git
RUN cd nvblox_torch && sh install.sh

####################################################################################################
############################################## curobo ##############################################
####################################################################################################
RUN git clone https://github.com/NVlabs/curobo.git
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
ENV PATH="${PATH}:/opt/hpcx/ompi/bin"
ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/hpcx/ompi/lib"
ENV LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}"
RUN cd curobo && pip install -e . --no-build-isolation
```

Do you have any ideas what may be causing it? Thank you for your help!

balakumar-s commented 9 months ago

It looks like the map failed to load from your stack trace:

E1114 09:35:34.744215    29 mapper.cpp:238] Failed to load map from file: /curobo/curobo/src/curobo/content/assets/scene/nvblox/srl_ur10_bins.nvblx

Can you check if this file exists in your clone of curobo: https://github.com/NVlabs/curobo/blob/main/src/curobo/content/assets/scene/nvblox/srl_ur10_bins.nvblx

Make sure this file is around 80 MB and not just a Git LFS pointer. If it is an LFS pointer, you need to do the following:

sudo apt install git-lfs 
cd curobo && git lfs pull
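A quick way to tell the two cases apart without opening the file: Git LFS pointer files are small text files whose first line begins with `version https://git-lfs.github.com/spec/v1`. That would also fit the `file is not a database` message in the log, although that link is an inference rather than something confirmed in this thread. The sketch below is a minimal check under those assumptions; `is_lfs_pointer` is a hypothetical helper, not part of cuRobo or nvblox.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like a Git LFS pointer rather than real content."""
    p = Path(path)
    # Pointer files are only a few lines long; anything large is real data.
    if p.stat().st_size > 1024:
        return False
    with p.open("rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
```

For example, `is_lfs_pointer("/curobo/curobo/src/curobo/content/assets/scene/nvblox/srl_ur10_bins.nvblx")` returning `True` would indicate that `git lfs pull` has not been run in that clone.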

cedricgoubard commented 9 months ago

Hi, you were right, that solved the problem in IsaacSim... thank you! However, in Python I am not loading any map, and I am still getting the error. Here are some more details:


world_cfg = WorldConfig.from_dict(
    {
        "blox": {
            "world": {
                "pose": [0, 0, 0, 1, 0, 0, 0],
                "integrator_type": "occupancy",
                "voxel_size": 0.03,
            }
        }
    }
)

motion_gen_config = MotionGenConfig.load_from_robot_config(
    robot_cfg,
    world_cfg,
    tensor_args,
    trajopt_tsteps=32,
    collision_checker_type=CollisionCheckerType.BLOX,
    use_cuda_graph=True,
    num_trajopt_seeds=12,
    num_graph_seeds=12,
    interpolation_dt=0.03,
    collision_activation_distance=0.025,
    acceleration_scale=1.0,
    self_collision_check=True,
    maximum_trajectory_dt=0.25,
    finetune_dt_scale=1.05,
    fixed_iters_trajopt=True,
    finetune_trajopt_iters=300,
    minimize_jerk=True,
)
motion_gen = MotionGen(motion_gen_config)
motion_gen.warmup(warmup_js_trajopt=False)

Stacktrace (no Isaac this time)

```
Traceback (most recent call last):
  File "/ros_ws/devel/lib/curobo_ros/interface.py", line 15, in <module>
    exec(compile(fh.read(), python_script, 'exec'), context)
  File "/ros_ws/src/curobo_ros/nodes/interface.py", line 18, in <module>
    interface = MotionGenServer(robot_cfg)
  File "/ros_ws/src/curobo_ros/src/curobo_ros/motiongen.py", line 49, in __init__
    motion_gen.warmup(warmup_js_trajopt=False)
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1859, in warmup
    self.plan_single(
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1210, in plan_single
    result = self._plan_attempts(
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1032, in _plan_attempts
    result = self._plan_from_solve_state(
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1508, in _plan_from_solve_state
    traj_result = self._solve_trajopt_from_solve_state(
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 901, in _solve_trajopt_from_solve_state
    traj_result = trajopt_instance.solve_any(
  File "/curobo/curobo/src/curobo/wrap/reacher/trajopt.py", line 490, in solve_any
    return self.solve_single(
  File "/curobo/curobo/src/curobo/wrap/reacher/trajopt.py", line 611, in solve_single
    return self.solve_from_solve_state(
  File "/curobo/curobo/src/curobo/wrap/reacher/trajopt.py", line 577, in solve_from_solve_state
    result = self.solver.solve(goal_buffer, seed_traj)
  File "/curobo/curobo/src/curobo/wrap/wrap_base.py", line 138, in solve
    act_seq = self.optimize(seed, shift_steps=0)
  File "/curobo/curobo/src/curobo/wrap/wrap_base.py", line 70, in optimize
    act_seq = opt.optimize(act_seq, shift_steps)
  File "/curobo/curobo/src/curobo/opt/opt_base.py", line 93, in optimize
    out = self._optimize(opt_tensor, shift_steps, n_iters)
  File "/curobo/curobo/src/curobo/opt/particle/particle_opt_base.py", line 211, in _optimize
    curr_action_seq = self._run_opt_iters(
  File "/curobo/curobo/src/curobo/opt/particle/particle_opt_base.py", line 260, in _run_opt_iters
    trajectory = self.generate_rollouts()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/curobo/curobo/src/curobo/opt/particle/parallel_mppi.py", line 578, in generate_rollouts
    return super().generate_rollouts(init_act)
  File "/curobo/curobo/src/curobo/opt/particle/particle_opt_base.py", line 179, in generate_rollouts
    trajectories = self.rollout_fn(act_seq)
  File "/curobo/curobo/src/curobo/rollout/rollout_base.py", line 457, in __call__
    return self.rollout_fn(act)
  File "/curobo/curobo/src/curobo/rollout/arm_base.py", line 589, in rollout_fn
    cost_seq = self.cost_fn(state, act_seq)
  File "/curobo/curobo/src/curobo/rollout/arm_reacher.py", line 218, in cost_fn
    cost_list = super(ArmReacher, self).cost_fn(state, action_batch, return_list=True)
  File "/curobo/curobo/src/curobo/rollout/arm_base.py", line 363, in cost_fn
    coll_cost = self.primitive_collision_cost.forward(
  File "/curobo/curobo/src/curobo/rollout/cost/primitive_collision_cost.py", line 106, in sweep_kernel_fn
    dist = self.sweep_check_fn(
  File "/curobo/curobo/src/curobo/geom/sdf/world_blox.py", line 274, in get_swept_sphere_distance
    d = self._get_blox_swept_sdf(
  File "/curobo/curobo/src/curobo/geom/sdf/world_blox.py", line 151, in _get_blox_swept_sdf
    d = self._blox_mapper.query_sphere_trajectory_sdf_cost(
  File "/curobo/nvblox_torch/nvblox_torch/mapper.py", line 234, in query_sphere_trajectory_sdf_cost
    distance = SdfSphereTrajectoryCostMultiBlox.apply(
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs) # type: ignore[misc]
  File "/curobo/nvblox_torch/nvblox_torch/sdf_query.py", line 120, in forward
    r = c_mapper.query_sphere_trajectory_sdf_cost(
RuntimeError: CUDA error: an illegal memory access was encountered
```

balakumar-s commented 9 months ago

Can you try calling motion_gen.world_collision.update_blox_hashes() before calling motion_gen.warmup()?

If that doesn't work, can you add a table obstacle similar to https://github.com/NVlabs/curobo/blob/e0e804e9062da33c13aa262a1da253d364b8768a/examples/isaac_sim/realsense_reacher.py#L186 ? This might help narrow down the issue you are having.

cedricgoubard commented 9 months ago

Thank you for the feedback; I did both, but unfortunately it looks like it did not change anything.

Code

```python
rospy.loginfo("Initialising MotionGenServer")
world_cfg = WorldConfig.from_dict(
    {
        "blox": {
            "world": {
                "pose": [0, 0, 0, 1, 0, 0, 0],
                "integrator_type": "occupancy",
                "voxel_size": 0.03,
            }
        }
    }
)
tensor_args = TensorDeviceType()
world_cfg_table = WorldConfig.from_dict(
    load_yaml("/curobo/curobo/src/curobo/content/configs/world/collision_wall.yml")
)
world_cfg_table.cuboid[0].pose[2] -= 0.01
world_cfg.add_obstacle(world_cfg_table.cuboid[0])
world_cfg.add_obstacle(world_cfg_table.cuboid[1])
rospy.loginfo("WorldConfig loaded with obstacle; creating MotionGenConfig")
rospy.loginfo("WorldConfigTable: {}".format(world_cfg_table))
motion_gen_config = MotionGenConfig.load_from_robot_config(
    robot_cfg,
    world_cfg,
    tensor_args,
    trajopt_tsteps=32,
    collision_checker_type=CollisionCheckerType.BLOX,
    use_cuda_graph=True,
    num_trajopt_seeds=12,
    num_graph_seeds=12,
    interpolation_dt=0.03,
    collision_activation_distance=0.025,
    acceleration_scale=1.0,
    self_collision_check=True,
    maximum_trajectory_dt=0.25,
    finetune_dt_scale=1.05,
    fixed_iters_trajopt=True,
    finetune_trajopt_iters=300,
    minimize_jerk=True,
)
rospy.loginfo("MotionGenConfig loaded; creating MotionGen")
motion_gen = MotionGen(motion_gen_config)
rospy.loginfo("updating blox hashes")
motion_gen.world_collision.update_blox_hashes()
rospy.loginfo("MotionGen created; warming up")
motion_gen.warmup(warmup_js_trajopt=False)
rospy.loginfo("MotionGen warmed up")
self.cam_interface = CameraToNVBlox(world_cfg, buffers_size=50, use_rgb=False)
rospy.loginfo("MotionGenServer initialised")
```

Stacktrace

```
[INFO] [1699997157.736202, 0.000000]: Initialising MotionGenServer
[INFO] [1699997157.739954, 0.000000]: WorldConfig loaded with obstacle; creating MotionGenConfig
[INFO] [1699997157.741007, 0.000000]: WorldConfigTable: WorldConfig(sphere=[], cuboid=[Cuboid(name='table', pose=[0.0, 0.0, -0.11, 1, 0, 0, 0.0], scale=None, color=[0.6, 0.6, 0.8, 1.0], texture_id=None, texture=None, material=Material(metallic=0.0, roughness=0.4), tensor_args=TensorDeviceType(device=device(type='cuda', index=0), dtype=torch.float32), dims=[2.2, 2.2, 0.2]), Cuboid(name='cube4', pose=[-0.5, 0.0, 0.3, 1.0, 0.0, 0.0, 0.0], scale=None, color=[0.6, 0.6, 0.8, 1.0], texture_id=None, texture=None, material=Material(metallic=0.0, roughness=0.4), tensor_args=TensorDeviceType(device=device(type='cuda', index=0), dtype=torch.float32), dims=[0.05, 2.0, 2.0])], capsule=[], cylinder=[], mesh=[], blox=[], objects=[Cuboid(name='table', pose=[0.0, 0.0, -0.11, 1, 0, 0, 0.0], scale=None, color=[0.6, 0.6, 0.8, 1.0], texture_id=None, texture=None, material=Material(metallic=0.0, roughness=0.4), tensor_args=TensorDeviceType(device=device(type='cuda', index=0), dtype=torch.float32), dims=[2.2, 2.2, 0.2]), Cuboid(name='cube4', pose=[-0.5, 0.0, 0.3, 1.0, 0.0, 0.0, 0.0], scale=None, color=[0.6, 0.6, 0.8, 1.0], texture_id=None, texture=None, material=Material(metallic=0.0, roughness=0.4), tensor_args=TensorDeviceType(device=device(type='cuda', index=0), dtype=torch.float32), dims=[0.05, 2.0, 2.0])])
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1114 21:25:59.966771 287120 layer_cake_impl.h:32] Adding Layer with type: N6nvblox15VoxelBlockLayerINS_9TsdfVoxelEEE, voxel_size: 0.03, and memory_type: kDevice to LayerCake.
I1114 21:25:59.966796 287120 layer_cake_impl.h:32] Adding Layer with type: N6nvblox15VoxelBlockLayerINS_10ColorVoxelEEE, voxel_size: 0.03, and memory_type: kDevice to LayerCake.
I1114 21:25:59.966800 287120 layer_cake_impl.h:32] Adding Layer with type: N6nvblox15VoxelBlockLayerINS_14OccupancyVoxelEEE, voxel_size: 0.03, and memory_type: kDevice to LayerCake.
I1114 21:25:59.966804 287120 layer_cake_impl.h:32] Adding Layer with type: N6nvblox15VoxelBlockLayerINS_9EsdfVoxelEEE, voxel_size: 0.03, and memory_type: kDevice to LayerCake.
I1114 21:25:59.966806 287120 layer_cake_impl.h:32] Adding Layer with type: N6nvblox10BlockLayerINS_9MeshBlockEEE, voxel_size: 0.03, and memory_type: kDevice to LayerCake.
[INFO] [1699997161.576852, 0.000000]: MotionGenConfig loaded; creating MotionGen
[INFO] [1699997161.578162, 0.000000]: updating blox hashes
[INFO] [1699997161.578972, 0.000000]: MotionGen created; warming up
==== DEBUG ====
JointState(position=tensor([[0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000]], device='cuda:0'), velocity=tensor([[0., 0., 0., 0., 0., 0.]], device='cuda:0'), acceleration=tensor([[0., 0., 0., 0., 0., 0.]], device='cuda:0'), joint_names=['amiga_arm_shoulder_pan_joint', 'amiga_arm_shoulder_lift_joint', 'amiga_arm_elbow_joint', 'amiga_arm_wrist_1_joint', 'amiga_arm_wrist_2_joint', 'amiga_arm_wrist_3_joint'], jerk=tensor([[0., 0., 0., 0., 0., 0.]], device='cuda:0'), tensor_args=TensorDeviceType(device=device(type='cuda', index=0), dtype=torch.float32))
Pose(position=tensor([[ 0.3911, -1.1841, 0.3609]], device='cuda:0'), quaternion=tensor([[ 0.2708, 0.6531, 0.6533, -0.2707]], device='cuda:0'), rotation=None, batch=1, n_goalset=1, name='ee_link', normalize_rotation=True)
{'amiga_gripper_palm': Pose(position=tensor([[ 0.3911, -1.1841, 0.3609]], device='cuda:0'), quaternion=tensor([[ 0.2708, 0.6531, 0.6533, -0.2707]], device='cuda:0'), rotation=None, batch=1, n_goalset=1, name='ee_link', normalize_rotation=False)}
==== DEBUG ====
Traceback (most recent call last):
  File "/ros_ws/devel/lib/curobo_ros/interface.py", line 15, in <module>
    exec(compile(fh.read(), python_script, 'exec'), context)
  File "/ros_ws/src/curobo_ros/nodes/interface.py", line 18, in <module>
    interface = MotionGenServer(robot_cfg)
  File "/ros_ws/src/curobo_ros/src/curobo_ros/motiongen.py", line 62, in __init__
    motion_gen.warmup(warmup_js_trajopt=False)
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1859, in warmup
    self.plan_single(
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1210, in plan_single
    result = self._plan_attempts(
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1032, in _plan_attempts
    result = self._plan_from_solve_state(
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1508, in _plan_from_solve_state
    traj_result = self._solve_trajopt_from_solve_state(
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/curobo/curobo/src/curobo/wrap/reacher/motion_gen.py", line 901, in _solve_trajopt_from_solve_state
    traj_result = trajopt_instance.solve_any(
  File "/curobo/curobo/src/curobo/wrap/reacher/trajopt.py", line 490, in solve_any
    return self.solve_single(
  File "/curobo/curobo/src/curobo/wrap/reacher/trajopt.py", line 611, in solve_single
    return self.solve_from_solve_state(
  File "/curobo/curobo/src/curobo/wrap/reacher/trajopt.py", line 577, in solve_from_solve_state
    result = self.solver.solve(goal_buffer, seed_traj)
  File "/curobo/curobo/src/curobo/wrap/wrap_base.py", line 138, in solve
    act_seq = self.optimize(seed, shift_steps=0)
  File "/curobo/curobo/src/curobo/wrap/wrap_base.py", line 70, in optimize
    act_seq = opt.optimize(act_seq, shift_steps)
  File "/curobo/curobo/src/curobo/opt/opt_base.py", line 93, in optimize
    out = self._optimize(opt_tensor, shift_steps, n_iters)
  File "/curobo/curobo/src/curobo/opt/particle/particle_opt_base.py", line 211, in _optimize
    curr_action_seq = self._run_opt_iters(
  File "/curobo/curobo/src/curobo/opt/particle/particle_opt_base.py", line 260, in _run_opt_iters
    trajectory = self.generate_rollouts()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/curobo/curobo/src/curobo/opt/particle/parallel_mppi.py", line 578, in generate_rollouts
    return super().generate_rollouts(init_act)
  File "/curobo/curobo/src/curobo/opt/particle/particle_opt_base.py", line 179, in generate_rollouts
    trajectories = self.rollout_fn(act_seq)
  File "/curobo/curobo/src/curobo/rollout/rollout_base.py", line 457, in __call__
    return self.rollout_fn(act)
  File "/curobo/curobo/src/curobo/rollout/arm_base.py", line 589, in rollout_fn
    cost_seq = self.cost_fn(state, act_seq)
  File "/curobo/curobo/src/curobo/rollout/arm_reacher.py", line 218, in cost_fn
    cost_list = super(ArmReacher, self).cost_fn(state, action_batch, return_list=True)
  File "/curobo/curobo/src/curobo/rollout/arm_base.py", line 363, in cost_fn
    coll_cost = self.primitive_collision_cost.forward(
  File "/curobo/curobo/src/curobo/rollout/cost/primitive_collision_cost.py", line 106, in sweep_kernel_fn
    dist = self.sweep_check_fn(
  File "/curobo/curobo/src/curobo/geom/sdf/world_blox.py", line 274, in get_swept_sphere_distance
    d = self._get_blox_swept_sdf(
  File "/curobo/curobo/src/curobo/geom/sdf/world_blox.py", line 151, in _get_blox_swept_sdf
    d = self._blox_mapper.query_sphere_trajectory_sdf_cost(
  File "/curobo/nvblox_torch/nvblox_torch/mapper.py", line 234, in query_sphere_trajectory_sdf_cost
    distance = SdfSphereTrajectoryCostMultiBlox.apply(
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs) # type: ignore[misc]
  File "/curobo/nvblox_torch/nvblox_torch/sdf_query.py", line 120, in forward
    r = c_mapper.query_sphere_trajectory_sdf_cost(
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
double free or corruption (!prev)
```

The values between the `==== DEBUG ====` markers in the stacktrace are prints I added to show the inputs to plan_single.

balakumar-s commented 9 months ago

Does pytest tests/nvblox_test.py pass?

cedricgoubard commented 9 months ago

Hi, the tests do not pass. I have been looking into it, but I am worried I might have made things worse: I am now getting an import error (ModuleNotFoundError), because nvblox_torch is not properly installed.

More specifically:

Install steps

```dockerfile
RUN apt-get update -y && apt-get install -y --no-install-recommends git vim python-is-python3
RUN python -m pip install --upgrade pip && \
    python -m pip install --upgrade setuptools numpy transforms3d torchtyping typing-extensions
WORKDIR /curobo
ENV TORCH_CUDA_ARCH_LIST "8.6+PTX"

###################################################################################################
######################################### CMake version ###########################################
###################################################################################################
RUN wget https://cmake.org/files/v3.19/cmake-3.19.5.tar.gz && tar -xvzf cmake-3.19.5.tar.gz
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y build-essential checkinstall zlib1g-dev libssl-dev
RUN cd cmake-3.19.5 && ./bootstrap
RUN cd cmake-3.19.5 && make -j `nproc` && make install
RUN rm -rf cmake-3.19.5.tar.gz

####################################################################################################
############################################## curobo ##############################################
####################################################################################################
RUN git clone https://github.com/NVlabs/curobo.git
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
ENV PATH="${PATH}:/opt/hpcx/ompi/bin"
ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/hpcx/ompi/lib"
ENV LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}"
RUN cd curobo && python -m pip install -e . --no-build-isolation
RUN apt-get install -y git-lfs &&\
    cd curobo &&\
    git lfs pull

####################################################################################################
############################################## nvblox ##############################################
####################################################################################################
RUN git clone https://github.com/valtsblukis/nvblox.git
RUN apt-get install -y libgoogle-glog-dev libgtest-dev libgflags-dev python3-dev libsqlite3-dev
RUN cd /usr/src/googletest && sudo cmake . && sudo cmake --build . --target install
RUN cd nvblox && cd nvblox && mkdir build && cd build && \
    cmake .. -DPRE_CXX11_ABI_LINKABLE=ON -DCMAKE_CUDA_FLAGS="-gencode=arch=compute_86,code=sm_86" -DCMAKE_CUDA_ARCHITECTURES="86" -DSTDGPU_CUDA_ARCHITECTURE_FLAGS_USER="86" -DBUILD_FOR_ALL_ARCHS=ON &&\
    make -j `nproc` &&\
    make install
RUN git clone https://github.com/nvlabs/nvblox_torch.git
RUN cd nvblox_torch && sh install.sh
```

Relevant tests and outputs:

#### Environment ####
python -V
Python 3.8.10

python -m pip -V
pip 23.3.1 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8)

pwd
/

#### CuRobo ####
python -m pip freeze | grep curobo
-e git+https://github.com/NVlabs/curobo.git@e0e804e9062da33c13aa262a1da253d364b8768a#egg=nvidia_curobo

python -c 'import curobo'
# No error

#### NVBlox Torch ####
python -m pip freeze | grep nvblox
# nothing

python -c 'import nvblox_torch'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'nvblox_torch'

#### But it works if I am in the folder ####
cd curobo/nvblox_torch/

python -c 'import nvblox_torch'
# No error

#### Also, if I remove the '-e' from nvblox_torch/install.sh, it works
vim install.sh
sh ./install.sh
cd /
python -c 'import nvblox_torch'
# No error

Any idea what is causing this? I might have missed something in the nvblox_torch install steps, which would explain the CUDA errors I was getting.
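One way to narrow this down is to ask Python where it would load the module from, once from `/` and once from inside the repository folder. The sketch below uses only the standard library; `locate_module` is a hypothetical helper name. With `python -c`, the current directory is the first entry on `sys.path`, which would explain why the import succeeds only inside the folder.

```python
import importlib.util
import sys


def locate_module(name: str):
    """Return where a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin is not None:
        return spec.origin
    # Namespace packages have no origin, only a list of search locations.
    return list(spec.submodule_search_locations or [])


if __name__ == "__main__":
    print("first sys.path entry:", repr(sys.path[0]))
    print("nvblox_torch resolves to:", locate_module("nvblox_torch"))
```

Running it from both directories and comparing the two results shows whether the package is found through the installation or only through the current directory.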

balakumar-s commented 9 months ago

I am looking at refining the nvblox installation. Is your dockerfile starting from a pytorch container? If you can give me your dockerfile, I can use that to test the nvblox installation and possibly update it to be more robust.

cedricgoubard commented 9 months ago

Hi @balakumar-s, sorry about the delay; I was away for the last week.

Here's the Dockerfile

```dockerfile
FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES all

ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV ROS_DISTRO noetic
ENV MSCL_LIB_PATH /usr/share/c++-mscl

SHELL ["/bin/bash","-c"]

###################################################################################################
########################################## ROS Noetic #############################################
###################################################################################################
# setup timezone
RUN echo 'Etc/UTC' > /etc/timezone && \
    ln -s /usr/share/zoneinfo/Etc/UTC /etc/localtime && \
    apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -q -y --no-install-recommends tzdata && \
    rm -rf /var/lib/apt/lists/*

# install packages
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -q -y --no-install-recommends \
    dirmngr \
    gnupg2 \
    && rm -rf /var/lib/apt/lists/*

# setup the sources list
RUN sh -c 'echo "deb http://packages.ros.org/ros/ubuntu focal main" > /etc/apt/sources.list.d/ros-latest.list'

# setup keys
RUN apt-key adv --keyserver 'hkp://keyserver.ubuntu.com:80' --recv-key C1CF6E31E6BADE8868B172B4F42ED6FBAB17C654

# built-in packages
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends ros-noetic-ros-base \
    && apt-get autoclean \
    && apt-get autoremove \
    # Clear apt-cache to reduce image size
    && rm -rf /var/lib/apt/lists/*

RUN echo "source /opt/ros/noetic/setup.bash" >> /root/.bashrc

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    python3 \
    python3-pip \
    python3-rosdep \
    python3-rosinstall \
    python3-rosinstall-generator \
    python3-wstool \
    python-is-python3 \
    python3-catkin-tools \
    # Basic utilities
    iputils-ping \
    wget \
    # ROS packages
    ros-$ROS_DISTRO-cv-bridge \
    ros-$ROS_DISTRO-tf2-tools \
    ros-$ROS_DISTRO-tf \
    ros-${ROS_DISTRO}-xacro \
    ros-${ROS_DISTRO}-joint-state-publisher \
    ros-${ROS_DISTRO}-robot-state-publisher \
    ros-${ROS_DISTRO}-rviz \
    build-essential \
    --no-install-recommends \
    && apt-get autoclean \
    && apt-get autoremove \
    # Clear apt-cache to reduce image size
    && rm -rf /var/lib/apt/lists/*

RUN DEBIAN_FRONTEND=noninteractive rosdep init && DEBIAN_FRONTEND=noninteractive rosdep update

# Create local catkin ws
ENV ROS_WS=/ros_ws
RUN mkdir -p $ROS_WS/src

# Set the working directory to /catkin_ws
WORKDIR $ROS_WS

RUN source /opt/ros/$ROS_DISTRO/setup.bash \
    && catkin init \
    && catkin config --cmake-args -DCMAKE_BUILD_TYPE=Release \
    && catkin build

RUN echo 'source /opt/ros/${ROS_DISTRO}/setup.bash' >> /root/.bashrc

###################################################################################################
############################################ PyTorch ##############################################
###################################################################################################
RUN pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu117

###################################################################################################
##################################### CuRobo prerequisites ########################################
###################################################################################################
RUN apt-get update -y && apt-get install -y --no-install-recommends git vim python-is-python3

RUN python -m pip install --upgrade pip && \
    python -m pip install --upgrade setuptools numpy transforms3d torchtyping typing-extensions

WORKDIR /curobo

ENV TORCH_CUDA_ARCH_LIST "8.6+PTX"

###################################################################################################
######################################### CMake version ###########################################
###################################################################################################
RUN wget https://cmake.org/files/v3.19/cmake-3.19.5.tar.gz && tar -xvzf cmake-3.19.5.tar.gz
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y build-essential checkinstall zlib1g-dev libssl-dev
RUN cd cmake-3.19.5 && ./bootstrap
RUN cd cmake-3.19.5 && make -j `nproc` && make install
RUN rm -rf cmake-3.19.5.tar.gz

####################################################################################################
############################################## curobo ##############################################
####################################################################################################
RUN git clone https://github.com/NVlabs/curobo.git
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
ENV PATH="${PATH}:/opt/hpcx/ompi/bin"
ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/hpcx/ompi/lib"
ENV LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}"
RUN cd curobo && python -m pip install -e . --no-build-isolation
RUN apt-get install -y git-lfs &&\
    cd curobo &&\
    git lfs pull

####################################################################################################
############################################## nvblox ##############################################
####################################################################################################
RUN git clone https://github.com/valtsblukis/nvblox.git
RUN apt-get install -y libgoogle-glog-dev libgtest-dev libgflags-dev python3-dev libsqlite3-dev
RUN cd /usr/src/googletest && sudo cmake . && sudo cmake --build . --target install
RUN cd nvblox && cd nvblox && mkdir build && cd build && \
    cmake .. -DPRE_CXX11_ABI_LINKABLE=ON -DCMAKE_CUDA_FLAGS="-gencode=arch=compute_86,code=sm_86" -DCMAKE_CUDA_ARCHITECTURES="86" -DSTDGPU_CUDA_ARCHITECTURE_FLAGS_USER="86" -DBUILD_FOR_ALL_ARCHS=ON &&\
    make -j `nproc` &&\
    make install
RUN git clone https://github.com/nvlabs/nvblox_torch.git
RUN cd nvblox_torch && sh install.sh

CMD bash
```

I see you added the fixed_in_next_release tag; should I wait for this release? Any idea when it'll be out?

Anyway, thanks again for your help :pray:

balakumar-s commented 9 months ago

The release will be out by this Wednesday.

Thanks for the dockerfile. I will check to make sure the fix I have works in your docker config as well.

balakumar-s commented 8 months ago

We just pushed a new release that should fix this error. Check this dockerfile for an example of how to compile nvblox: https://github.com/NVlabs/curobo/blob/58958bbcce4f7d8549268040237103ed904db8e4/docker/x86.dockerfile#L103

balakumar-s commented 8 months ago

Closing this due to inactivity. Feel free to open this issue again if the latest release didn't fix it.

cedricgoubard commented 8 months ago

Hi @balakumar-s, Happy new year!

I've taken a look at the new installation instructions; I've managed to build everything in a container without the CXX11_ABI (I'm using a pip-installed pytorch).
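For anyone checking which ABI setting applies to their own PyTorch build, a minimal sketch follows. `torch.compiled_with_cxx11_abi()` is the PyTorch call that reports the setting; `abi_define` is a hypothetical helper that only formats the matching compiler define, which is the value the build passes as `-D_GLIBCXX_USE_CXX11_ABI=...`.

```python
def abi_define(uses_cxx11_abi: bool) -> str:
    """Format the compiler define that matches a PyTorch build (hypothetical helper)."""
    return f"-D_GLIBCXX_USE_CXX11_ABI={int(bool(uses_cxx11_abi))}"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; cannot query its ABI setting.")
    else:
        # Reports whether this PyTorch build was compiled with the C++11 ABI.
        print(abi_define(torch.compiled_with_cxx11_abi()))
```

Libraries linked against PyTorch (here glog, gflags and nvblox) generally need to be built with the same value.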

However, I'm still facing the same issue: the nvblox_torch installation seems to succeed, but I then get a ModuleNotFoundError.

Dockerfile

```dockerfile
FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES all

ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV ROS_DISTRO noetic
ENV MSCL_LIB_PATH /usr/share/c++-mscl

SHELL ["/bin/bash","-c"]

###################################################################################################
########################################## ROS Noetic #############################################
###################################################################################################
# setup timezone
RUN echo 'Etc/UTC' > /etc/timezone && \
    ln -s /usr/share/zoneinfo/Etc/UTC /etc/localtime && \
    apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -q -y --no-install-recommends tzdata && \
    rm -rf /var/lib/apt/lists/*

# install packages
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -q -y --no-install-recommends \
    dirmngr \
    gnupg2 \
    && rm -rf /var/lib/apt/lists/*

# setup the sources list
RUN sh -c 'echo "deb http://packages.ros.org/ros/ubuntu focal main" > /etc/apt/sources.list.d/ros-latest.list'

# setup keys
RUN apt-key adv --keyserver 'hkp://keyserver.ubuntu.com:80' --recv-key C1CF6E31E6BADE8868B172B4F42ED6FBAB17C654

# built-in packages
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends ros-noetic-ros-base \
    && apt-get autoclean \
    && apt-get autoremove \
    # Clear apt-cache to reduce image size
    && rm -rf /var/lib/apt/lists/*

RUN echo "source /opt/ros/noetic/setup.bash" >> /root/.bashrc

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    python3 \
    python3-pip \
    python3-rosdep \
    python3-rosinstall \
    python3-rosinstall-generator \
    python3-wstool \
    python-is-python3 \
    python3-catkin-tools \
    # Basic utilities
    iputils-ping \
    wget \
    # ROS packages
    ros-$ROS_DISTRO-cv-bridge \
    ros-$ROS_DISTRO-tf2-tools \
    ros-$ROS_DISTRO-tf \
    ros-${ROS_DISTRO}-xacro \
    ros-${ROS_DISTRO}-joint-state-publisher \
    ros-${ROS_DISTRO}-robot-state-publisher \
    ros-${ROS_DISTRO}-rviz \
    build-essential \
    --no-install-recommends \
    && apt-get autoclean \
    && apt-get autoremove \
    # Clear apt-cache to reduce image size
    && rm -rf /var/lib/apt/lists/*

RUN DEBIAN_FRONTEND=noninteractive rosdep init && DEBIAN_FRONTEND=noninteractive rosdep update

# Create local catkin ws
ENV ROS_WS=/ros_ws
RUN mkdir -p $ROS_WS/src

# Set the working directory to /catkin_ws
WORKDIR $ROS_WS

RUN source /opt/ros/$ROS_DISTRO/setup.bash \
    && catkin init \
    && catkin config --cmake-args -DCMAKE_BUILD_TYPE=Release \
    && catkin build

RUN echo 'source /opt/ros/${ROS_DISTRO}/setup.bash' >> /root/.bashrc

###################################################################################################
############################################ PyTorch ##############################################
###################################################################################################
RUN pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu117

###################################################################################################
##################################### CuRobo prerequisites ########################################
###################################################################################################
RUN apt-get update -y && apt-get install -y --no-install-recommends git vim python-is-python3

RUN python -m pip install --upgrade pip && \
    python -m pip install --upgrade setuptools numpy transforms3d torchtyping typing-extensions

WORKDIR /curobo

ENV TORCH_CUDA_ARCH_LIST "8.6+PTX"

###################################################################################################
######################################### CMake version ###########################################
###################################################################################################
RUN wget https://cmake.org/files/v3.28/cmake-3.28.1.tar.gz && tar -xvzf cmake-3.28.1.tar.gz
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y build-essential checkinstall zlib1g-dev libssl-dev
RUN cd cmake-3.28.1 && ./bootstrap
RUN cd cmake-3.28.1 && make -j `nproc` && make install
RUN rm -rf cmake-3.28.1.tar.gz

####################################################################################################
############################################## curobo ##############################################
####################################################################################################
RUN apt-get update && apt-get install -y --no-install-recommends \
    pkg-config \
    libglvnd-dev \
    libgl1-mesa-dev \
    libegl1-mesa-dev \
    libgles2-mesa-dev

RUN apt-get update && apt-get install --reinstall -y \
    libmpich-dev \
    hwloc-nox libmpich12 mpich

RUN pip install "robometrics[evaluator] @ git+https://github.com/fishbotics/robometrics.git"

RUN git clone https://github.com/NVlabs/curobo.git
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
ENV PATH="${PATH}:/opt/hpcx/ompi/bin"
ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/hpcx/ompi/lib"
ENV LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}"
RUN cd curobo && pip install .[dev,usd] --no-build-isolation
RUN apt-get install -y git-lfs &&\
    cd curobo &&\
    git lfs pull

####################################################################################################
############################################## nvblox ##############################################
####################################################################################################
# Required for EGL rendering
ENV PYOPENGL_PLATFORM=egl
RUN echo '{"file_format_version": "1.0.0", "ICD": {"library_path": "libEGL_nvidia.so.0"}}' >> /usr/share/glvnd/egl_vendor.d/10_nvidia.json

ENV TORCH_CXX11=0
ENV PKGS_PATH=/curobo

RUN apt-get install -y tcl

RUN cd ${PKGS_PATH} && git clone https://github.com/sqlite/sqlite.git -b version-3.39.4 && \
    cd ${PKGS_PATH}/sqlite && CFLAGS=-fPIC ./configure --prefix=${PKGS_PATH}/sqlite/install/ && \
    make -j `nproc` && make install

RUN cd ${PKGS_PATH} && git clone https://github.com/google/glog.git -b v0.6.0 && \
    cd glog && \
    mkdir build && cd build && \
    cmake .. -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
    -DCMAKE_INSTALL_PREFIX=${PKGS_PATH}/glog/install/ \
    -DWITH_GFLAGS=OFF -DWITH_GTEST=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=${TORCH_CXX11} \
    && make -j `nproc` && make install

RUN cd ${PKGS_PATH} && git clone https://github.com/gflags/gflags.git -b v2.2.2 && \
    cd gflags && \
    mkdir build && cd build && \
    cmake .. -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
    -DCMAKE_INSTALL_PREFIX=${PKGS_PATH}/gflags/install/ \
    -DGFLAGS_BUILD_STATIC_LIBS=ON -DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=${TORCH_CXX11} \
    && make -j `nproc` && make install

RUN cd ${PKGS_PATH} && git clone https://github.com/valtsblukis/nvblox.git && cd ${PKGS_PATH}/nvblox/nvblox && mkdir build && cd build && \
    cmake .. -DBUILD_REDISTRIBUTABLE=ON \
    -DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 -DPRE_CXX11_ABI_LINKABLE=ON \
    -DSQLITE3_BASE_PATH="${PKGS_PATH}/sqlite/install/" -DGLOG_BASE_PATH="${PKGS_PATH}/glog/install/" \
    -DGFLAGS_BASE_PATH="${PKGS_PATH}/gflags/install/" -D_GLIBCXX_USE_CXX11_ABI=0 \
    -DCMAKE_CUDA_FLAGS="-gencode=arch=compute_86,code=sm_86" -DCMAKE_CUDA_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 -DCMAKE_CUDA_ARCHITECTURES="86" -DSTDGPU_CUDA_ARCHITECTURE_FLAGS_USER="86" && \
    make -j `nproc` && \
    sudo make install

# We have to help CMake find glog here
RUN cd ${PKGS_PATH} && git clone https://github.com/NVlabs/nvblox_torch.git && cd nvblox_torch && git checkout 0463d9f &&\
    sed -i "71i link_directories(/curobo/glog/install/lib)" nvblox_torch/cpp/CMakeLists.txt &&\
    sh install.sh "$(python -c 'import torch.utils; print(torch.utils.cmake_prefix_path)');/curobo/glog/install" &&\
    python -m pip install -e .

CMD bash
```

The error:

python -c 'from nvblox_torch.mapper import Mapper'

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'nvblox_torch'

I've considered using your container as a base image, but I need to install ROS Noetic on top, and it does not officially support Ubuntu 23. I'll try to find a workaround if this does not work out, but any advice would be greatly appreciated :pray:.

Sepirence commented 8 months ago

I have a similar issue when executing motion_gen_reacher_nvblox.py. My setup is:

Base image: docker isaac-sim:2023.1.0
python version: 3.10.13
torch: 2.0.1+cu118
CUDA: 11.7
GPU: 3090ti
OS: ubuntu 20.04

I installed nvblox following your guide. motion_gen_reacher.py ran without problems both before and after installing nvblox, but motion_gen_reacher_nvblox.py fails after the simulation starts. The warm-up stage seems to complete without problems; the CUDA memory error occurs during motion generation.

Error message:

2024-01-10 07:08:25 [46,145ms] [Warning] [omni.physx.plugin] The rigid body at /World/panda/base_link has a possibly invalid inertia tensor of {1.0, 1.0, 1.0} and a negative mass, small sphere approximated inertia was used. Either specify correct values in the mass properties, or add collider(s) to any shape(s) that you wish to automatically compute mass properties for.
2024-01-10 07:08:25 [46,145ms] [Warning] [omni.physx.plugin] The rigid body at /World/panda/ee_link has a possibly invalid inertia tensor of {1.0, 1.0, 1.0} and a negative mass, small sphere approximated inertia was used. Either specify correct values in the mass properties, or add collider(s) to any shape(s) that you wish to automatically compute mass properties for.
2024-01-10 07:08:25 [46,302ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2024-01-10 07:08:25 [46,302ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.4]) (impl: carb.windowing-glfw.plugin)
2024-01-10 07:08:25 [46,357ms] [Warning] [omni.physx.plugin] The rigid body at /World/panda/base_link has a possibly invalid inertia tensor of {1.0, 1.0, 1.0} and a negative mass, small sphere approximated inertia was used. Either specify correct values in the mass properties, or add collider(s) to any shape(s) that you wish to automatically compute mass properties for.
2024-01-10 07:08:25 [46,357ms] [Warning] [omni.physx.plugin] The rigid body at /World/panda/ee_link has a possibly invalid inertia tensor of {1.0, 1.0, 1.0} and a negative mass, small sphere approximated inertia was used. Either specify correct values in the mass properties, or add collider(s) to any shape(s) that you wish to automatically compute mass properties for.
2024-01-10 07:08:32 [53,513ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2024-01-10 07:08:32 [53,513ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.4]) (impl: carb.windowing-glfw.plugin)
2024-01-10 07:08:32 [53,517ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2024-01-10 07:08:32 [53,517ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.4]) (impl: carb.windowing-glfw.plugin)
2024-01-10 07:08:32 [53,519ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2024-01-10 07:08:32 [53,519ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.4]) (impl: carb.windowing-glfw.plugin)
2024-01-10 07:08:32 [53,662ms] [Warning] [carb] Client omni.ui has acquired [omni::kit::renderer::IRenderer v1.9] 100 times. Consider accessing this interface with carb::getCachedInterface() (Performance warning)
2024-01-10 07:08:32 [53,662ms] [Warning] [carb] Client omni.ui has acquired [carb::svg::Svg v0.1] 100 times. Consider accessing this interface with carb::getCachedInterface() (Performance warning)
Traceback (most recent call last):
  File "/isaac-sim/curobo/examples/isaac_sim/motion_gen_reacher_nvblox.py", line 294, in <module>
    main()
  File "/isaac-sim/curobo/examples/isaac_sim/motion_gen_reacher_nvblox.py", line 249, in main
    result = motion_gen.plan_single(cu_js.unsqueeze(0), ik_goal, plan_config)
  File "/isaac-sim/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1224, in plan_single
    result = self._plan_attempts(
  File "/isaac-sim/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1043, in _plan_attempts
    result = self._plan_from_solve_state(
  File "/isaac-sim/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1523, in _plan_from_solve_state
    traj_result = self._solve_trajopt_from_solve_state(
  File "/isaac-sim/kit/python/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/isaac-sim/curobo/src/curobo/wrap/reacher/motion_gen.py", line 912, in _solve_trajopt_from_solve_state
    traj_result = trajopt_instance.solve_any(
  File "/isaac-sim/curobo/src/curobo/wrap/reacher/trajopt.py", line 492, in solve_any
    return self.solve_single(
  File "/isaac-sim/curobo/src/curobo/wrap/reacher/trajopt.py", line 613, in solve_single
    return self.solve_from_solve_state(
  File "/isaac-sim/curobo/src/curobo/wrap/reacher/trajopt.py", line 579, in solve_from_solve_state
    result = self.solver.solve(goal_buffer, seed_traj)
  File "/isaac-sim/curobo/src/curobo/wrap/wrap_base.py", line 141, in solve
    act_seq = self.optimize(seed, shift_steps=0)
  File "/isaac-sim/curobo/src/curobo/wrap/wrap_base.py", line 71, in optimize
    act_seq = opt.optimize(act_seq, shift_steps)
  File "/isaac-sim/curobo/src/curobo/opt/opt_base.py", line 98, in optimize
    torch.cuda.synchronize()
  File "/isaac-sim/extscache/omni.pip.torch-2_0_1-2.0.2+105.1.lx64/torch-2-0-1/torch/cuda/__init__.py", line 688, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

2024-01-10 07:08:40 [61,299ms] [Error] [carb.livestream.plugin] nvstPushStreamData error for eye 0, stream 0x7f3714018220: 0x800b0000
2024-01-10 07:08:40 [61,303ms] [Warning] [omni.core.ITypeFactory] Module /isaac-sim/kit/exts/omni.graph.action/bin/libomni.graph.action.plugin.so remained loaded after unload request.
2024-01-10 07:08:40 [61,307ms] [Warning] [carb] [Plugin: omni.spectree.delegate.plugin] Module /isaac-sim/kit/exts/omni.usd_resolver/bin/libomni.spectree.delegate.plugin.so remained loaded after unload request
2024-01-10 07:08:40 [61,311ms] [Warning] [omni.stageupdate.plugin] Deprecated: direct use of IStageUpdate callbacks is deprecated. Use IStageUpdate::getStageUpdate instead.
2024-01-10 07:08:40 [61,312ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-01-10 07:08:40 [61,312ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-01-10 07:08:40 [61,312ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-01-10 07:08:40 [61,312ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-01-10 07:08:40 [61,312ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-01-10 07:08:40 [61,312ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-01-10 07:08:40 [61,312ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-01-10 07:08:40 [61,312ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-01-10 07:08:40 [61,312ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-01-10 07:08:40 [61,313ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-01-10 07:08:40 [61,313ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-01-10 07:08:40 [61,316ms] [Warning] [carb.audio.context] 1 contexts were leaked
2024-01-10 07:08:40 [61,392ms] [Warning] [carb] Recursive unloadAllPlugins() detected!
2024-01-10 07:08:40 [61,403ms] [Warning] [omni.core.ITypeFactory] Module /isaac-sim/kit/exts/omni.activity.core/bin/libomni.activity.core.plugin.so remained loaded after unload request.

Is there anything I can try?
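The log above suggests passing `CUDA_LAUNCH_BLOCKING=1`, since asynchronous kernel errors can be reported at an unrelated call (here `torch.cuda.synchronize()`). A minimal sketch of doing that from Python is below; `enable_blocking_cuda_launches` is a hypothetical helper, and the variable has to be set before the process makes its first CUDA call, so it belongs at the very top of the script.

```python
import os


def enable_blocking_cuda_launches(env=os.environ) -> str:
    """Set CUDA_LAUNCH_BLOCKING=1 so kernel errors surface at the launching call."""
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


if __name__ == "__main__":
    # Call this before importing torch or anything else that touches CUDA.
    enable_blocking_cuda_launches()
    print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

Exporting the variable in the shell before starting the script has the same effect. Execution becomes slower, but the resulting stack trace should point closer to the kernel that actually faulted.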

balakumar-s commented 8 months ago

@Sepirence Can you create a new issue to track your error? We pushed a fix to nvblox_torch which might fix your error. Can you pull the latest commit from the nvblox_torch repo?

balakumar-s commented 8 months ago

@cedricgoubard I was able to fix the issue. Your Docker image was unable to find Python packages installed in developer mode with pip install -e ., which is what we use for installing nvblox_torch.

I made changes to the nvblox_torch repo that allow pip install . to work.
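For context, editable installs are registered through small marker files (`.pth` or `.egg-link`) in the site-packages directories rather than by copying the package there. The sketch below lists those markers so you can see whether an editable install was recorded where the interpreter looks. `editable_markers` is a hypothetical helper; it only uses the standard library and does not diagnose why a marker is missing or ignored.

```python
import site
from pathlib import Path


def editable_markers(site_dirs):
    """Return .pth and .egg-link files found directly inside the given directories."""
    found = []
    for directory in site_dirs:
        path = Path(directory)
        if not path.is_dir():
            continue  # Skip site directories that do not exist on this system.
        for pattern in ("*.pth", "*.egg-link"):
            found.extend(sorted(path.glob(pattern)))
    return found


if __name__ == "__main__":
    candidates = list(site.getsitepackages()) + [site.getusersitepackages()]
    for marker in editable_markers(candidates):
        print(marker)
```

If no marker mentions the nvblox_torch checkout, the interpreter has no way to find the package outside its source folder.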

Here is the dockerfile with modifications:

Dockerfile

```dockerfile
FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES all

ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV ROS_DISTRO noetic
ENV MSCL_LIB_PATH /usr/share/c++-mscl

SHELL ["/bin/bash","-c"]

###################################################################################################
########################################## ROS Noetic #############################################
###################################################################################################
# setup timezone
RUN echo 'Etc/UTC' > /etc/timezone && \
    ln -s /usr/share/zoneinfo/Etc/UTC /etc/localtime && \
    apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -q -y --no-install-recommends tzdata && \
    rm -rf /var/lib/apt/lists/*

# install packages
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -q -y --no-install-recommends \
    dirmngr \
    gnupg2 \
    && rm -rf /var/lib/apt/lists/*

# setup the sources list
RUN sh -c 'echo "deb http://packages.ros.org/ros/ubuntu focal main" > /etc/apt/sources.list.d/ros-latest.list'

# setup keys
RUN apt-key adv --keyserver 'hkp://keyserver.ubuntu.com:80' --recv-key C1CF6E31E6BADE8868B172B4F42ED6FBAB17C654

# built-in packages
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends ros-noetic-ros-base \
    && apt-get autoclean \
    && apt-get autoremove \
    # Clear apt-cache to reduce image size
    && rm -rf /var/lib/apt/lists/*

RUN echo "source /opt/ros/noetic/setup.bash" >> /root/.bashrc

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    python3 \
    python3-pip \
    python3-rosdep \
    python3-rosinstall \
    python3-rosinstall-generator \
    python3-wstool \
    python-is-python3 \
    python3-catkin-tools \
    # Basic utilities
    iputils-ping \
    wget \
    # ROS packages
    ros-$ROS_DISTRO-cv-bridge \
    ros-$ROS_DISTRO-tf2-tools \
    ros-$ROS_DISTRO-tf \
    ros-${ROS_DISTRO}-xacro \
    ros-${ROS_DISTRO}-joint-state-publisher \
    ros-${ROS_DISTRO}-robot-state-publisher \
    ros-${ROS_DISTRO}-rviz \
    build-essential \
    --no-install-recommends \
    && apt-get autoclean \
    && apt-get autoremove \
    # Clear apt-cache to reduce image size
    && rm -rf /var/lib/apt/lists/*

RUN DEBIAN_FRONTEND=noninteractive rosdep init && DEBIAN_FRONTEND=noninteractive rosdep update

# Create local catkin ws
ENV ROS_WS=/ros_ws
RUN mkdir -p $ROS_WS/src

# Set the working directory to /catkin_ws
WORKDIR $ROS_WS

RUN source /opt/ros/$ROS_DISTRO/setup.bash \
    && catkin init \
    && catkin config --cmake-args -DCMAKE_BUILD_TYPE=Release \
    && catkin build

RUN echo 'source /opt/ros/${ROS_DISTRO}/setup.bash' >> /root/.bashrc

###################################################################################################
############################################ PyTorch ##############################################
###################################################################################################
RUN pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu117

###################################################################################################
##################################### CuRobo prerequisites ########################################
###################################################################################################
RUN apt-get update -y && apt-get install -y --no-install-recommends git vim python-is-python3

RUN python -m pip install --upgrade pip && \
    python -m pip install --upgrade setuptools numpy transforms3d torchtyping typing-extensions

WORKDIR /curobo

ENV TORCH_CUDA_ARCH_LIST "8.6+PTX"

###################################################################################################
######################################### CMake version ###########################################
###################################################################################################
RUN wget https://cmake.org/files/v3.28/cmake-3.28.1.tar.gz && tar -xvzf cmake-3.28.1.tar.gz
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y build-essential checkinstall zlib1g-dev libssl-dev
RUN cd cmake-3.28.1 && ./bootstrap
RUN cd cmake-3.28.1 && make -j `nproc` && make install
RUN rm -rf cmake-3.28.1.tar.gz

####################################################################################################
############################################## curobo ##############################################
####################################################################################################
RUN apt-get update && apt-get install -y --no-install-recommends \
    pkg-config \
    libglvnd-dev \
    libgl1-mesa-dev \
    libegl1-mesa-dev \
    libgles2-mesa-dev

RUN apt-get update && apt-get install --reinstall -y \
    libmpich-dev \
    hwloc-nox libmpich12 mpich

RUN pip install "robometrics[evaluator] @ git+https://github.com/fishbotics/robometrics.git"

RUN apt-get install -y git-lfs

RUN git clone https://github.com/NVlabs/curobo.git
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
ENV PATH="${PATH}:/opt/hpcx/ompi/bin"
ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/hpcx/ompi/lib"
ENV LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}"
RUN cd curobo && pip install .[dev,usd] --no-build-isolation

####################################################################################################
############################################## nvblox ##############################################
####################################################################################################
# Required for EGL rendering
ENV PYOPENGL_PLATFORM=egl
RUN echo '{"file_format_version": "1.0.0", "ICD": {"library_path": "libEGL_nvidia.so.0"}}' >> /usr/share/glvnd/egl_vendor.d/10_nvidia.json

ENV TORCH_CXX11=0
ENV PKGS_PATH=/curobo

RUN apt-get install -y tcl

RUN cd ${PKGS_PATH} && git clone https://github.com/sqlite/sqlite.git -b version-3.39.4 && \
    cd ${PKGS_PATH}/sqlite && CFLAGS=-fPIC ./configure --prefix=${PKGS_PATH}/sqlite/install/ && \
    make -j `nproc` && make install

RUN cd ${PKGS_PATH} && git clone https://github.com/google/glog.git -b v0.6.0 && \
    cd glog && \
    mkdir build && cd build && \
    cmake .. -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
    -DCMAKE_INSTALL_PREFIX=${PKGS_PATH}/glog/install/ \
    -DWITH_GFLAGS=OFF -DWITH_GTEST=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=${TORCH_CXX11} \
    && make -j `nproc` && make install

RUN cd ${PKGS_PATH} && git clone https://github.com/gflags/gflags.git -b v2.2.2 && \
    cd gflags && \
    mkdir build && cd build && \
    cmake .. -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
    -DCMAKE_INSTALL_PREFIX=${PKGS_PATH}/gflags/install/ \
    -DGFLAGS_BUILD_STATIC_LIBS=ON -DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=${TORCH_CXX11} \
    && make -j `nproc` && make install

RUN cd ${PKGS_PATH} && git clone https://github.com/valtsblukis/nvblox.git && cd ${PKGS_PATH}/nvblox/nvblox && mkdir build && cd build && \
    cmake .. -DBUILD_REDISTRIBUTABLE=ON \
    -DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 -DPRE_CXX11_ABI_LINKABLE=ON \
    -DSQLITE3_BASE_PATH="${PKGS_PATH}/sqlite/install/" -DGLOG_BASE_PATH="${PKGS_PATH}/glog/install/" \
    -DGFLAGS_BASE_PATH="${PKGS_PATH}/gflags/install/" -D_GLIBCXX_USE_CXX11_ABI=0 \
    -DCMAKE_CUDA_FLAGS="-gencode=arch=compute_86,code=sm_86" -DCMAKE_CUDA_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 -DCMAKE_CUDA_ARCHITECTURES="86" -DSTDGPU_CUDA_ARCHITECTURE_FLAGS_USER="86" && \
    make -j `nproc` && \
    sudo make install

# to make sure docker pulls from network instead of cache
ENV CACHE_DATE=11

# We have to help CMake find glog here
RUN cd ${PKGS_PATH} && git clone https://github.com/NVlabs/nvblox_torch.git && cd nvblox_torch &&\
    sed -i "71i link_directories(/curobo/glog/install/lib)" src/nvblox_torch/cpp/CMakeLists.txt &&\
    sh install.sh "$(python -c 'import torch.utils; print(torch.utils.cmake_prefix_path)');${PKGS_PATH}/glog/install" &&\
    python -m pip install .

CMD bash
```

cedricgoubard commented 7 months ago

Hi @balakumar-s, Your fix mostly worked, thank you! I just had to switch back to editable mode, otherwise libpy_nvblox.so was not found in /usr/local/lib/python3.8/dist-packages/nvblox_torch/cpp/build/ when actually using the package.
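In case it helps anyone hitting the same thing, here is a small sketch for checking whether the compiled library actually ended up next to the installed package. `package_file_exists` is a hypothetical helper, and the relative path `cpp/build/libpy_nvblox.so` is taken from the location mentioned above; adjust it if your layout differs.

```python
import importlib.util
from pathlib import Path


def package_file_exists(package: str, relative_path: str) -> bool:
    """Check whether a file exists inside the directory a package is imported from."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return False  # Package not importable, or it is a single-file module.
    return any(
        (Path(location) / relative_path).is_file()
        for location in spec.submodule_search_locations
    )


if __name__ == "__main__":
    print(package_file_exists("nvblox_torch", "cpp/build/libpy_nvblox.so"))
```

A `False` result after a non-editable install would be consistent with the build output staying in the source tree instead of being copied into dist-packages.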

I can load everything properly now; I'll test connecting a depth camera next week.

Thank you again for your help!