ethz-asl / vgn

Real-time 6 DOF grasp detection in clutter.
BSD 3-Clause "New" or "Revised" License
251 stars, 56 forks

Error scripts/train_vgn.py [IndexError: index 42 is out of bounds for dimension 2 with size 40] #22

Closed: Tatsuya-2 closed this issue 2 years ago

Tatsuya-2 commented 2 years ago

Hi! Thank you for the great research! I am trying to run the network training, but I hit a strange error when running scripts/train_vgn.py.

System information
  • OS: Ubuntu 20.04
  • ROS: ROS1 noetic

Version
  • CUDA: 11.2.2 or 11.7

pip list:

```bash
Package Version Editable project location
------------------------------------ ---------------- ---------------------------------------------------------
absl-py 1.1.0
action-tutorials-py 0.9.3
addict 2.4.0
aiohttp 3.8.1
aiosignal 1.2.0
ament-copyright 0.9.6
ament-cppcheck 0.9.6
ament-cpplint 0.9.6
ament-flake8 0.9.6
ament-index-python 1.1.0
ament-lint 0.9.6
ament-lint-cmake 0.9.6
ament-package 0.9.5
ament-pep257 0.9.6
ament-uncrustify 0.9.6
ament-xmllint 0.9.6
apptools 5.1.0
argcomplete 2.0.0
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
asttokens 2.0.5
async-timeout 4.0.2
atomicwrites 1.4.1
attrs 21.4.0
autopep8 1.6.0
backcall 0.2.0
backports.functools-lru-cache 1.6.4
backports.zoneinfo 0.2.1
beautifulsoup4 4.11.1
black 22.6.0
bleach 5.0.1
blinker 1.4
brotlipy 0.7.0
bs4 0.0.1
cachetools 5.0.0
catkin-pkg 0.5.2
certifi 2022.6.15
cffi 1.15.1
charset-normalizer 2.1.0
click 8.1.3
cloudpickle 2.1.0
colorama 0.4.5
commonmark 0.9.1
configobj 5.0.6
cryptography 37.0.4
cycler 0.11.0
cytoolz 0.12.0
dask 2022.7.1
dataclasses 0.8
debugpy 1.6.0
decorator 5.1.1
defusedxml 0.7.1
demo-nodes-py 0.9.3
docutils 0.19
domain-coordinator 0.9.2
empy 3.3.4
entrypoints 0.4
envisage 6.0.1
examples-rclpy-executors 0.9.4
examples-rclpy-minimal-action-client 0.9.4
examples-rclpy-minimal-action-server 0.9.4
examples-rclpy-minimal-client 0.9.4
examples-rclpy-minimal-publisher 0.9.4
examples-rclpy-minimal-service 0.9.4
examples-rclpy-minimal-subscriber 0.9.4
executing 0.8.3
fastjsonschema 2.16.1
flake8 4.0.1
flake8-blind-except 0.2.1
flake8-builtins 1.5.3
flake8-class-newline 1.6.0
flake8-comprehensions 3.10.0
flake8-deprecated 1.3
flake8-docstrings 1.6.0
flake8-import-order 0.18.1
flake8-quotes 3.3.1
flit_core 3.7.1
fonttools 4.34.4
frozenlist 1.3.0
fsspec 2022.5.0
future 0.18.2
google-auth 2.9.1
google-auth-oauthlib 0.4.6
grpcio 1.46.3
idna 3.3
imagecodecs 2022.2.22
imageio 2.19.3
importlib-metadata 4.11.4
importlib-resources 5.8.0
iniconfig 1.1.1
ipykernel 6.15.1
ipython 8.4.0
ipython-genutils 0.2.0
ipywidgets 7.7.1
jedi 0.18.1
Jinja2 3.1.2
joblib 1.1.0
Js2Py 0.71
jsonschema 4.7.2
jupyter 1.0.0
jupyter-client 7.3.4
jupyter-console 6.4.4
jupyter_core 4.11.1
jupyterlab-pygments 0.2.2
jupyterlab-widgets 1.1.1
kiwisolver 1.4.4
lark 1.1.2
launch 0.10.8
launch-ros 0.11.6
launch-testing 0.10.8
launch-testing-ros 0.11.6
launch-xml 0.10.8
launch-yaml 0.10.8
lightgbm 3.3.2
locket 1.0.0
loguru 0.6.0
Markdown 3.4.1
MarkupSafe 2.1.1
matplotlib 3.5.2
matplotlib-inline 0.1.3
mayavi 4.8.0
mccabe 0.6.1
mistune 0.8.4
mpi4py 3.1.3
multidict 6.0.2
munkres 1.1.4
mypy-extensions 0.4.3
nbclient 0.6.6
nbconvert 6.5.0
nbformat 5.4.0
nest-asyncio 1.5.5
networkx 2.8.5
notebook 6.4.12
numpy 1.20.2
oauthlib 3.2.0
open3d 0.12.0
open3d-python 0.3.0.0
open3d-ros-helper 0.2.0.3
osrf-pycommon 0.1.11
packaging 21.3
pandas 1.4.3
pandocfilters 1.5.0
parso 0.8.3
partd 1.2.0
pathspec 0.9.0
patsy 0.5.2
pep8 1.7.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.2.0
pip 22.2
pip-search 0.0.12
platformdirs 2.5.2
pluggy 1.0.0
plyfile 0.7.4
prometheus-client 0.14.1
prompt-toolkit 3.0.30
protobuf 3.16.0
psutil 5.9.1
ptyprocess 0.7.0
pure-eval 0.2.2
py 1.11.0
pyasn1 0.4.8
pyasn1-modules 0.2.7
pybullet 2.7.9
pycodestyle 2.8.0
pycparser 2.21
pydocstyle 6.1.1
pyface 7.4.2
pyflakes 2.4.0
Pygments 2.12.0
pyjsparser 2.7.1
PyJWT 2.4.0
pyOpenSSL 22.0.0
pyparsing 3.0.9
PyQt5 5.12.3
PyQt5_sip 4.19.18
PyQtChart 5.12
PyQtWebEngine 5.12.1
pyrsistent 0.18.1
PySocks 1.7.1
pytest 7.1.2
pytest-repeat 0.9.1
pytest-rerunfailures 10.2
python-dateutil 2.8.2
pytorch-ignite 0.4.4
pytz 2022.1
pytz-deprecation-shim 0.1.0.post0
pyu2f 0.1.5
PyWavelets 1.3.0
PyYAML 6.0
pyzmq 23.2.0
qtconsole 5.3.1
QtPy 2.1.0
quality-of-service-demo-py 0.9.3
regex 2022.7.9
requests 2.28.1
requests-oauthlib 1.3.1
rich 12.5.1
robot-helpers 0.0.0 /media/tatsuya/disk2/home/ROS2/ws_vgn/build/robot_helpers
ros2action 0.9.11
ros2bag 0.3.9
ros2cli 0.9.11
ros2component 0.9.11
ros2doctor 0.9.11
ros2interface 0.9.11
ros2launch 0.11.6
ros2lifecycle 0.9.11
ros2multicast 0.9.11
ros2node 0.9.11
ros2param 0.9.11
ros2pkg 0.9.11
ros2run 0.9.11
ros2service 0.9.11
ros2topic 0.9.11
rosidl-runtime-py 0.9.1
rpyutils 0.2.0
rqt-action 0.4.9
rqt-graph 1.1.3
rqt-gui 1.1.2
rqt-gui-py 1.1.2
rqt-msg 1.0.5
rqt-plot 1.1.1
rqt-publisher 1.1.3
rqt-py-console 1.0.2
rqt-reconfigure 1.0.8
rqt-service-caller 1.0.5
rqt-shell 1.0.2
rqt-srv 1.0.3
rqt-top 1.0.2
rqt-topic 1.2.2
rsa 4.9
scikit-image 0.19.3
scikit-learn 1.1.1
scipy 1.6.3
seaborn 0.11.2
Send2Trash 1.8.0
setuptools 63.2.0
six 1.16.0
sklearn 0.0
snowballstemmer 2.2.0
soupsieve 2.3.2.post1
sros2 0.9.5
stack-data 0.3.0
statsmodels 0.13.2
teleop-twist-keyboard 2.3.2
tensorboard 2.9.1
tensorboard-data-server 0.6.0
tensorboard-plugin-wit 1.8.1
terminado 0.15.0
threadpoolctl 3.1.0
tifffile 2022.5.4
tinycss2 1.1.1
toml 0.10.2
tomli 2.0.1
toolz 0.12.0
topic-monitor 0.9.3
torch 1.8.0.post3
torchvision 0.12.0a0+da433bf
tornado 6.2
tqdm 4.64.0
traitlets 5.3.0
traits 6.3.2
traitsui 7.4.0
typed-ast 1.5.4
typing_extensions 4.3.0
tzdata 2022.1
tzlocal 4.2
unicodedata2 14.0.0
urllib3 1.26.10
vgn-node 0.0.0 /media/tatsuya/disk2/home/ROS2/ws_vgn/build/vgn_node
vtk 9.1.0
wcwidth 0.2.5
webencodings 0.5.1
Werkzeug 2.1.2
wheel 0.37.1
widgetsnbextension 3.6.1
xgboost 1.5.1
yarl 1.7.2
zipp 3.8.0
```

Steps to reproduce the issue

  1. Setup
    • Case 1: Install following the README instructions.
  2. Run
    $ python3 scripts/train_vgn.py --dataset data/datasets/foo

    I received the error below when I ran it.

    Error message:
INFO - 2022-07-25 06:48:56,235 - engine - Engine run starting with max_epochs=30.
Epoch [1/30]: [8/263]   3%|##7        [00:50<30:44]
ERROR - 2022-07-25 06:49:58,002 - engine - Current run is terminating due to exception: index 42 is out of bounds for dimension 2 with size 40
ERROR - 2022-07-25 06:49:58,009 - engine - Engine run is terminating due to exception: index 42 is out of bounds for dimension 2 with size 40
Traceback (most recent call last):
  File "scripts/train_vgn.py", line 217, in <module>
    main(args)
  File "scripts/train_vgn.py", line 87, in main
    trainer.run(train_loader, max_epochs=args.epochs)
  File "/home/tatsuya/.local/lib/python3.8/site-packages/ignite/engine/engine.py", line 702, in run
    return self._internal_run()
  File "/home/tatsuya/.local/lib/python3.8/site-packages/ignite/engine/engine.py", line 775, in _internal_run
    self._handle_exception(e)
  File "/home/tatsuya/.local/lib/python3.8/site-packages/ignite/engine/engine.py", line 469, in _handle_exception
    raise e
  File "/home/tatsuya/.local/lib/python3.8/site-packages/ignite/engine/engine.py", line 745, in _internal_run
    time_taken = self._run_once_on_dataset()
  File "/home/tatsuya/.local/lib/python3.8/site-packages/ignite/engine/engine.py", line 850, in _run_once_on_dataset
    self._handle_exception(e)
  File "/home/tatsuya/.local/lib/python3.8/site-packages/ignite/engine/engine.py", line 469, in _handle_exception
    raise e
  File "/home/tatsuya/.local/lib/python3.8/site-packages/ignite/engine/engine.py", line 833, in _run_once_on_dataset
    self.state.output = self._process_function(self, self.state.batch)
  File "scripts/train_vgn.py", line 161, in _update
    y_pred = select(net(x), index)
  File "scripts/train_vgn.py", line 120, in select
    label = qual_out[batch_index, :, index[:, 0], index[:, 1], index[:, 2]].squeeze()
IndexError: index 42 is out of bounds for dimension 2 with size 40
Epoch [1/30]: [8/263]   3%|##7   

I suspect it may be caused by a memory issue related to some library version. Could you share some information about an environment in which training worked properly, e.g. the pip list and CUDA version?

Thanks in advance!

Tatsuya-2 commented 2 years ago

If you need a new environment for verification, I hope the following files are helpful.

README.md

# Creating a CUDA-enabled ROS environment

## Build

```bash
# ROS1 noetic
$ docker build --build-arg ROS_DISTRO=noetic --build-arg UBUNTU_VERSION=20.04 -t ros1-noetic:cuda-11.2.2 .
```

## Run

```bash
docker-compose -f docker-compose.yaml up
```

Dockerfile

```
## CUDA on amd64
## Default value
ARG UBUNTU_VERSION=20.04
ARG ROS_DISTRO=noetic

## Base image
ARG BASE_IMAGE="nvidia/cuda:11.2.2-base-ubuntu${UBUNTU_VERSION}"
FROM ${BASE_IMAGE} as system

## Disable interactive mode
ENV DEBIAN_FRONTEND=noninteractive

ARG UBUNTU_VERSION
ARG ROS_DISTRO
ARG USERNAME=developer
ARG HOME=/home/${USERNAME}
ARG GID=1000
ARG UID=1000

## Check ROS_DISTRO (exit non-zero on a mismatch so that the build fails)
RUN if [ "${UBUNTU_VERSION}" = "20.04" ] && [ "${ROS_DISTRO}" = "foxy" ]; then \
        echo "Version check OK."; \
    elif [ "${UBUNTU_VERSION}" = "20.04" ] && [ "${ROS_DISTRO}" = "noetic" ]; then \
        echo "Version check OK."; \
    elif [ "${UBUNTU_VERSION}" = "18.04" ] && [ "${ROS_DISTRO}" = "melodic" ]; then \
        echo "Version check OK."; \
    else \
        echo "Error! UBUNTU_VERSION and ROS_DISTRO do not match."; \
        exit 1; \
    fi
RUN echo "Selected Ubuntu version:${UBUNTU_VERSION}"
RUN echo "Selected ROS version:${ROS_DISTRO}"

# RUN curl -sSL http://get.gazebosim.org | sh
RUN apt-get update && apt-get -yq upgrade && \
    apt-get install -yq -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" keyboard-configuration
RUN apt-get install -yq wget curl git build-essential sudo lsb-release locales bash-completion tzdata python3-pip \
    && rm -rf /var/lib/apt/lists/*

## ROS Install
ADD ros_install.sh /tmp/ros_install.sh
RUN /tmp/ros_install.sh

RUN groupadd -f -g ${GID} ${USERNAME} && \
    useradd -m -s /bin/bash -u ${UID} -g ${GID} -G sudo ${USERNAME} && \
    echo ${USERNAME}:${USERNAME} | chpasswd && \
    echo "${USERNAME} ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
RUN chown -R ${UID}:${GID} ${HOME}

USER ${USERNAME}
WORKDIR ${HOME}
RUN echo "source /opt/ros/${ROS_DISTRO}/setup.bash" >> ${HOME}/.bashrc
```

ros_install.sh

```bash
#!/bin/bash

if [ "${ROS_DISTRO}" = "noetic" ] || [ "${ROS_DISTRO}" = "melodic" ]; then
    echo "ROS1 noetic install"
    sh -c 'echo "deb http://packages.ros.org/ros/ubuntu $(lsb_release -sc) main" > /etc/apt/sources.list.d/ros-latest.list'
    apt-key adv --keyserver keyserver.ubuntu.com --recv-keys F42ED6FBAB17C654
    apt-get update && apt-get install -yq curl
    curl -s https://raw.githubusercontent.com/ros/rosdistro/master/ros.asc | apt-key add -
    apt-get update && apt-get -yq upgrade
    apt-get install -yq ros-${ROS_DISTRO}-desktop-full
    apt-get install -yq python3-rosinstall python3-rosinstall-generator python3-wstool build-essential python3-catkin-tools
    apt-get install -yq python3-rosdep
elif [ "${ROS_DISTRO}" = "foxy" ]; then
    echo "ROS2 foxy install"
    apt-get update && apt-get install locales && locale-gen en_US en_US.UTF-8
    update-locale LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8 && export LANG=en_US.UTF-8
    apt-get update && apt-get install -yq curl gnupg2 lsb-release
    curl -sSL https://raw.githubusercontent.com/ros/rosdistro/master/ros.key -o /usr/share/keyrings/ros-archive-keyring.gpg
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/ros-archive-keyring.gpg] http://packages.ros.org/ros2/ubuntu $(source /etc/os-release && echo $UBUNTU_CODENAME) main" | tee /etc/apt/sources.list.d/ros2.list >/dev/null
    apt-get update && apt-get -yq upgrade
    apt-get install -yq ros-${ROS_DISTRO}-desktop
    apt-get install -yq python3-colcon-common-extensions python3-argcomplete
    apt-get install -yq python3-rosdep
else
    echo "Error!! No matching ROS_DISTRO in ros_install.sh"
fi
```

docker-compose.yaml

```yml
version: '3'
services:
  app:
    image: ros1-noetic:cuda-11.2.2
    hostname: app
    security_opt:
      - seccomp:unconfined
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [ gpu ]
    environment:
      - "DISPLAY=$DISPLAY"
      - "ROS_HOSTNAME=app"
      - "ROS_MASTER_URI=http://ros-master:11311"
    volumes:
      - type: bind
        source: ${HOME}/ws_vgn
        target: ${HOME}/ws_vgn
    entrypoint: "bash -c"
    command: >
      "source /opt/ros/noetic/setup.bash && tail -f /dev/null"
    restart: always
```

mbreyer commented 2 years ago

Hi Tatsuya-2, thanks a lot for the detailed description. This error looks like a simple "out-of-bounds" error. This can happen when a sampled grasp lies outside the volume covered by the TSDF. This notebook contains a block of code to remove these configurations before training.

I hope that fixes the problem.
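
For illustration only, here is a minimal sketch of that kind of clean-up. It is not the notebook's actual code. It assumes each grasp record stores its position in metres, that the TSDF covers a cube with a side of 0.3 m, and that a 0.02 m margin from each face is enough; the names `Grasp`, `VOLUME_SIZE`, `MARGIN`, `inside_volume`, and `clean` are all hypothetical.

```python
from dataclasses import dataclass

VOLUME_SIZE = 0.3  # assumed side length of the TSDF cube, in metres
MARGIN = 0.02      # assumed safety margin from each face, in metres


@dataclass
class Grasp:
    """Hypothetical grasp record: position in metres plus a success label."""
    x: float
    y: float
    z: float
    label: int


def inside_volume(g, size=VOLUME_SIZE, margin=MARGIN):
    """True if the grasp lies at least `margin` inside every face of the cube."""
    return all(margin <= c <= size - margin for c in (g.x, g.y, g.z))


def clean(grasps):
    """Return only the grasps whose positions are safely inside the volume."""
    return [g for g in grasps if inside_volume(g)]


grasps = [
    Grasp(0.15, 0.15, 0.10, 1),   # well inside the volume: kept
    Grasp(0.315, 0.10, 0.10, 0),  # x beyond the assumed 0.3 m cube: dropped
    Grasp(0.10, -0.01, 0.10, 0),  # negative y: dropped
]
kept = clean(grasps)
print(f"kept {len(kept)} of {len(grasps)} grasps")
```

Running such a filter once over the dataset, before training, means the training loop never sees a grasp that maps to an invalid voxel index.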

aniketghodake10 commented 2 years ago

> If you need a new environment for verification, I hope the following files are helpful.
>
> README.md Dockerfile ros_install.sh docker-compose.yaml

The issue is not related to the environment or the installation.

If you look into the code, you can see that each point of the point cloud is converted into a TSDF volume index by dividing the point by the voxel size and then rounding the result to an integer. Our TSDF volume has size 40 × 40 × 40, so the valid indices run from 0 to 39. But either due to rounding or due to points lying outside the workspace, the conversion can produce an index of 40 or more.

What you can do is add a condition that checks whether every index value lies in [0, 39].
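
A small sketch of that check, under stated assumptions: the resolution of 40 comes from this thread, while the 0.3 m volume size and the helper names `to_voxel_index` and `is_valid_index` are made up for the example and are not taken from the repository.

```python
RESOLUTION = 40                        # voxels per axis, as stated in this thread
VOLUME_SIZE = 0.3                      # assumed side length of the volume, in metres
VOXEL_SIZE = VOLUME_SIZE / RESOLUTION  # 0.0075 m under that assumption


def to_voxel_index(point, voxel_size=VOXEL_SIZE):
    """Divide each coordinate by the voxel size and round to the nearest integer."""
    return tuple(int(round(c / voxel_size)) for c in point)


def is_valid_index(index, resolution=RESOLUTION):
    """True if every component lies in [0, resolution - 1]."""
    return all(0 <= i < resolution for i in index)


points = [
    (0.150, 0.150, 0.150),  # centre of the volume
    (0.299, 0.150, 0.150),  # inside the cube, but x rounds up to index 40
    (0.315, 0.150, 0.150),  # outside the cube, x maps to index 42
]
for p in points:
    idx = to_voxel_index(p)
    status = "ok" if is_valid_index(idx) else "out of bounds"
    print(p, idx, status)
```

The second point shows the rounding case: the position is still inside the assumed cube, yet rounding pushes its index to 40. Samples that fail the check can be skipped or removed from the dataset before they are used to index the network output.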

Tatsuya-2 commented 2 years ago

@mbreyer @aniketghodake10 Thank you very much for the quick replies. I now understand the cause thanks to your explanations.

Thank you for your cooperation!