TRI-ML / RAP

This is the official code for the paper RAP: Risk-Aware Prediction for Robust Planning: https://arxiv.org/abs/2210.01368

No .ckpt file #1

Open YanzeZhang97 opened 1 month ago

YanzeZhang97 commented 1 month ago

Dear authors,

Hope this message finds you well!

I ran your code for the didactic training with wandb. However, after training finished (training_didactic.py ran to completion), I saw the log directory but no .ckpt file. The file tree is shown below. Could you please give me more guidance?

[image: screenshot of the file tree]

Thanks, Max
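
One quick way to confirm whether any checkpoint was written is to search the log directory recursively. The sketch below is a minimal illustration and assumes nothing about the repository's directory layout; `find_checkpoints` and the `"logs"` path are hypothetical names, not part of the RAP code.

```python
from pathlib import Path


def find_checkpoints(log_dir):
    """Return every .ckpt file found anywhere under log_dir, sorted by path."""
    root = Path(log_dir)
    if not root.is_dir():
        # A missing directory simply means there is nothing to report.
        return []
    return sorted(p for p in root.rglob("*.ckpt") if p.is_file())


if __name__ == "__main__":
    # "logs" is a placeholder; point it at the directory the run created.
    for path in find_checkpoints("logs"):
        print(path)
```

An empty result after a completed run matches the symptom described in this issue.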

hadar-hai commented 1 month ago

Dear authors, I’ve also tried everything and still no .ckpt file is being created. We would really appreciate your help.

Thanks, Hadar

hadar-hai commented 1 month ago

Hi, it could be connected to this change:

Here they talk about it: https://github.com/Lightning-AI/pytorch-lightning/pull/16520

-    def training_epoch_end(self, outputs):
-        epoch_average = torch.stack([output["loss"] for output in outputs]).mean()
+    def on_train_epoch_end(self):
+        epoch_average = torch.stack(self.training_step_outputs).mean()
         self.log("training_epoch_average", epoch_average)
+        self.training_step_outputs.clear()  # free memory

Thanks, Hadar
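
The diff above leaves one step implicit: once the hook no longer receives `outputs`, something has to append each step's loss to `self.training_step_outputs`. The following is a framework-free sketch of that accumulate, reduce, and clear pattern, using plain floats instead of tensors so it runs without Lightning installed. `EpochAverager` and its `log` stand-in are hypothetical and are not the repository's model class.

```python
class EpochAverager:
    """Stand-in for a LightningModule, for illustration only."""

    def __init__(self):
        # Replaces the `outputs` argument that the old hook used to receive.
        self.training_step_outputs = []
        self.logged = {}

    def log(self, name, value):
        # Stand-in for the module's logging method: just record the value.
        self.logged[name] = value

    def training_step(self, loss):
        # The step itself must store its loss for the epoch-end hook.
        self.training_step_outputs.append(loss)
        return loss

    def on_train_epoch_end(self):
        outputs = self.training_step_outputs
        if outputs:
            self.log("training_epoch_average", sum(outputs) / len(outputs))
        outputs.clear()  # free memory before the next epoch


model = EpochAverager()
for step_loss in (1.0, 2.0, 3.0):
    model.training_step(step_loss)
model.on_train_epoch_end()
```

In a real module the losses would be tensors and the mean would come from `torch.stack(...).mean()`, as in the diff.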

HarukiNishimura-TRI commented 1 month ago

Hi @YanzeZhang97 @hadar-hai, thank you for bringing this issue to our attention. Can you try downgrading lightning to v1.8.6 (released Dec 21, 2022) and see if the issue still persists? We have not run the code ourselves for a while, and apparently there has been a major version change to lightning since we released the code, which might have caused this issue. Since the code is no longer under active development, we would greatly appreciate your contribution, either by determining appropriate versions of the dependencies or by updating the code accordingly.
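
Before and after a downgrade, it can help to check which Lightning distributions are actually installed, since `lightning` and `pytorch-lightning` are published as separate packages and can coexist in one environment. This is a minimal sketch using only the standard library; the helper names `installed_versions` and `is_pre_2` are hypothetical, and the version parsing assumes a plain `major.minor.patch` string.

```python
from importlib import metadata


def installed_versions(names=("pytorch-lightning", "lightning")):
    """Map each installed distribution name to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment
    return found


def is_pre_2(version):
    """True if the leading (major) version number is below 2."""
    return int(version.split(".")[0]) < 2


if __name__ == "__main__":
    for name, version in installed_versions().items():
        status = "1.x" if is_pre_2(version) else "2.x or newer"
        print(f"{name} {version} ({status})")
```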

hadar-hai commented 4 weeks ago

@YanzeZhang97 Did Haruki's response help you?

YanzeZhang97 commented 4 weeks ago

I used pytorch-lightning v1.7.7 and successfully got the .ckpt file. But the reason is still not clear.

hadar-hai commented 4 weeks ago

@YanzeZhang97 Could you share your pip list? Also, did you clone the latest version of the code from the repo and run training_didactic.py as is, or did you make any changes to the code? Thank you very much!

YanzeZhang97 commented 4 weeks ago

absl-py 0.15.0 actionlib 1.14.0 addict 2.4.0 aiobotocore 2.13.1 aiofiles 23.2.1 aiohttp 3.9.5 aioitertools 0.11.0 aiosignal 1.2.0 altair 5.3.0 angles 1.9.13 annotated-types 0.6.0 anyio 4.3.0 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 astor 0.8.1 asttokens 2.2.1 astunparse 1.6.3 async-lru 2.0.4 async-timeout 4.0.3 attrs 23.2.0 Babel 2.15.0 backcall 0.2.0 base_local_planner 1.17.3 beautifulsoup4 4.12.3 bleach 6.1.0 blessed 1.20.0 blinker 1.6.2 bondpy 1.8.6 boto3 1.34.131 botocore 1.34.131 Brotli 1.0.9 cachetools 5.3.3 camera-calibration 1.17.0 camera-calibration-parsers 1.12.0 casadi 3.6.4 catkin 0.8.10 certifi 2021.5.30 cffi 1.16.0 charset-normalizer 3.3.2 clang 5.0 click 8.1.7 cloudpickle 1.6.0 cmake 3.26.4 comm 0.2.2 contourpy 1.1.1 controller-manager 0.20.0 controller-manager-msgs 0.20.0 croniter 1.3.15 cryptography 42.0.5 cv-bridge 1.16.2 cvxopt 1.3.2 cycler 0.12.1 debtcollector 2.5.0 debugpy 1.8.1 deepbots 1.0.0 deepdiff 7.0.1 diagnostic-analysis 1.11.0 diagnostic-common-diagnostics 1.11.0 diagnostic-updater 1.11.0 dnspython 2.6.1 do-mpc 4.6.4 docker-pycreds 0.4.0 dynamic-reconfigure 1.7.3 editor 1.6.6 einops 0.8.0 email_validator 2.1.1 exceptiongroup 1.2.1 executing 1.2.0 Farama-Notifications 0.0.4 fastapi 0.111.0 fastapi-cli 0.0.3 fastjsonschema 2.19.1 ffmpy 0.3.2 filelock 3.12.2 fire 0.6.0 flatbuffers 1.12 fonttools 4.40.0 frozenlist 1.4.0 fsspec 2024.3.1 gast 0.4.0 gazebo_plugins 2.9.2 gazebo_ros 2.9.2 gencpp 0.7.0 geneus 3.0.0 genlisp 0.4.18 genmsg 0.6.0 gennodejs 2.0.2 genpy 0.6.15 gitdb 4.0.11 GitPython 3.1.43 google-auth 2.29.0 google-auth-oauthlib 1.0.0 google-pasta 0.2.0 gradio 4.29.0 gradio_client 0.16.1 grpcio 1.62.2 gym 0.21.0 h11 0.14.0 h5py 3.1.0 highway-env 1.8.2 httpcore 1.0.5 httptools 0.6.1 httpx 0.27.0 huggingface-hub 0.23.0 idna 3.7 image-geometry 1.16.2 imageio 2.21.1 importlib-metadata 7.0.1 importlib_resources 6.4.0 inquirer 3.3.0 interactive-markers 1.12.0 ipykernel 6.29.4 ipython 8.12.0 itsdangerous 2.2.0 jedi 0.18.2 
Jinja2 3.1.2 jmespath 1.0.1 joblib 1.2.0 joint-state-publisher 1.15.1 joint-state-publisher-gui 1.15.1 json5 0.9.25 jsonschema 4.22.0 jsonschema-specifications 2023.12.1 jupyter_client 8.6.2 jupyter_core 5.7.2 jupyter-events 0.10.0 jupyter-lsp 2.2.5 jupyter_server 2.14.0 jupyter_server_terminals 0.5.3 jupyterlab 4.2.1 jupyterlab_pygments 0.3.0 jupyterlab_server 2.27.2 keras 2.12.0 Keras-Preprocessing 1.1.2 kiwisolver 1.3.1 laser_geometry 1.6.7 lightning 1.8.6 lightning-cloud 0.5.70 lightning-lite 1.8.6 lightning-utilities 0.11.6 lit 16.0.6 Markdown 3.4.1 markdown-it-py 3.0.0 MarkupSafe 2.1.3 matplotlib 3.3.4 matplotlib-inline 0.1.6 mdurl 0.1.2 message-filters 1.16.0 mistune 3.0.2 mkl-fft 1.3.8 mkl-random 1.2.4 mkl-service 2.4.0 mmcv-full 1.7.2 mmengine 0.10.4 mpmath 1.3.0 multidict 6.0.4 nbclient 0.10.0 nbconvert 7.16.4 nbformat 5.10.4 nest-asyncio 1.6.0 netaddr 0.8.0 networkx 3.1 notebook 7.2.0 notebook_shim 0.2.4 numpy 1.19.5 nvidia-cublas-cu11 11.10.3.66 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu11 11.7.101 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu11 11.7.99 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu11 11.7.99 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu11 8.5.0.96 nvidia-cudnn-cu12 9.1.0.70 nvidia-cufft-cu11 10.9.0.58 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu11 10.2.10.91 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu11 11.4.0.1 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu11 11.7.4.91 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu11 2.14.3 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.6.20 nvidia-nvtx-cu11 11.7.91 nvidia-nvtx-cu12 12.1.105 oauthlib 3.2.2 onnx 1.16.2 opencv-python 4.6.0.66 opt-einsum 3.3.0 ordered-set 4.1.0 orjson 3.10.3 oslo.config 9.0.0 oslo.i18n 5.1.0 osqp 0.6.5 overrides 7.7.0 packaging 22.0 pandas 1.3.0 pandocfilters 1.5.1 parso 0.8.3 pathlib 1.0.1 pbr 0.11.1 pickleshare 0.7.5 pillow 10.4.0 pip 24.0 pkgutil_resolve_name 1.3.10 platformdirs 4.2.1 plotly 5.23.0 pooch 1.7.0 prettytable 3.5.0 
prometheus_client 0.20.0 prompt-toolkit 3.0.38 protobuf 3.20.3 psutil 6.0.0 ptyprocess 0.7.0 pure-eval 0.2.2 pyarrow 17.0.0 pyasn1 0.4.8 pyasn1-modules 0.2.8 pycparser 2.22 pydantic 1.10.2 pyDeprecate 0.3.2 pydub 0.25.1 pygame 2.5.2 pyglet 1.5.15 Pygments 2.15.1 PyJWT 2.8.0 pyOpenSSL 24.0.0 pyparsing 2.4.7 Pyro4 4.82 PySocks 1.7.1 python-dateutil 2.8.2 python-dotenv 1.0.1 python-json-logger 2.0.7 python-multipart 0.0.9 python-qt-binding 0.4.4 python-version 0.0.2 pytorch-lightning 1.7.7 pytz 2021.1 PyVirtualDisplay 3.0 PyWavelets 1.4.1 PyYAML 6.0.1 pyzabbix 1.2.1 pyzmq 24.0.1 qdldl 0.1.7.post0 qpsolvers 4.3.1 qt-dotgraph 0.4.2 qt-gui 0.4.2 qt-gui-cpp 0.4.2 qt-gui-py-common 0.4.2 readchar 4.1.0 referencing 0.35.1 requests 2.32.2 requests-oauthlib 2.0.0 resource_retriever 1.12.7 rfc3339-validator 0.1.4 rfc3986 2.0.0 rfc3986-validator 0.1.1 rich 13.7.1 rosbag 1.16.0 rosboost-cfg 1.15.8 rosclean 1.15.8 roscreate 1.15.8 rosgraph 1.16.0 roslaunch 1.16.0 roslib 1.15.8 roslint 0.12.0 roslz4 1.16.0 rosmake 1.15.8 rosmaster 1.16.0 rosmsg 1.16.0 rosnode 1.16.0 rosparam 1.16.0 rospy 1.16.0 rosserial_python 0.9.2 rosservice 1.16.0 rostest 1.16.0 rostopic 1.16.0 rosunit 1.15.8 roswtf 1.16.0 rpds-py 0.18.1 rqt_action 0.4.9 rqt_bag 0.5.1 rqt_bag_plugins 0.5.1 rqt-console 0.4.12 rqt_dep 0.4.12 rqt_graph 0.4.14 rqt_gui 0.5.3 rqt_gui_py 0.5.3 rqt-image-view 0.4.17 rqt_launch 0.4.9 rqt-logger-level 0.4.12 rqt-moveit 0.5.11 rqt_msg 0.4.10 rqt_nav_view 0.5.7 rqt_plot 0.4.13 rqt_pose_view 0.5.11 rqt_publisher 0.4.10 rqt_py_common 0.5.3 rqt_py_console 0.4.10 rqt-reconfigure 0.5.5 rqt-robot-dashboard 0.5.8 rqt-robot-monitor 0.5.15 rqt_robot_steering 0.5.12 rqt-runtime-monitor 0.5.10 rqt-rviz 0.7.0 rqt_service_caller 0.4.10 rqt_shell 0.4.11 rqt_srv 0.4.9 rqt-tf-tree 0.6.4 rqt_top 0.4.10 rqt_topic 0.4.13 rqt_web 0.4.10 rsa 4.7.2 rtabmap-python 0.21.3 ruff 0.4.4 runs 1.2.2 rviz 1.14.20 s3fs 2024.3.1 s3transfer 0.10.2 scikit-image 0.19.3 scikit-learn 1.1.2 scipy 1.7.0 semantic-version 2.10.0 
Send2Trash 1.8.3 sensor-msgs 1.13.1 sentry-sdk 2.12.0 serpent 1.41 setproctitle 1.3.3 setuptools 69.5.1 shapely 2.0.3 shellingham 1.5.4 six 1.15.0 smach 2.5.2 smach-ros 2.5.2 smclib 1.8.6 smmap 5.0.1 sniffio 1.3.1 soupsieve 2.5 stack-data 0.6.2 starlette 0.37.2 starsessions 1.3.0 stevedore 4.1.1 sympy 1.12 tenacity 8.5.0 tensorboard 2.12.3 tensorboard-data-server 0.7.1 tensorboard-plugin-wit 1.8.1 tensorboardX 2.6.1 tensorflow 2.4.1 tensorflow-estimator 2.12.0 termcolor 1.1.0 terminado 0.18.1 tf 1.13.2 tf-conversions 1.13.2 tf2-geometry-msgs 0.7.7 tf2-kdl 0.7.7 tf2-py 0.7.7 tf2-ros 0.7.7 threadpoolctl 3.1.0 tifffile 2023.4.12 tinycss2 1.3.0 tomli 2.0.1 tomlkit 0.12.0 toolz 0.12.1 topic-tools 1.16.0 torch 2.4.0 torchaudio 2.4.0 torchmetrics 0.11.4 torchvision 0.19.0 tornado 6.4 tqdm 4.66.4 traitlets 5.9.0 triton 3.0.0 turtlebot3_example 1.2.5 turtlebot3_teleop 1.2.5 typer 0.12.3 types-python-dateutil 2.9.0.20240316 typing_extensions 4.12.2 ujson 5.9.0 urllib3 1.26.19 uvicorn 0.29.0 uvloop 0.19.0 wandb 0.17.5 watchfiles 0.23.0 waymo-open-dataset-tf-2-6-0 1.4.9 wcwidth 0.2.5 websocket-client 1.8.0 websockets 11.0.3 Werkzeug 3.0.3 wheel 0.43.0 wrapt 1.12.1 xacro 1.14.17 xmod 1.8.1 yapf 0.40.2 yarl 1.9.3 zipp 3.17.0 zm 1.0 zmq 0.0.0

I just copied all the packages in my conda env. I guess the best way is to downgrade pytorch-lightning and make everything else compatible with that version. I only modified some basic configuration, such as a few paths. But the code version is not the latest.

hadar-hai commented 4 weeks ago

@YanzeZhang97 thank you very much! What do you mean by "the code version is not the latest"? So which version?

hadar-hai commented 3 weeks ago

@YanzeZhang97 Is there a way to contact you? I’m having trouble getting a working version that produces .ckpt files. If you could send me the version you have, it would be very helpful. Also, what's your Python version? Many thanks!

YanzeZhang97 commented 3 weeks ago

@hadar-hai Sorry for the late reply. It is quite busy at the beginning of the new semester. The Python version is 3.8. As for the code, @HarukiNishimura-TRI, would you mind pushing the old version of the code as a new branch so that people can access both versions? Thanks!

hadar-hai commented 3 weeks ago

@YanzeZhang97 Thank you! Is it this version: e5fe65f?

HarukiNishimura-TRI commented 3 weeks ago

@YanzeZhang97 Thank you for trying it out. It is great to hear that downgrading the lightning version resolved the issue for you. @jmercat has made a few commits lately, trying to resolve some of the issues by making changes to our code. I wonder which commit your local changes are based off of. Is it d363fde or e5fe65f? (I am guessing it's the latter, because otherwise you would not be able to even import lightning, due to the name change of the package from pytorch_lightning to lightning.pytorch.)
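
To see which of the two import names mentioned above is available in a given environment, one option is to probe for them without importing the training code. A minimal sketch, assuming only the standard library; `first_importable` is a hypothetical helper name.

```python
import importlib.util


def first_importable(candidates=("lightning.pytorch", "pytorch_lightning")):
    """Return the first module name in candidates that can be located, else None."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # find_spec raises if the parent package of a dotted name is missing.
            continue
    return None


if __name__ == "__main__":
    print(first_importable())
```

Note that `find_spec` imports the parent package when given a dotted name, so probing `lightning.pytorch` will import `lightning` if it is installed.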

YanzeZhang97 commented 3 weeks ago

Hello @hadar-hai and @HarukiNishimura-TRI, Yes, e5fe65f is the version I successfully implemented. And yes, the package is pytorch-lightning.

hadar-hai commented 3 weeks ago

Hello @YanzeZhang97, did you change "mmcv" to "mmengine.config"? If you could kindly send me the working version by mail (hadar.hai@campus.technion.ac.il) it would be greatly appreciated. I created the same conda environment as you, used python 3.8 and e5fe65f as is and there are still some problems.

jmercat commented 3 weeks ago

Hello @hadar-hai and @YanzeZhang97. I looked into how to install the correct versions of everything instead of trying to update the code to the new versions of pytorch-lightning (which was a mess, sorry about that). I will never use pytorch-lightning again. For this to work, I pushed a rollback of my previous attempt and a new install.sh script that should install the correct packages and hopefully run correctly. It does not work on multiple GPUs, but I could run a simple training on one GPU in a fresh environment with the new installation script. I hope this helps. Thanks for the interest in our work. I’m sorry I don’t have much time to maintain this code base. Your contributions are welcome.