Jasonxu1225 / Uncertainty-aware-Inverse-Constrained-Reinforcement-Learning

[ICLR 2024] ''Uncertainty-aware Constraint Inference in Inverse Constrained Reinforcement Learning'' Official Code
MIT License

Request for configuration details to implement UAICRL in the highD environment. #2

Closed by Usaywook 3 weeks ago

Usaywook commented 3 weeks ago

Hi, I was impressed by your great work, and we would like to compare your method against our ongoing work. However, I found that the configuration files for reproducing the UAICRL method in the highD environment are missing from this repository.

├── HighD_velocity_constraint
│   ├── highD_environment_configurations_no_velocity_penalty-40.yaml
│   ├── highD_environment_configurations_velocity_penalty-40.yaml
│   ├── train_Binary_highD_velocity_constraint-1e-1.yaml
│   ├── train_Binary_highD_velocity_constraint.yaml
│   ├── train_DICRL_QRDQN_CVaR_highD_velocity_constraint-1e-1.yaml
│   ├── train_DICRL_QRDQN_CVaR_highD_velocity_constraint.yaml
│   ├── train_DICRL_QRDQN_EXP_highD_velocity_constraint-1e-1.yaml
│   ├── train_DICRL_QRDQN_EXP_highD_velocity_constraint.yaml
│   ├── train_GAIL_highd_velocity_constraint-1e-1.yaml
│   ├── train_GAIL_highd_velocity_constraint.yaml
│   ├── train_ICRL_highD_velocity_constraint-1e-1.yaml
│   ├── train_ICRL_highD_velocity_constraint.yaml
│   ├── train_VICRL_highD_velocity_constraint-1e-1.yaml
│   ├── train_VICRL_highD_velocity_constraint.yaml
│   ├── train_ppo_highD_velocity_constraint.yaml
│   └── train_ppo_lag_highD_velocity_constraint.yaml

Although there are no train_UAICRL_highD_*.yaml files, we found train_DICRL_QRDQN_*.yaml files. Based on the contents of these configuration files, I guess that the UAICRL method for the highD environment uses QRDQN instead of SplineDQN as the distributional method.

Therefore, could I reproduce the results shown in Figure 4 of the paper by using the train_DICRL_QRDQN_EXP_highD_velocity_constraint-1e-1.yaml file? Is that right?

I also wonder why you use QRDQN instead of SplineDQN for highD; the answer would provide great insight for me.

Furthermore, this repo contains only one expert file for highD, scene-DEU_LocationALower-11_1_T-1_len-206.pkl. Does the highD expert data come from the Guiliang/ICRL-benchmarks-public repository?

Jasonxu1225 commented 3 weeks ago

Hi, thanks again for your interest.

Apologies for not uploading the UAICRL YAML file earlier. The primary reason is that in the HighD environment, the expert demonstrations are more diverse compared to MuJoCo, reducing the need to train the GFlowNet for trajectory augmentation. Additionally, the GPU memory required for GFlowNets in this environment is quite substantial. Therefore, we recommend using distributional RL methods for policy learning.
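
To illustrate what the distributional part buys: the learned return quantiles can be collapsed with different risk measures, e.g. a risk-neutral expectation or a lower-tail CVaR (cf. the EXP/CVaR suffixes in the config names above). A rough sketch of that distinction, illustrative only and not the exact code in this repo (the function name, tensor shapes, and alpha value are placeholders):

```python
# Rough sketch (not this repo's code) of reading a risk measure off QR-DQN-style
# return quantiles; shapes and the alpha value are illustrative placeholders.
import torch

def risk_value(quantiles: torch.Tensor, measure: str = "exp", alpha: float = 0.25) -> torch.Tensor:
    """quantiles: (batch, n_quantiles) estimated return quantiles."""
    if measure == "exp":
        # Risk-neutral: equally weighted quantiles approximate the expectation.
        return quantiles.mean(dim=-1)
    if measure == "cvar":
        # Risk-averse: average only the worst alpha-fraction (lower tail) of quantiles.
        k = max(1, int(alpha * quantiles.shape[-1]))
        sorted_q, _ = torch.sort(quantiles, dim=-1)
        return sorted_q[..., :k].mean(dim=-1)
    raise ValueError(f"unknown risk measure: {measure}")
```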

Regarding QRDQN and SplineDQN: SplineDQN is more complex and demands more training time, while QRDQN is relatively simple. Our previous empirical results in this environment indicated that QRDQN performs similarly to, or even better than, SplineDQN, so we opted for QRDQN.
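
To make the simplicity point concrete, below is a minimal sketch of the standard QR-DQN quantile Huber loss (illustrative, not necessarily the exact implementation here): the quantile fractions are fixed, so training reduces to a single regression loss, whereas SplineDQN additionally has to fit a monotonic spline parameterization of the return distribution, which adds parameters and training cost.

```python
# Minimal sketch of the standard QR-DQN quantile Huber loss (illustrative; not
# necessarily this repo's exact implementation).
import torch
import torch.nn.functional as F

def quantile_huber_loss(pred_q: torch.Tensor, target_q: torch.Tensor, kappa: float = 1.0) -> torch.Tensor:
    """pred_q, target_q: (batch, n_quantiles) quantile estimates of the return."""
    n = pred_q.shape[-1]
    # Fixed quantile midpoints tau_i = (2i + 1) / (2N): no extra parameters to learn.
    tau = (torch.arange(n, dtype=pred_q.dtype, device=pred_q.device) + 0.5) / n
    # Pairwise TD errors: td[b, i, j] = target_j - pred_i.
    td = target_q.unsqueeze(1) - pred_q.unsqueeze(2)
    huber = F.huber_loss(
        pred_q.unsqueeze(2).expand_as(td),
        target_q.unsqueeze(1).expand_as(td),
        delta=kappa, reduction="none",
    )
    # Asymmetric weighting |tau_i - 1{td < 0}| turns the Huber loss into quantile regression.
    weight = torch.abs(tau.view(1, n, 1) - (td.detach() < 0).float())
    # Sum over predicted quantiles i, average over target quantiles j and the batch.
    return (weight * huber / kappa).mean(dim=2).sum(dim=1).mean()
```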

As for the dataset, we utilized the benchmark's dataset.

For your experiments, I suggest trying both QRDQN and SplineDQN and fine-tuning them for comparison. You could also consider adding the GFlowNet module to the HighD environment if you have sufficient GPU resources.
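
If it helps, a sweep over the candidate configs with a few seeds could look roughly like the following; the entry-point name and flags below are placeholders rather than the repo's actual command, so adapt them to the training script you use:

```python
# Illustrative only: sweep the distributional variants over multiple seeds for a
# fair comparison. "train_icrl.py" and the "-c"/"-s" flags are placeholders, not
# the repo's confirmed interface; substitute the actual training command.
import subprocess

configs = [
    "HighD_velocity_constraint/train_DICRL_QRDQN_EXP_highD_velocity_constraint.yaml",
    # Add a SplineDQN counterpart here once its YAML exists.
]

for cfg in configs:
    for seed in (0, 1, 2):
        subprocess.run(["python", "train_icrl.py", "-c", cfg, "-s", str(seed)], check=True)
```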