Closed · Usaywook closed this 3 weeks ago
Hi, thanks again for your interest.
Apologies for not uploading the UAICRL YAML file earlier. The primary reason is that in the HighD environment, the expert demonstrations are more diverse compared to MuJoCo, reducing the need to train the GFlowNet for trajectory augmentation. Additionally, the GPU memory required for GFlowNets in this environment is quite substantial. Therefore, we recommend using distributional RL methods for policy learning.
Regarding QRDQN and SplineDQN: SplineDQN is more complex and demands more training time. QRDQN, on the other hand, is relatively simple, and our previous empirical results on this environment indicated that QRDQN performs similarly to, or even better than, SplineDQN. Hence, we opted for QRDQN.
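For context on what QRDQN optimizes, here is a minimal NumPy sketch of the quantile Huber loss from QR-DQN (Dabney et al., 2018). The function name, shapes, and `kappa` default are illustrative and not taken from this repository's code:

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile Huber loss as in QR-DQN (illustrative sketch).

    pred_quantiles: (N,) predicted quantile values for one state-action pair.
    target_samples: (M,) samples of the target return distribution
                    (e.g. r + gamma * quantiles of the next state-action).
    """
    n = len(pred_quantiles)
    taus = (np.arange(n) + 0.5) / n                 # quantile midpoints tau_i
    # Pairwise TD errors: u[i, j] = target_j - pred_i
    u = target_samples[None, :] - pred_quantiles[:, None]
    # Huber penalty, quadratic within |u| <= kappa, linear outside
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric quantile weighting |tau_i - 1{u < 0}|
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return float((weight * huber / kappa).mean())
```

SplineDQN instead parameterizes the return distribution with monotone splines, which adds fitting machinery on top of this kind of objective; that extra machinery is the complexity and training-time cost mentioned above.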
As for the dataset, we utilized the benchmark's dataset.
For your experiments, I suggest trying both QRDQN and SplineDQN and fine-tuning them for comparison. You could also consider adding the GFlowNet module to the highD environment if you have sufficient GPU resources.
Hi, I was impressed by your great work, so we would like to compare your method with our ongoing work. However, I found that the configuration files for reproducing the UAICRL method in the highD environment are missing from this repository.
Although there are no `train_UAICRL_highD_*.yaml` files, I found `train_DICRL_QRDQN_*.yaml` files. Based on their contents, I guess that the UAICRL method for the highD environment utilizes QRDQN instead of SplineDQN as the distributional method. So could I reproduce the results shown in Figure 4 of the paper by using the `DICRL_QRDQN_EXP_highD_velocity_constraint-1e-1.yaml` file, is that right? I also wonder why you use QRDQN instead of SplineDQN for highD; the answer would provide great insight for me.
Furthermore, this repo contains only one expert file for highD, `scene-DEU_LocationALower-11_1_T-1_len-206.pkl`. Does the highD setup use the expert data from the `Guiliang/ICRL-benchmarks-public` repository?