Hi.
I tried to use this open-source project to compare the performance of different algorithms. According to the definition of Return in the code, it should keep increasing, right? However, I ran the code (with DQN, REINFORCE, and PPO) multiple times and plotted drone0/Return in the TensorBoard logs, and the average return seemed to converge instead.
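For reference, by "Return" I mean the discounted sum of rewards over an episode, which I assume is what drone0/Return logs (a minimal sketch of my understanding; the exact definition in the repo may differ):

```python
def discounted_return(rewards, gamma=0.99):
    """Compute G = r_0 + gamma*r_1 + gamma^2*r_2 + ... for one episode."""
    g = 0.0
    # Accumulate from the last reward backwards so each step
    # applies exactly one extra factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# With gamma < 1 and bounded per-step rewards, this quantity is bounded,
# so the average return converging (rather than growing forever) would
# still be consistent with this definition.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

If this matches the repo's definition, is a converging average return the expected behavior, or should it keep rising until the max safe-flight distance is reached?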
I changed the network structure from C3F2 to AlexNetDuel for the DQN algorithm, and the return curve then seemed to match the paper you provided. But REINFORCE and PPO are still not able to navigate the drone over a long, safe distance. Or maybe I missed something...
Could you please give me some guidance on this issue?
Thanks for your work. It helps a lot.