DaDucking / PPOAttention


Performance Comparison Results #1

Open namjiwon1023 opened 1 year ago

namjiwon1023 commented 1 year ago

Thank you for your contribution!

I want to know: among all of the self-attention RL algorithms, which one has the best performance?

Thank you!

DaDucking commented 1 year ago

Hi Namjiwon1023,

Thank you for taking an interest in my previous experiment.

Based on the experiments, I previously concluded that the Channel-wise Self-Attention Network (C-SAN, the rvuattn code) is better on average (about 15%) at training a more efficient model. This could be due to the complexity of the tasks, the number of elements in the environment, or dynamic variables. One caveat is that a non-trivial share of the no-attention models actually performed better in certain scenarios (roughly 20%).

The scope of this experiment is also extremely narrow, especially since it only covers simple Atari 2600 games, so it may not translate well to other domains. I have not tried it elsewhere myself because of limited time and resources, and I have since dropped efforts to understand what makes the C-SAN variant superior in my experiments.

Overall, I would like to think that C-SAN (the rvuattn code) provides a better state representation, which makes it easier for the underlying RL model to learn. If you give it a try, do let me know whether or not it works.
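If it helps, here is a minimal sketch of what channel-wise self-attention over a CNN feature map looks like. The class name, the learnable `gamma` residual weight, and the shapes are illustrative assumptions, not the actual rvuattn implementation:

```python
import torch
import torch.nn as nn

class ChannelSelfAttention(nn.Module):
    """Sketch of channel-wise self-attention over a conv feature map.

    Each channel attends to every other channel, using its flattened
    spatial map as the feature vector. Illustrative only; not the
    repo's rvuattn code.
    """

    def __init__(self):
        super().__init__()
        # Learnable residual weight, initialised to 0 so the block starts
        # as an identity mapping and attention is blended in gradually.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # x: (batch, channels, height, width) feature map from the CNN encoder
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)                    # (b, c, hw)
        attn = torch.bmm(flat, flat.transpose(1, 2))  # (b, c, c) channel affinities
        attn = torch.softmax(attn, dim=-1)
        out = torch.bmm(attn, flat).view(b, c, h, w)  # re-weighted channels
        return self.gamma * out + x                   # residual connection

# Example: refine the features of an Atari-style CNN encoder before the PPO head.
attn = ChannelSelfAttention()
features = torch.randn(8, 64, 7, 7)
refined = attn(features)  # same shape, channel-mixed representation
```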

Thanks!