LucasAlegre / morl-baselines

Multi-Objective Reinforcement Learning algorithms implementations.
https://lucasalegre.github.io/morl-baselines
MIT License

Issue when calculating the `log_probs` in EUPG #88

Closed: omidsbhn closed this issue 4 months ago

omidsbhn commented 7 months ago

I was studying the implementation of the EUPG algorithm and noticed a problem when calculating the log_probs in line 179: `log_probs = current_distribution.log_prob(actions)`, using the fishwood environment. When updating the network, the size of `actions` is [200, 1] and the size of `current_distribution` is [200, 2], but the resulting `log_probs` has size [200, 200]. I suggest using `current_distribution.log_prob(actions.squeeze())` instead of `current_distribution.log_prob(actions)` in the implementation, but I'm not sure whether it is correct.
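To illustrate the shape issue, here is a minimal standalone sketch (not the repo's actual code), assuming a rollout of 200 steps and the 2 discrete fishwood actions; all variable names are hypothetical:

```python
import torch
from torch.distributions import Categorical

batch_size, n_actions = 200, 2
logits = torch.randn(batch_size, n_actions)          # policy network output, shape [200, 2]
actions = torch.randint(n_actions, (batch_size, 1))  # stored actions, shape [200, 1]

dist = Categorical(logits=logits)                     # batch_shape = [200]

# Passing [200, 1] actions broadcasts against the [200] batch shape,
# producing a [200, 200] matrix of log-probabilities instead of [200].
bad_log_probs = dist.log_prob(actions)
print(bad_log_probs.shape)  # torch.Size([200, 200])

# Squeezing the trailing dimension gives one log-probability per transition.
log_probs = dist.log_prob(actions.squeeze())
print(log_probs.shape)      # torch.Size([200])
```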

ffelten commented 7 months ago

Hi @omidsbhn,

Thank you for the bug report :-)!

Indeed, it seems there is something off there. One (or more) bugs might have slipped in; to be honest, we haven't thoroughly tested this algorithm yet.

I tried using squeeze and compared it to the current implementation; I can't say it changes much (green is the fixed version, purple is the current implementation): [comparison plot]
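For reference, the change I compared is essentially the one-liner below (a sketch, not the exact diff); using `.squeeze(-1)` instead of `.squeeze()` might be marginally safer, since it only removes the trailing action dimension and leaves a batch of size 1 intact:

```python
# Current: actions has shape [batch, 1], so log_prob broadcasts to [batch, batch].
# log_probs = current_distribution.log_prob(actions)

# Compared variant: one log-probability per stored transition, shape [batch].
log_probs = current_distribution.log_prob(actions.squeeze())
```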

I would appreciate it if we keep digging into this and discuss the implementation, if you want :-)

omidsbhn commented 7 months ago

Hi @ffelten, thank you for checking the issue so quickly :-) For me the graphs are almost the same, but I think it could be an issue in more complex environments. Btw, I would be happy to discuss the implementation and keep digging into this. Specifically, based on this implementation, I'm trying to reproduce the ablation study suggested by the paper.

ffelten commented 7 months ago

You can join the Discord for direct communication: https://discord.com/invite/jpYmEFNT