[BugFix] fix reward dim problem of shared baseline

ai4co / rl4co

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)

https://rl4.co

MIT License

[BugFix] fix reward dim problem of shared baseline #99

Closed cbhua closed 10 months ago

cbhua commented 1 year ago

In the shared baseline function, the reward now has shape `[batch_size]`.

codecov[bot] commented 1 year ago

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Files	Coverage Δ
rl4co/models/rl/reinforce/baselines.py	`84.96% <50.00%> (ø)`

fedebotu commented 1 year ago

Let's have a closer look first. It might work, but not for all baselines: if the baseline is a multiple one (i.e. symmetric and multistart), we would still need the dim over which to take the mean, for example here.
Also @hyeok9855 you might want to have a look at this since you are working with POMO!
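To make the shape concern concrete, here is a minimal sketch, not the rl4co implementation. It assumes a hypothetical helper `shared_baseline_value` that averages rewards over a rollout dimension (`dim=1`), and compares a multistart-style reward of shape `[batch_size, num_starts]` with a flat reward of shape `[batch_size]`. The names `shared_baseline_value`, `num_starts`, `multi`, and `flat` are illustrative assumptions.

```python
import torch


def shared_baseline_value(reward: torch.Tensor, dim: int = 1) -> torch.Tensor:
    """Hypothetical shared baseline: mean reward over the rollout dimension.

    keepdim=True keeps the reduced axis so the result broadcasts against reward.
    """
    return reward.mean(dim=dim, keepdim=True)


batch_size, num_starts = 4, 3

# Multistart-style reward: one row per instance, one column per rollout.
multi = torch.arange(batch_size * num_starts, dtype=torch.float32)
multi = multi.reshape(batch_size, num_starts)
bl = shared_baseline_value(multi)   # shape [batch_size, 1]
advantage = multi - bl              # shape [batch_size, num_starts]

# Flat reward of shape [batch_size]: there is no dim=1 to reduce over,
# so the same call fails instead of silently returning a baseline.
flat = torch.zeros(batch_size)
try:
    shared_baseline_value(flat)
    flat_error = None
except IndexError as err:
    flat_error = err
```

With the 2-D reward, each instance's advantages sum to zero by construction. With the 1-D reward, the reduction dimension does not exist, which is why the dim has to be handled per baseline type rather than assumed.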

cbhua commented 1 year ago

I agree. We should handle this part carefully. This bug was observed by @Leaveson. Let's prepare a clearer reproduction script so we can check it in depth.