ai4co / rl4co

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)
https://rl4.co
MIT License
455 stars 84 forks

[BugFix] fix reward dim problem of shared baseline #99

Closed cbhua closed 10 months ago

cbhua commented 1 year ago

In the shared baseline function, the size of the reward is now `[batch_size]`.

codecov[bot] commented 1 year ago

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Files Coverage Δ
rl4co/models/rl/reinforce/baselines.py 84.96% <50.00%> (ø)


fedebotu commented 1 year ago

Let's have a closer look first. It might work, but not for all baselines: if the baseline covers multiple solutions per instance (i.e. symmetric and multistart), we still need the dim over which to take the mean, for example here.
Also @hyeok9855 you might want to have a look at this since you are working with POMO!
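
To make the shape concern concrete, here is a minimal sketch, not the library's implementation. It uses plain Python lists instead of tensors so that it runs without PyTorch, and the names `shared_baseline` and `advantage` are hypothetical. It assumes a shared baseline is the mean reward over the solutions of each instance, which only makes sense when the reward has shape `[batch_size, n_solutions]`.

```python
def shared_baseline(reward):
    """Mean reward over the solutions of each instance (hypothetical helper).

    `reward` is expected to be a nested list of shape [batch_size][n_solutions].
    """
    is_multi = bool(reward) and isinstance(reward[0], (list, tuple))
    if not is_multi:
        # A flat [batch_size] reward has no per-instance axis to average over.
        raise ValueError(
            "reward has shape [batch_size]; expected [batch_size, n_solutions]"
        )
    return [sum(row) / len(row) for row in reward]


def advantage(reward):
    """Subtract each instance's shared baseline from its own rewards."""
    baseline = shared_baseline(reward)
    return [[r - b for r in row] for row, b in zip(reward, baseline)]


# Two instances, two solutions each (e.g. two starts or two augmentations).
multi = [[1.0, 3.0], [2.0, 6.0]]
print(shared_baseline(multi))  # [2.0, 4.0]
print(advantage(multi))        # [[-1.0, 1.0], [-2.0, 2.0]]
```

With a flat reward such as `[1.0, 2.0]`, the helper raises a `ValueError` instead of silently averaging across the batch, which is the kind of case a reproduction script should cover.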

cbhua commented 1 year ago

I agree. We should be careful with this part. This bug was observed by @Leaveson. Let's prepare clearer code to reproduce the bug so that we can check it in depth.