Closed — LTluttmann closed this 3 months ago
Results on a notebook with a single RTX 3090, training POMO with 100k samples/epoch, batch size 64, on TSP100 (old vs. new):
Training time and GPU load improve slightly (a few percentage points) and VRAM usage decreases by around 40%! :rocket: Will merge.
Attention: Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.
| Files | Coverage Δ |
|---|---|
| rl4co/__init__.py | 100.00% <100.00%> (ø) |
| rl4co/models/common/constructive/base.py | 85.13% <ø> (ø) |
| rl4co/models/rl/ppo/ppo.py | 84.00% <ø> (ø) |
| rl4co/models/zoo/ptrnet/decoder.py | 96.25% <100.00%> (+0.09%) :arrow_up: |
| rl4co/utils/decoding.py | 91.12% <90.90%> (-0.16%) :arrow_down: |
Thanks @LTluttmann! Awesome work. I didn't get to review in time, but based on the test results shown by @fedebotu, I believe this is a really good update! 🙌
Description
Keeping all log-probabilities in the buffer of the decoding strategies, rather than only those of the selected actions, can cause memory issues. This is now fixed by adding a parameter `only_store_selected_logp` to the base decoding strategy, which defaults to True. The only exception is when the entropy must be computed (as in PPO). This fix should be backward compatible: we can still pass all logprobs to the `get_log_likelihood` function along with the actions, in which case the function gathers the correct logprobs (as before).

Motivation and Context

See above.
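To illustrate the idea behind the fix, here is a minimal PyTorch sketch (not the actual rl4co code; `gather_selected_logps` is a hypothetical helper mirroring what `get_log_likelihood` does when handed all logprobs): storing only the log-probability of each selected action keeps a `[batch, steps]` tensor instead of a `[batch, steps, num_nodes]` one, and the per-action values can still be recovered from the full tensor via `gather`.

```python
import torch

def gather_selected_logps(logprobs: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    """Pick out the logprob of each taken action.

    logprobs: [batch, steps, num_nodes] log-probabilities over nodes
    actions:  [batch, steps] indices of the selected nodes
    """
    return logprobs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)

batch, steps, nodes = 64, 100, 100  # e.g. TSP100 with batch size 64
logprobs = torch.randn(batch, steps, nodes).log_softmax(-1)
actions = torch.randint(0, nodes, (batch, steps))

selected = gather_selected_logps(logprobs, actions)  # [batch, steps]

# Buffering all logprobs costs `nodes` times the memory of the selected ones
print(logprobs.numel() // selected.numel())  # → 100
```

When entropy is needed (as in PPO), the full distribution must be kept, which is why the parameter only defaults to storing the selected values.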
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!