Closed — LTluttmann closed this 3 months ago
Results on a notebook with a single RTX 3090, training POMO with 100k samples/epoch, batch size 64, on TSP100 (old vs. new):
Training time and GPU load improve slightly (a few percentage points) and VRAM usage decreases by around 40%! :rocket: Will merge.
Attention: Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.
| Files | Coverage Δ |
|---|---|
| rl4co/__init__.py | 100.00% <100.00%> (ø) |
| rl4co/models/common/constructive/base.py | 85.13% <ø> (ø) |
| rl4co/models/rl/ppo/ppo.py | 84.00% <ø> (ø) |
| rl4co/models/zoo/ptrnet/decoder.py | 96.25% <100.00%> (+0.09%) :arrow_up: |
| rl4co/utils/decoding.py | 91.12% <90.90%> (-0.16%) :arrow_down: |
Thanks @LTluttmann! Awesome work. I didn't get to review in time, but based on the test results shown by @fedebotu, I believe this is a really good update! 🙌
Description
Keeping all log-probabilities in the buffer of the decoding strategies, rather than only those of the selected actions, can cause memory issues. This is now fixed by adding a parameter `only_store_selected_logp` to the base decoding strategy, which defaults to True. The only exception is when the entropy must be computed (as in PPO). This fix should be backward compatible: we can still pass all logprobs to the `get_log_likelihood` function along with the actions, in which case the function gathers the correct logprobs (as before).

Motivation and Context

See above.
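To illustrate the idea behind the fix, here is a minimal PyTorch sketch (not the actual rl4co code; `gather_selected_logps` is a hypothetical helper mirroring what `get_log_likelihood` does when handed all logprobs): storing only the log-probability of each selected action keeps a `[batch, steps]` tensor instead of a `[batch, steps, num_nodes]` one, and the per-action values can still be recovered from the full tensor via `gather`.

```python
import torch

def gather_selected_logps(logprobs: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    """Pick out the logprob of each taken action.

    logprobs: [batch, steps, num_nodes] log-probabilities over nodes
    actions:  [batch, steps] indices of the selected nodes
    """
    return logprobs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)

batch, steps, nodes = 64, 100, 100  # e.g. TSP100 with batch size 64
logprobs = torch.randn(batch, steps, nodes).log_softmax(-1)
actions = torch.randint(0, nodes, (batch, steps))

selected = gather_selected_logps(logprobs, actions)  # [batch, steps]

# Buffering all logprobs costs `nodes` times the memory of the selected ones
print(logprobs.numel() // selected.numel())  # → 100
```

When entropy is needed (as in PPO), the full distribution must be kept, which is why the parameter only defaults to storing the selected values.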
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!