Open li-plus opened 12 months ago
Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (91a0f43) 43.58% compared to head (730d900) 43.58%. Report is 1 commit behind head on main.

Current head 730d900 differs from the pull request's most recent head aa1031a. Consider uploading reports for commit aa1031a to get more accurate results.

Files | Patch % | Lines |
---|---|---|
trlx/models/modeling_nemo_ppo.py | 0.00% | 3 Missing |
trlx/trainer/accelerate_ppo_trainer.py | 57.14% | 3 Missing |

The current `logprobs_of_labels` computes logprobs using a `log_softmax` followed by a `gather`. When the input logits are not contiguous, `log_softmax` makes a copy of the logits, which is very large (`batch_size * seq_len * vocab_size` can be `32 * 2048 * 64000 * 2B = 8GB` for typical settings).

This PR feeds the contiguous logits directly into `log_softmax` to reduce peak CUDA memory and remove the redundant copy.

Test script:
Tested on a Tesla V100, the method in this PR is both faster (1.6x speedup) and more memory-efficient.
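
For readers without the diff at hand, here is a minimal sketch of the idea, not the PR's actual code. It assumes the caller wants next-token logprobs and therefore slices the logits along the sequence dimension before the call; the function names `logprobs_slice_first`, `logprobs_softmax_first`, and `logits_nbytes` are hypothetical and do not come from trlx. The second variant runs `log_softmax` on the original contiguous tensor and slices afterwards, at the cost of normalizing one extra position per sequence.

```python
import torch
import torch.nn.functional as F


def logits_nbytes(batch_size, seq_len, vocab_size, bytes_per_elem=2):
    """Size in bytes of a dense logits tensor (2 bytes per element for fp16)."""
    return batch_size * seq_len * vocab_size * bytes_per_elem


def logprobs_slice_first(logits, tokens):
    """Slice, then log_softmax: the slice is a non-contiguous view."""
    sliced = logits[:, :-1, :]  # drop the last position; no copy yet
    logprobs = F.log_softmax(sliced, dim=-1)
    index = tokens[:, 1:].unsqueeze(-1)  # next-token ids, shape (B, T-1, 1)
    return torch.gather(logprobs, dim=-1, index=index).squeeze(-1)


def logprobs_softmax_first(logits, tokens):
    """log_softmax on the contiguous logits, then slice before the gather."""
    logprobs = F.log_softmax(logits, dim=-1)  # input is already contiguous
    index = tokens[:, 1:].unsqueeze(-1)
    # Slicing the log_softmax output is a cheap view; gather accepts it as-is.
    return torch.gather(logprobs[:, :-1, :], dim=-1, index=index).squeeze(-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(2, 5, 11)           # toy (batch, seq_len, vocab) sizes
    tokens = torch.randint(0, 11, (2, 5))
    a = logprobs_slice_first(logits, tokens)
    b = logprobs_softmax_first(logits, tokens)
    print(torch.allclose(a, b))
    print(logits_nbytes(32, 2048, 64000) / 1e9, "GB")
```

Both variants return the same values, so the choice only affects memory and speed. Peak memory on a GPU can be compared with `torch.cuda.max_memory_allocated()` after calling each variant on realistically sized half-precision logits.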