OpenThaiGPT / openthaigpt-pretraining (Apache License 2.0)
feat(model): add gradient checkpointing falcon
#267
Status: Closed (MoosaTae closed this PR 1 year ago)
MoosaTae commented 1 year ago
Why this PR
Falcon needs gradient checkpointing.
Changes
Add gradient checkpointing to Falcon.
Add DecoderLayerWithCheckpointing, which checkpoints only the attention block, and wire it into RWForCausalLMwithCheckpointing (similar to the GPT-J implementation).
Tested with bf16 and DeepSpeed ZeRO stage 2.
Fix a wrong config in the GPT-J gradient checkpointing that checkpointed only the attention head.
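For the bf16 + ZeRO stage-2 test setup mentioned above, a DeepSpeed config along these lines would match; the specific values here are illustrative assumptions, not the repo's actual config file:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_clipping": 1.0
}
```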
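The attention-only checkpointing described above can be sketched as follows. This is a minimal illustration, not the repo's actual Falcon code: `DecoderLayer`, its sub-modules, and the shapes are simplified assumptions, and `torch.utils.checkpoint.checkpoint` is used to recompute only the attention block during backward, leaving the MLP activations stored as usual.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class DecoderLayer(nn.Module):
    """Simplified Falcon-style decoder layer: attention + MLP with residuals.
    (Illustrative stand-in for the real RW decoder layer.)"""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.ln = nn.LayerNorm(hidden_size)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )

    def _attn_block(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-norm self-attention; this is the part we want to recompute.
        h = self.ln(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self._attn_block(x)  # attention block run normally
        return x + self.mlp(x)


class DecoderLayerWithCheckpointing(DecoderLayer):
    """Checkpoint only the attention block: its activations are discarded in
    forward and recomputed in backward, trading compute for memory."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and x.requires_grad:
            attn_out = checkpoint(self._attn_block, x, use_reentrant=False)
        else:
            attn_out = self._attn_block(x)
        x = x + attn_out
        return x + self.mlp(x)
```

A causal-LM wrapper (the sketch's analogue of RWForCausalLMwithCheckpointing) would simply build its stack from `DecoderLayerWithCheckpointing` instead of `DecoderLayer`; the forward values are identical, only the memory/compute trade-off changes.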
Related Issues
Close #
Checklist
[x] PR should follow the Naming convention
[x] Assign yourself in the Assignees field
[x] Tag related issues
[x] Constants name should be ALL_CAPITAL, function name should be snake_case, and class name should be CamelCase
[x] Complex functions/algorithms should have a Docstring
[ ] 1 PR should not have more than 200 lines of changes (exception for test files). If more than that, please open multiple PRs
[x] At least one PR reviewer must come from the task's team (model, eval, data)