This PR addresses the issue described in Issue #32 by removing the hard-coded application of activation checkpointing to `GPT2Block`. Instead, activation checkpointing is now configurable through the configuration file, allowing greater flexibility and adaptability for different models.
General changes:
- Enhancement: allow the modules to be checkpointed to be defined in the configuration file.
Breaking changes:
- Enhancement: make the modules to be checkpointed configurable. This requires replacing the boolean entry `do_apply_activation_checkpointing` in the configuration file with the list `activation_checkpointing_modules` (see 4a2a049e893acdd49c2dd7801b74a20af0db1e38), which specifies all the submodules to which activation checkpointing should be applied; see the sketch below.
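As an illustration, a config that previously toggled checkpointing via the boolean flag would now list the target modules instead. This is a minimal sketch: the two entry names come from this PR, while the assumption that the list values are submodule class names is inferred from the `GPT2Block` example, and any surrounding keys or nesting in the actual config are omitted.

```yaml
# Before: checkpointing was hard-coded to GPT2Block and only toggled on/off
do_apply_activation_checkpointing: true

# After: list the submodules that activation checkpointing should be applied to
activation_checkpointing_modules: [GPT2Block]
```

An empty `activation_checkpointing_modules` list would correspond to the previous `do_apply_activation_checkpointing: false` behavior.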
Checklist before submitting final PR
- [x] My PR is minimal and addresses one issue / enhancement in isolation
- [x] I have merged main into this feature branch
- [x] I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
- [x] I have run a sample config for model training
- [x] I have fixed all failing tests (`python tests/tests.py`)