This PR addresses the issue described in Issue #32 by removing the hard-coded application of activation checkpointing to `GPT2Block`. Instead, activation checkpointing is now configurable through the configuration file, allowing greater flexibility and adaptability for different models.
General changes:
- Enhancement: allow the modules to be checkpointed to be defined in the configuration file.
Breaking changes:
- Enhancement: make the modules to be checkpointed configurable. This requires replacing the boolean entry `do_apply_activation_checkpointing` in the configuration file with the list `activation_checkpointing_modules` (see 4a2a049e893acdd49c2dd7801b74a20af0db1e38), which specifies all the submodules to which activation checkpointing should be applied; see the sketch below.
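As an illustration, a config that previously toggled checkpointing via the boolean flag would now list the target modules instead. This is a minimal sketch: the two entry names come from this PR, while the assumption that the list values are submodule class names is inferred from the `GPT2Block` example, and any surrounding keys or nesting in the actual config are omitted.

```yaml
# Before: checkpointing was hard-coded to GPT2Block and only toggled on/off
do_apply_activation_checkpointing: true

# After: list the submodules that activation checkpointing should be applied to
activation_checkpointing_modules: [GPT2Block]
```

An empty `activation_checkpointing_modules` list would correspond to the previous `do_apply_activation_checkpointing: false` behavior.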
Checklist before submitting final PR
- [x] My PR is minimal and addresses one issue / enhancement in isolation
- [x] I have merged main into this feature branch
- [x] I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
- [x] I have run a sample config for model training
- [x] I have fixed all failing tests (`python tests/tests.py`)