Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Refactor the DeepSpeed strategy config management #17472

Open carmocca opened 1 year ago

carmocca commented 1 year ago

Outline & Motivation

DeepSpeed works by using a configuration file (dictionary) that allows customizing all of its aspects: https://www.deepspeed.ai/docs/config-json/

The DeepSpeedStrategy supports two ways of defining this:

  1. Passing a config file, in which case every other argument is ignored: https://github.com/Lightning-AI/lightning/blob/b792c90ea7148d61af192fde6c338ebbd355702f/src/lightning/fabric/strategies/deepspeed.py#L191
  2. Exposing many of these options as `__init__` arguments, which are used to build a base config: https://github.com/Lightning-AI/lightning/blob/b792c90ea7148d61af192fde6c338ebbd355702f/src/lightning/fabric/strategies/deepspeed.py#L242-L271
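The two paths above can be sketched as follows. This is a simplified, hypothetical illustration of the dispatch logic, not Lightning's actual implementation; the parameter names (`config_path`, `zero_stage`, `offload_optimizer`) are stand-ins for the real strategy signature:

```python
import json

def build_config(config_path=None, zero_stage=2, offload_optimizer=False):
    """Simplified sketch of how the strategy resolves its config today."""
    if config_path is not None:
        # Way 1: a user-provided config file wins; every other argument is ignored.
        with open(config_path) as f:
            return json.load(f)
    # Way 2: a base config is assembled from the exposed __init__ arguments.
    cfg = {"zero_optimization": {"stage": zero_stage}}
    if offload_optimizer:
        cfg["zero_optimization"]["offload_optimizer"] = {"device": "cpu"}
    return cfg
```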

Option 2 is not scalable: every new DeepSpeed option would require exposing yet another `__init__` argument.

Pitch

Remove all these exposed arguments and keep only a `config` argument that is overloaded to accept a path to a config file, a config dictionary, or `None`.

The default config is created by calling: https://github.com/microsoft/DeepSpeed/blob/085981bf1caf5d7d0b26d05f7c7e9487e1b35190/deepspeed/runtime/config.py#L674
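A minimal sketch of how such an overloaded `config` argument could be resolved. All names here are hypothetical (including the placeholder default dict, which stands in for DeepSpeed's own default-config constructor), not a proposed final API:

```python
import json
from pathlib import Path

def resolve_config(config=None):
    """Resolve a single overloaded `config` argument: path, dict, or None."""
    if config is None:
        # Placeholder for DeepSpeed's default config constructor.
        return {"train_batch_size": "auto"}
    if isinstance(config, (str, Path)):
        # A path to a JSON config file on disk.
        return json.loads(Path(config).read_text())
    if isinstance(config, dict):
        # An in-memory config dict; copy so the caller's dict is not mutated.
        return dict(config)
    raise TypeError(f"Unsupported config type: {type(config).__name__}")
```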

Additional context

DeepSpeed is considered experimental so we could do this breaking change: https://github.com/Lightning-AI/lightning/blob/b792c90ea7148d61af192fde6c338ebbd355702f/src/lightning/fabric/strategies/deepspeed.py#L99

cc @justusschock @awaelchli @carmocca

keunwoochoi commented 5 months ago

+1 for this!