Lightning-Universe / lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning
https://lightning-transformers.readthedocs.io
Apache License 2.0

trainer deepspeed fails #161

Closed: enpassanty closed this issue 3 years ago

enpassanty commented 3 years ago
! pip install git+https://github.com/PytorchLightning/lightning-transformers.git@master --upgrade --quiet
! python train.py task=nlp/language_modeling dataset=nlp/language_modeling/wikitext trainer=deepspeed

traceback:


2021-04-25 13:17:34.274832: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
  File "train.py", line 88, in <module>
    hydra_entry()
  File "/usr/local/lib/python3.7/dist-packages/hydra/main.py", line 33, in decorated_main
    config_name=config_name,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 370, in _run_hydra
    lambda: hydra.run(
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 214, in run_and_report
    raise ex
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 373, in <lambda>
    overrides=args.overrides,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/hydra.py", line 90, in run
    run_mode=RunMode.RUN,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/hydra.py", line 524, in compose_config
    from_shell=from_shell,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/config_loader_impl.py", line 149, in load_configuration
    from_shell=from_shell,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/config_loader_impl.py", line 236, in _load_configuration_impl
    skip_missing=run_mode == RunMode.MULTIRUN,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/defaults_list.py", line 717, in create_defaults_list
    skip_missing=skip_missing,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/defaults_list.py", line 688, in _create_defaults_list
    skip_missing=skip_missing,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/defaults_list.py", line 343, in _create_defaults_tree
    overrides=overrides,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/defaults_list.py", line 420, in _create_defaults_tree_impl
    return _expand_virtual_root(repo, root, overrides, skip_missing)
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/defaults_list.py", line 268, in _expand_virtual_root
    overrides=overrides,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/defaults_list.py", line 532, in _create_defaults_tree_impl
    add_child(children, new_root)
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/defaults_list.py", line 481, in add_child
    overrides=overrides,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/defaults_list.py", line 532, in _create_defaults_tree_impl
    add_child(children, new_root)
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/defaults_list.py", line 481, in add_child
    overrides=overrides,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/defaults_list.py", line 451, in _create_defaults_tree_impl
    config_not_found_error(repo=repo, tree=root)
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/defaults_list.py", line 769, in config_not_found_error
    options=options,
hydra.errors.MissingConfigException: In 'trainer/deepspeed': Could not find 'trainer/plugins/zero_offload'

Available options in 'trainer/plugins':
    deepspeed
    deepspeed_offload
    deepspeed_offload_stage_3
    sharded
Config search path:
    provider=hydra, path=pkg://hydra.conf
    provider=main, path=file:///usr/local/lib/python3.7/dist-packages/conf
    provider=schema, path=structured://
SeanNaren commented 3 years ago

Thanks for the issue! We need to update the DeepSpeed trainer config, which still references a `zero_offload` plugin option that no longer exists. In the meantime, this is the preferred approach:

! pip install git+https://github.com/PytorchLightning/lightning-transformers.git@master --upgrade --quiet
! python train.py task=nlp/language_modeling dataset=nlp/language_modeling/wikitext trainer/plugins=deepspeed
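
To see why `trainer=deepspeed` fails while `trainer/plugins=deepspeed` works, it may help to picture the lookup as a simple file check: each option in a config group is a YAML file in that group's directory, and the traceback lists `deepspeed`, `deepspeed_offload`, `deepspeed_offload_stage_3` and `sharded` but no `zero_offload`. The sketch below is only an illustration of that idea, not Hydra's actual implementation; `available_options`, `check_option` and `demo` are hypothetical helper names, and the temporary directory stands in for the installed `conf` folder.

```python
import tempfile
from pathlib import Path


def available_options(conf_dir, group):
    """Return the option names (YAML file stems) found in a config group directory."""
    group_dir = Path(conf_dir) / group
    if not group_dir.is_dir():
        return []
    return sorted(p.stem for p in group_dir.glob("*.yaml"))


def check_option(conf_dir, group, name):
    """Raise LookupError if `name` is not an option in `group` (hypothetical helper)."""
    options = available_options(conf_dir, group)
    if name not in options:
        raise LookupError(
            f"Could not find '{group}/{name}'. "
            f"Available options in '{group}': {options}"
        )
    return name


def demo():
    """Recreate the plugin options listed in the traceback and test two lookups."""
    with tempfile.TemporaryDirectory() as tmp:
        plugins = Path(tmp) / "trainer" / "plugins"
        plugins.mkdir(parents=True)
        # Option names copied from the "Available options" list in the error output.
        for name in ("deepspeed", "deepspeed_offload",
                     "deepspeed_offload_stage_3", "sharded"):
            (plugins / f"{name}.yaml").touch()

        # An existing option resolves without error.
        found = check_option(tmp, "trainer/plugins", "deepspeed")

        # The stale name referenced by 'trainer/deepspeed' does not resolve.
        try:
            check_option(tmp, "trainer/plugins", "zero_offload")
            missing = False
        except LookupError:
            missing = True
        return found, missing
```

Calling `demo()` resolves `deepspeed` and reports `zero_offload` as missing, which mirrors the `MissingConfigException` above: the failure comes from the stale reference inside the `trainer/deepspeed` config, not from the plugin options themselves.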