ROCm / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

Enabled few test_configurable_parallel tests which work on ROCm #31

Closed by rraminen 3 years ago

rraminen commented 3 years ago

Megatron-lm installation required: pip install megatron-lm==1.1.5

jithunnair-amd commented 3 years ago

@rraminen How is this requirement handled today in our scripts for CI or local builds?

rraminen commented 3 years ago

@jithunnair-amd I have added pip install megatron-lm==1.1.5 to the CI scripts.

megatron-lm==1.1.5 is listed in DeepSpeed/requirements/requirements-dev.txt. Do you think installing it with pip install -r requirements-dev.txt is the right approach, or would you like it added to requirements-rocm.txt?

jithunnair-amd commented 3 years ago

The requirements listed in https://github.com/microsoft/DeepSpeed/blob/master/setup.py#L52 are passed to setup() through its extras_require argument (https://github.com/microsoft/DeepSpeed/blob/master/setup.py#L237), and by definition they are optional, as opposed to install_requires (https://stackoverflow.com/questions/41268863/difference-between-extras-require-and-install-requires-in-setup-py).

So please figure out how to properly install the extra requirements using pip. That is the command we should use in CI as well, instead of directly invoking pip install megatron-lm==1.1.5.
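To make the install_requires / extras_require distinction concrete, here is a minimal sketch of how a setup.py can keep always-needed packages in install_requires and put development-only ones such as megatron-lm==1.1.5 under an extra. The helper names (parse_requirements, fetch_requirements, requirement_kwargs), the extra name "dev", and the package name are illustrative assumptions, not taken from DeepSpeed's actual setup.py.

```python
from __future__ import annotations

from pathlib import Path


def parse_requirements(text: str) -> list[str]:
    """Return requirement specifiers from requirements-file text,
    skipping blank lines and full-line comments."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]


def fetch_requirements(path: str | Path) -> list[str]:
    """Hypothetical helper: read one requirements/*.txt file."""
    return parse_requirements(Path(path).read_text())


def requirement_kwargs(core: list[str], dev: list[str]) -> dict:
    """Build the dependency arguments for setuptools.setup().

    install_requires: always installed by a plain `pip install`.
    extras_require:   installed only when the extra is requested,
                      e.g. `pip install ".[dev]"`.
    """
    return {"install_requires": core, "extras_require": {"dev": dev}}


# In a setup.py these kwargs would be used roughly as:
#   from setuptools import setup
#   setup(
#       name="example-package",  # hypothetical package name
#       **requirement_kwargs(
#           fetch_requirements("requirements/requirements.txt"),
#           fetch_requirements("requirements/requirements-dev.txt"),
#       ),
#   )
```

With dependencies declared this way, pip install . from a source checkout installs only the install_requires list, while pip install ".[dev]" (or pip install "dist/<whl>[dev]" for a built wheel) also installs everything in the dev extra.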

jithunnair-amd commented 3 years ago

I also don't see all 3 CI builds showing up here, and the build shown here is still named "default". Can you please work with Omkar to make sure all 3 CI builds show up here? If you need to retrigger the CI builds from the PR, use an empty commit.

jithunnair-amd commented 3 years ago

The proper way of installing extras seems to be: pip install dist/<whl>[extra_name]. However, that'd involve modifying the install.sh file, so instead, we're opting to use pip install -r requirements/requirements-dev.txt in the CI script.
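Whichever install route CI uses, the extra name in pip install dist/<whl>[extra_name] has to match an extra the package actually declares. A minimal sketch for checking this, using only the standard-library importlib.metadata module (Python 3.8+); the function name declared_extras is a hypothetical helper, and the distribution name you pass in is whatever package you want to inspect.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(dist_name: str) -> list[str] | None:
    """Return the extra names an installed distribution declares
    (its Provides-Extra metadata), or None if it is not installed."""
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return None
    # get_all() returns None when the field is absent, so normalize to [].
    return meta.get_all("Provides-Extra") or []
```

For example, calling declared_extras("deepspeed") in the CI environment lists the extras the installed DeepSpeed distribution advertises, or returns None if DeepSpeed is not installed there.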