Closed: thiner closed this issue 1 year ago
It seems that the main error is related to the installation of apex. mPLUG-Owl depends on apex's C++ extension (MixedFusedLayerNorm), so apex must be compiled from source.
Do you install apex from source on A100s? Seems like the installation is not as straightforward for them.
We recommend installing Apex from source:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
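Once the build finishes, one quick sanity check is to see whether the compiled extension can be located at all. The sketch below assumes that the `--cuda_ext` build exposes the layer-norm kernels as a top-level module named `fused_layer_norm_cuda` (the name used in recent Apex sources; treat it as an assumption for other versions). `check_apex_build` is a hypothetical helper, not part of Apex or mPLUG-Owl.

```python
import importlib.util


def check_apex_build(modules=("apex", "fused_layer_norm_cuda")):
    """Report which of the given top-level modules the import system can locate.

    find_spec only looks a top-level module up; it does not import it, so this
    is safe to run even when Apex is missing or only partially built.
    """
    status = {}
    for name in modules:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when the module state is broken or malformed.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in check_apex_build().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If `apex` is found but the extension module is missing, the install most likely fell back to the Python-only build, which would be consistent with the MixedFusedLayerNorm error described above.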
Our repository now includes a copy of Apex and installation guidelines, which have been validated on V100 and A100 with PyTorch 1.13.1+cu117. Please refer to the repository for more information.
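To compare a local environment against the validated combination, a minimal sketch like the following may help. It only compares the version string reported by PyTorch with `1.13.1+cu117`; the helper names are hypothetical, and a match does not guarantee that the Apex build will succeed.

```python
def split_torch_version(version):
    """Split a version string such as '1.13.1+cu117' into (release, local_tag).

    local_tag is '' when the string has no '+' suffix.
    """
    release, _, local = version.partition("+")
    return release, local


def matches_validated_build(version, release="1.13.1", cuda_tag="cu117"):
    """Return True if the version string equals the validated release and CUDA tag."""
    return split_torch_version(version) == (release, cuda_tag)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(torch.__version__, matches_validated_build(torch.__version__))
```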
Thanks for your updates. But maybe you forgot to push the apex_22.01_pp code?
We have updated it. By the way, we are planning to remove the dependency on apex in the next version. Stay tuned.
Installation failed again.
I updated the env.yaml file, changing the PyTorch version to 1.13.1 as you mentioned above, but it ran into errors reporting many incompatible library versions. Could you generate a working environment.yml file with the following command?
conda env export --from-history | findstr -v "prefix" > environment.yml
By the way, I think it's better to add an env name to the env.yaml file, e.g. name: owl
Does the Apex installation in your project support the 3090 or TITAN RTX? I encountered some problems during installation.
I tried to prepare the environment with
conda env create -f env.yaml
but it failed. The error message is below: