We usually use Docker to avoid this problem; see our Dockerfiles: https://github.com/OpenLLMAI/OpenRLHF/tree/main/dockerfile
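For reference, a container workflow along these lines sidesteps host-environment mismatches entirely. This is only a hedged sketch: the Dockerfile name, image tag, and mount path below are assumptions, so check the linked directory for the actual file and any documented build arguments.

```bash
# Hedged sketch: build an image from the repo's dockerfile directory and enter it.
# Assumes the directory contains a file named "Dockerfile" and that the
# NVIDIA Container Toolkit is installed for --gpus support.
git clone https://github.com/OpenLLMAI/OpenRLHF.git
cd OpenRLHF/dockerfile
docker build -t openrlhf:local .
docker run --gpus all -it --rm \
    -v "$(pwd)/..":/workspace/OpenRLHF \
    openrlhf:local bash
```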
Hi, run `pip install packaging ninja` first; then you can download a prebuilt wheel for your specific version from https://github.com/Dao-AILab/flash-attention/releases.
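The `undefined symbol` error typically means the prebuilt flash-attn binary was compiled against a different PyTorch/CUDA ABI than the one in the venv, so installing a release asset that matches your exact torch, CUDA, Python, and C++ ABI avoids the long source rebuild. A hedged example follows; the release tag and filename are illustrative, not prescriptive, and must be picked to match your environment from the releases page:

```bash
# Install build prerequisites first.
pip install packaging ninja
# Illustrative filename only: choose the asset matching your Python (cp310),
# CUDA (e.g. cu121/cu122), torch version, and C++ ABI from the releases page.
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.8/flash_attn-2.5.8+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```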
Hi,

Building the project in a Python venv using `setup.py` has the following issues:

- `wheel` and `packaging` are missing from requirements.txt but are required by `flash-attn`.
- Running `train_ppo_llama.sh` leads to a `flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4c...` error. The only way I could overcome this was by manually rebuilding flash-attn with `FLASH_ATTENTION_FORCE_BUILD=TRUE pip install --force-reinstall flash-attn`, which takes a long time.

Is there a better way to set up the project? Thanks.
Environment: Ubuntu 20.04, Python 3.10, CUDA 12.1
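For completeness, a sketch of the source-build workaround described in the report, with two additions that are assumptions rather than part of the original command: `MAX_JOBS` to parallelize the ninja build, and `--no-build-isolation` so the build compiles against the torch already installed in the venv.

```bash
# Install the build dependencies the report found missing from requirements.txt.
pip install wheel packaging ninja
# Force a local rebuild of flash-attn, as in the report.
# MAX_JOBS caps parallel compile jobs (tune to your core count and RAM);
# --no-build-isolation reuses the venv's installed torch during the build.
FLASH_ATTENTION_FORCE_BUILD=TRUE MAX_JOBS=8 \
    pip install --force-reinstall --no-build-isolation flash-attn
```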