Closed kuriot closed 1 year ago
I forgot to write that GPU is Nvidia RTX 3070.
Hi, I also had issues getting this running on a vast.ai machine (it’s also Linux). I spent a few hours getting it to work and wrote a script for that. Maybe it helps you.
You can find it here
Please report whether it works for you. My problem is that when I create a LoRA on the Linux machines, I get desaturated images when I use my trained LoRAs. Could you please report whether you have similar problems?
Thanks 😊
Your script helped me, thanks, but I didn't use all of it. Setting MKL_THREADING_LAYER fixed a strange error about an Intel CPU (I use AMD).
I also decided to check what setup.sh does and to do everything my own way, with conda.
This is my installation process:
rm -rf ./venv
conda create -n kohya python=3.10.9
conda activate kohya
conda install pytorch==1.13.1 torchvision==0.14.1 xformers -c pytorch -c nvidia -c xformers
pip install triton
conda install -c conda-forge cudatoolkit=11.8.0
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
And launch script:
#!/usr/bin/env bash
source ~/.miniconda3/etc/profile.d/conda.sh
conda activate kohya
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib/:$CUDNN_PATH/lib:$LD_LIBRARY_PATH
export MKL_THREADING_LAYER=1
SCRIPT_DIR=$(cd -- "$(dirname -- "$0")" && pwd)
cd "$SCRIPT_DIR"
python "$SCRIPT_DIR/kohya_gui.py" "$@"
Now I don't get any errors and training works. On my Nvidia RTX 3070 115W laptop video card I get 2.7 tokens per second with a maximum dataset image resolution of 768x768. I don't know whether that is OK or not.
I use Torch v1.13.1, because with 2.0.0 and 2.0.1 I had problems in Stable Diffusion, so I stick to 1.13.1 on Linux for now.
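The activation hook above appends to LD_LIBRARY_PATH every time the environment is activated, and the launch script then prepends the same directories again, so the variable can end up with repeated entries. That is harmless, but it makes the path harder to read when debugging. A minimal sketch of a guard follows; the helper name prepend_once is hypothetical and not part of kohya_ss or conda.

```shell
# Hypothetical helper: prepend DIR to a colon-separated LIST only if it is
# not already an entry of LIST. Prints the resulting list.
prepend_once() (
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;            # already present, unchanged
        *)          printf '%s\n' "${dir}${list:+:$list}" ;;  # prepend, no stray colon
    esac
)

# Possible use in the launch script (paths as in the thread; adjust as needed):
# LD_LIBRARY_PATH=$(prepend_once "$CONDA_PREFIX/lib" "${LD_LIBRARY_PATH:-}")
# export LD_LIBRARY_PATH
```

The function only compares entries as literal strings, so `/x/lib` and `/x/lib/` count as different directories.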
Hm, I will try this out; maybe it helps with my desaturated images.
So you don’t install it with the setup.sh script? Only with the steps above?
I completely removed kohya_ss to check whether I need to run setup.sh, and no, I didn't need to.
git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
conda create -n kohya python=3.10.9
conda activate kohya
conda install pytorch==1.13.1 torchvision==0.14.1 xformers -c pytorch -c nvidia -c xformers
pip install triton
conda install -c conda-forge cudatoolkit=11.8.0
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
Then I created a start.sh file:
#!/usr/bin/env bash
source ~/.miniconda3/etc/profile.d/conda.sh
conda activate kohya
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib/:$CUDNN_PATH/lib:$LD_LIBRARY_PATH
export MKL_THREADING_LAYER=1
SCRIPT_DIR=$(cd -- "$(dirname -- "$0")" && pwd)
cd "$SCRIPT_DIR"
python "$SCRIPT_DIR/kohya_gui.py" "$@"
chmod +x start.sh
./start.sh
Everything works.
Thanks, will try it out tomorrow 👍
I've updated install commands to fix some tensor errors in logs. Also, captioning didn't work until I fixed it.
git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
conda create -n kohya python=3.10.9
conda activate kohya
conda install pytorch==1.13.1 torchvision==0.14.1 xformers -c pytorch -c nvidia -c xformers
conda install -c conda-forge cudatoolkit=11.8.0
python3 -m pip install 'nvidia-cudnn-cu11>=8.6,<9' tensorflow==2.11.* tensorrt==8.6.1 triton
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
ln -sr $CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.8 $CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.7
ln -sr $CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs/libnvinfer_plugin.so.8 $CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs/libnvinfer_plugin.so.7
And start.sh:
#!/usr/bin/env bash
source ~/.miniconda3/etc/profile.d/conda.sh
conda activate kohya
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib:$LD_LIBRARY_PATH
export MKL_THREADING_LAYER=1
SCRIPT_DIR=$(cd -- "$(dirname -- "$0")" && pwd)
cd "$SCRIPT_DIR"
python "$SCRIPT_DIR/kohya_gui.py" "$@"
Thanks, I built a script in my repo. I will see if it works for me :)
For me it returned this error:
start.sh
Traceback (most recent call last):
  File "/AI/kohya/kohya_gui.py", line 4, in <module>
    from dreambooth_gui import dreambooth_tab
  File "/AI/kohya/dreambooth_gui.py", line 13, in <module>
    from library.common_gui import (
  File "/AI/kohya/library/common_gui.py", line 2, in <module>
    from easygui import msgbox
ModuleNotFoundError: No module named 'easygui'
I installed easygui with:
pip install easygui
After that the GUI started. But when I tried to train, I got this error:
_ctypes.cpython-39-x86_64-linux-gnu.so: undefined symbol: ffi_closure_alloc, version LIBFFI_CLOSURE_7.0
Which system are you running this on?
Ah, sorry, it's my mistake. I think I installed requirements.txt in the conda environment at some point and forgot about it. I just created a clean environment and found out that I need to install some requirements after the commands in my previous messages. So I removed from requirements.txt everything that was already installed in the environment, then ran pip install -r requirements.txt and it worked.
Here's my requirements.txt, with everything that was already installed manually removed:
accelerate==0.18.0
albumentations==1.3.0
altair==4.2.2
dadaptation==1.5
diffusers[torch]==0.10.2
easygui==0.98.3
einops==0.6.0
ftfy==6.1.1
gradio==3.28.1
lion-pytorch==0.0.6
opencv-python==4.7.0.68
pytorch-lightning==1.9.0
safetensors==0.2.6
toml==0.10.2
voluptuous==0.13.1
wandb==0.15.0
fairscale==0.4.13
requests==2.28.2
timm==0.6.12
huggingface-hub==0.13.3
lycoris_lora==0.1.4
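Trimming requirements.txt by hand, as described above, can also be scripted. The sketch below is one possible way to do it, not the method used in this thread: filter_requirements is a hypothetical helper that drops every requirement whose package name appears in a list of already installed names (one per line). Such a list could be produced with `pip list --format=freeze | cut -d= -f1`.

```shell
# filter_requirements REQ_FILE INSTALLED_FILE   (hypothetical helper)
# Prints the lines of REQ_FILE whose package name is NOT listed in
# INSTALLED_FILE. Names are compared case-insensitively with "_" treated
# as "-"; extras such as "diffusers[torch]" are matched by their base name.
# Comment lines and blank lines are passed through unchanged.
filter_requirements() {
    awk '
        function norm(s) { s = tolower(s); gsub(/_/, "-", s); return s }
        FILENAME == ARGV[1] { if ($0 != "") have[norm($1)] = 1; next }
        /^[ \t]*(#|$)/ { print; next }
        {
            name = $0
            sub(/^[ \t]+/, "", name)              # drop leading whitespace
            sub(/[^A-Za-z0-9._-].*$/, "", name)   # cut at first non-name character
            if (!(norm(name) in have)) print
        }
    ' "$2" "$1"
}

# Possible use:
# pip list --format=freeze | cut -d= -f1 > installed.txt
# filter_requirements requirements.txt installed.txt > requirements_trimmed.txt
```

Note that this ignores version pins entirely: a package that is installed in a different version than the one requirements.txt asks for is still dropped.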
After installing all requirements I got this error:
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 1.13.1 with CUDA 1106 (you have 1.13.1)
    Python 3.10.11 (you have 3.10.11)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
2023-05-16 14:43:05.161259: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-16 14:43:05.282488: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-05-16 14:43:06.021567: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory;
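The last warning means the dynamic loader could not find a file named libnvinfer.so.7. A quick way to see whether any directory on LD_LIBRARY_PATH provides it is sketched below. The helper find_lib is hypothetical, and it only inspects the path list it is given; it does not consult the system loader cache or default library directories.

```shell
# find_lib NAME PATHLIST   (hypothetical helper)
# Prints the first directory in the colon-separated PATHLIST that contains
# NAME as an existing file or a non-dangling symlink.
# Returns non-zero if no directory does.
find_lib() (
    name=$1
    IFS=:
    set -f                      # no globbing while splitting the path list
    for dir in $2; do
        [ -n "$dir" ] || continue
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir"
            exit 0
        fi
    done
    exit 1
)

# Possible use:
# find_lib libnvinfer.so.7 "${LD_LIBRARY_PATH:-}" || echo "libnvinfer.so.7 not on LD_LIBRARY_PATH"
```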
Yeah, yeah, yeah, it's working without desaturated images. Big thanks @kuriot
Installed it successfully. I wrote a Bash script for all the steps, but the script doesn't work; you need to copy and paste the commands, then it works. Maybe it helps someone run this under Linux. I tried to figure out what the problem with the script is, but when you execute the steps manually it works. Today I'm too drunk xD
You can find it here
Before answering your previous message I decided to do a clean install and it didn't work. :) I spent the last couple of hours trying to install it. Literally these commands, in this sequence, work for me. Anyway, for anyone who stumbles upon this issue, it may be a starting point.
git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
conda create -y -n kohya python=3.10.9
conda activate kohya
conda install -y pytorch==1.13.1 torchvision==0.14.1 xformers -c pytorch -c nvidia -c xformers
conda install -y -c conda-forge cudatoolkit=11.8.0
python3 -m pip install 'nvidia-cudnn-cu11>=8.6,<9' triton
python3 -m pip install --extra-index-url https://pypi.nvidia.com tensorrt-libs
python3 -m pip install -r requirements.txt
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
ln -sr $CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.8 $CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.7
ln -sr $CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs/libnvinfer_plugin.so.8 $CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs/libnvinfer_plugin.so.7
python3 -m pip cache purge
conda clean -afy
conda deactivate
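One practical note on the two ln -sr lines: they fail with "File exists" if the sequence is run a second time in the same environment. A minimal sketch of a re-runnable variant follows, assuming a hypothetical helper named link_soname. It relies on both names living in the same directory, so a plain relative link target is enough.

```shell
# link_soname DIR EXISTING ALIAS   (hypothetical helper)
# Creates DIR/ALIAS as a relative symlink pointing at EXISTING, which must
# already exist in DIR. Safe to re-run: an existing ALIAS is replaced.
link_soname() (
    dir=$1
    existing=$2
    alias_name=$3
    if [ ! -e "$dir/$existing" ]; then
        echo "missing: $dir/$existing" >&2
        exit 1
    fi
    ln -sfn "$existing" "$dir/$alias_name"
)

# Possible use, with the directory from the commands above:
# trt="$CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs"
# link_soname "$trt" libnvinfer.so.8 libnvinfer.so.7
# link_soname "$trt" libnvinfer_plugin.so.8 libnvinfer_plugin.so.7
```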
I used python3 -m pip install -r requirements_unix.txt instead and ran it with python3 kohya_gui.py --share, and it worked! Thanks.
Hi kuriot, I wonder if you still have this problem with the newest release. I have the same problem now. I wonder if we could discuss this. Regards,
@future141 I stick to an older version. Also, there's a problem with gradio, which is fixed by a manual install, and in the branch I use there are problems with quotes in the requirements_linux.txt file.
So, here are the steps I checked on a clean install; they work fine:
git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
git checkout v21.7.16
conda create -y -n kohya python=3.10.9
conda activate kohya
python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
python3 -m pip install 'xformers==0.0.20' 'bitsandbytes==0.35.0' 'accelerate==0.15.0' 'tensorboard==2.12.1' 'tensorflow==2.12.0' -r requirements.txt
conda install -y -c conda-forge cudatoolkit=11.8.0
python3 -m pip install 'nvidia-cudnn-cu11>=8.6,<9'
python3 -m pip install --extra-index-url https://pypi.nvidia.com tensorrt-libs
python3 -m pip install 'gradio==3.36.1'
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
ln -sr $CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.8 $CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.7
ln -sr $CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs/libnvinfer_plugin.so.8 $CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs/libnvinfer_plugin.so.7
python3 -m pip cache purge
conda clean -afy
conda deactivate
And the run.sh file, without changes. Just fix the path to your ~/.miniconda3/etc/profile.d/conda.sh on the source line of the script. I'll add it here again so you don't have to scroll up:
#!/usr/bin/env bash
source ~/.miniconda3/etc/profile.d/conda.sh
conda activate kohya
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib/python3.10/site-packages/tensorrt_libs:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib:$LD_LIBRARY_PATH
export MKL_THREADING_LAYER=1
SCRIPT_DIR=$(cd -- "$(dirname -- "$0")" && pwd)
cd "$SCRIPT_DIR"
python "$SCRIPT_DIR/kohya_gui.py" "$@"
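Since the conda.sh path differs between installs, the script could look it up instead of hard-coding it. The sketch below assumes a hypothetical helper, find_conda_sh; its default candidate directories are guesses at common install locations, not an exhaustive list. Sourcing conda.sh matters because conda activate is a shell function that a non-interactive script does not have until that file is loaded.

```shell
# find_conda_sh [BASE_DIR...]   (hypothetical helper)
# Prints the first existing BASE_DIR/etc/profile.d/conda.sh.
# Without arguments, a few assumed common install locations are tried.
# Returns non-zero if none of the candidates exists.
find_conda_sh() (
    if [ "$#" -eq 0 ]; then
        set -- "$HOME/.miniconda3" "$HOME/miniconda3" "$HOME/anaconda3" /opt/conda
    fi
    for base in "$@"; do
        if [ -f "$base/etc/profile.d/conda.sh" ]; then
            printf '%s\n' "$base/etc/profile.d/conda.sh"
            exit 0
        fi
    done
    exit 1
)

# Possible replacement for the hard-coded source line in run.sh:
# conda_sh=$(find_conda_sh) || { echo "conda.sh not found" >&2; exit 1; }
# . "$conda_sh"
# conda activate kohya
```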
My friend, I found the problem in my case (a freshly installed Ubuntu). The case is shown in https://github.com/bmaltais/kohya_ss/issues/1109. You could try this method; I wonder if it helps.
@future141 Thanks, I'll take a look. With SDXL 1.0 out I want to try to train it, so it's time to update Kohya. :)
Hello.
Steps to install:
Output when I do accelerate config:
When I launch training: