Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.
GNU Affero General Public License v3.0

[Bug]: switch to conda was not ready to merge to master #486

Closed yggdrasil75 closed 2 weeks ago

yggdrasil75 commented 2 weeks ago

What happened?

OneTrainer did not start. Both install.sh and start-ui.sh break.

What did you expect would happen?

start program

Relevant log output

bash start-ui.sh
start-ui.sh: line 5: start-ui.sh/lib.include.sh: Not a directory

Output of pip freeze

```
absl-py==2.1.0
accelerate==0.30.1
aiohappyeyeballs==2.4.3
aiohttp==3.10.8
aiosignal==1.3.1
antlr4-python3-runtime==4.9.3
astunparse==1.6.3
async-timeout==4.0.3
attrs==24.2.0
bitsandbytes==0.43.3
cachetools==5.3.3
certifi==2024.8.30
charset-normalizer==3.3.2
cloudpickle==3.0.0
coloredlogs==15.0.1
contourpy==1.3.0
customtkinter==5.2.2
cycler==0.12.1
dadaptation==3.2
darkdetect==0.8.0
-e git+https://github.com/huggingface/diffusers.git@2ee3215949d8f2d3141c2340d8e4d24ec94b2384#egg=diffusers
filelock==3.16.1
flatbuffers==24.3.25
fonttools==4.54.1
frozenlist==1.4.1
fsspec==2024.9.0
ftfy==6.2.3
gast==0.5.4
google-auth==2.29.0
google-auth-oauthlib==1.2.0
google-pasta==0.2.0
grpcio==1.66.2
h5py==3.11.0
huggingface-hub==0.23.3
humanfriendly==10.0
idna==3.10
importlib_metadata==8.5.0
invisible-watermark==0.2.0
Jinja2==3.1.4
keras==3.3.3
kiwisolver==1.4.7
libclang==18.1.1
lightning-utilities==0.11.7
lion-pytorch==0.1.4
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.0
mdurl==0.1.2
-e git+https://github.com/Nerogar/mgds.git@85bf18746488a898818c36eca651d24734f87431#egg=mgds
ml-dtypes==0.3.2
mpmath==1.3.0
multidict==6.1.0
namex==0.0.8
networkx==3.3
numpy==1.26.4
nvidia-cublas-cu11==11.11.3.6
nvidia-cublas-cu12==12.3.4.1
nvidia-cuda-cupti-cu11==11.8.87
nvidia-cuda-cupti-cu12==12.3.101
nvidia-cuda-nvcc-cu12==12.3.107
nvidia-cuda-nvrtc-cu11==11.8.89
nvidia-cuda-nvrtc-cu12==12.3.107
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cuda-runtime-cu12==12.3.101
nvidia-cudnn-cu11==8.7.0.84
nvidia-cudnn-cu12==8.9.7.29
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.12.1
nvidia-curand-cu11==10.3.0.86
nvidia-curand-cu12==10.3.4.107
nvidia-cusolver-cu11==11.4.1.48
nvidia-cusolver-cu12==11.5.4.101
nvidia-cusparse-cu11==11.7.5.86
nvidia-cusparse-cu12==12.2.0.103
nvidia-nccl-cu11==2.20.5
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu11==11.8.86
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.2
omegaconf==2.3.0
onnxruntime==1.18.0
onnxruntime-gpu==1.18.0
open-clip-torch==2.24.0
opencv-python==4.9.0.80
opt-einsum==3.3.0
optree==0.11.0
packaging==24.1
pillow==10.3.0
platformdirs==4.3.6
pooch==1.8.1
prodigyopt==1.0
protobuf==4.25.5
psutil==6.0.0
pyasn1==0.6.0
pyasn1_modules==0.4.0
Pygments==2.18.0
pynvml==11.5.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
pytorch-lightning==2.2.5
pytorch_optimizer==3.0.2
PyWavelets==1.7.0
PyYAML==6.0.1
regex==2024.9.11
requests==2.32.3
requests-oauthlib==2.0.0
rich==13.8.1
rsa==4.9
safetensors==0.4.3
scalene==1.5.41
schedulefree==1.2.5
scipy==1.13.1
sentencepiece==0.2.0
six==1.16.0
sympy==1.13.3
tensorboard==2.17.0
tensorboard-data-server==0.7.2
tensorflow==2.16.1
tensorflow-io-gcs-filesystem==0.37.0
tensorrt==10.0.1
tensorrt-cu12==10.0.1
tensorrt-cu12-bindings==10.0.1
tensorrt-cu12-libs==10.0.1
tensorrt-dispatch-cu12-bindings==10.0.1
tensorrt-dispatch-cu12-libs==10.0.1
tensorrt-lean-cu12-bindings==10.0.1
tensorrt-lean-cu12-libs==10.0.1
tensorrt_dispatch==10.0.1
tensorrt_dispatch-cu12==10.0.1
tensorrt_lean==10.0.1
tensorrt_lean-cu12==10.0.1
termcolor==2.4.0
timm==1.0.9
tokenizers==0.19.1
torch==2.3.1+cu118
torchaudio==2.3.1+cu121
torchmetrics==1.4.2
torchvision==0.18.1+cu118
tqdm==4.66.4
transformers==4.42.3
triton==2.3.1
typing_extensions==4.12.2
urllib3==2.2.3
wcwidth==0.2.13
Werkzeug==3.0.4
wrapt==1.16.0
xformers==0.0.27+cu118
yarl==1.13.1
zipp==3.20.2
```

Arcitec commented 2 weeks ago

User error.

Run it as instructed: ./start-ui.sh

The difference is that when you manually run bash, you change the value of ${BASH_SOURCE[0]}: the path to the currently executing script becomes exactly what you typed on the bash <script name> command line. Here that was the bare name start-ui.sh with no directory part, which is why your log shows start-ui.sh/lib.include.sh rather than a usable path.

For example, you could use bash ./start-ui.sh to avoid that problem, since that gives bash a usable path. But you should not be launching the script through bash at all, because it bypasses the shell that the script was written for and also makes bash behave a bit differently, as you just experienced. We will not support all the quirks of such an unusual way to launch the scripts. Even adding a warning message about it would require way too much code clutter/hassle for something that nobody should be doing. :)
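To make the failure concrete, here is a minimal sketch of how a launcher could end up with that broken path. It assumes the script finds its own directory by stripping the last path component from ${BASH_SOURCE[0]}; that is a guess based on the error message in the log, not a quote from the real start-ui.sh, and naive_script_dir is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a launcher that finds its own directory by
# stripping the last path component from "${BASH_SOURCE[0]}".
naive_script_dir() {
    # "${1%/*}" removes the shortest trailing "/..." match. With no slash
    # in the input there is nothing to remove, so the input comes back as-is.
    printf '%s\n' "${1%/*}"
}

# `bash start-ui.sh` sets BASH_SOURCE[0] to "start-ui.sh" (no slash).
echo "$(naive_script_dir "start-ui.sh")/lib.include.sh"

# `./start-ui.sh` and `bash ./start-ui.sh` set it to "./start-ui.sh".
echo "$(naive_script_dir "./start-ui.sh")/lib.include.sh"
```

The first line printed is start-ui.sh/lib.include.sh, the same path as in the reported error; the second is ./lib.include.sh, which resolves correctly when run from the OneTrainer directory.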

In general, you should never run anyone's Unix scripts by saying bash <script name>, and I'm curious where you got that idea from. Always run them by directly executing the script: ./start-ui.sh. This uses the script's shebang line (the #! at the top) to run the script with the correct shell/interpreter and environment. For instance, some scripts are actually written in Python and use #!/usr/bin/env python as their shebang line. Those would completely break if you tried to force bash to run them. Never do that. :)
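If you are unsure which interpreter a script expects, you can read its shebang line before running it. The sketch below is only an illustration: shebang_of is a hypothetical helper, and the demo file is a throwaway created just for the example.

```shell
#!/usr/bin/env bash
# Print whatever follows "#!" on the first line of a file, or nothing
# when the file has no shebang line.
shebang_of() {
    local first_line=""
    IFS= read -r first_line < "$1" || true
    case "$first_line" in
        '#!'*) printf '%s\n' "${first_line#??}" ;;
    esac
}

# Throwaway demo file: a Python program despite being launched like a script.
demo_file="$(mktemp)"
printf '%s\n' '#!/usr/bin/env python' 'print("hello")' > "$demo_file"

shebang_of "$demo_file"

rm -f "$demo_file"
```

For the demo file this prints /usr/bin/env python, which is the interpreter a direct launch would use and which bash <script name> would skip.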

The correct method of running scripts even works when you add OneTrainer to your PATH and run start-ui.sh directly as a global command (without any leading ./). As long as the script is executed correctly (by itself; NOT as an argument to bash or any other shell), the path in ${BASH_SOURCE[0]} will be valid and usable by us.
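You can check the PATH behavior yourself with a throwaway script. This is a minimal sketch: show-source.sh and the temporary directory are made up for the demonstration and are not part of OneTrainer.

```shell
#!/usr/bin/env bash
# Create a throwaway script that prints its own BASH_SOURCE[0].
demo_dir="$(mktemp -d)"
cat > "$demo_dir/show-source.sh" <<'EOF'
#!/usr/bin/env bash
printf '%s\n' "${BASH_SOURCE[0]}"
EOF
chmod +x "$demo_dir/show-source.sh"

# Launched by name through PATH: the shell resolves the full path first.
via_path="$(PATH="$demo_dir:$PATH"; show-source.sh)"

# Launched as an argument to bash from inside its directory: bare name only.
via_bash="$(cd "$demo_dir" && bash show-source.sh)"

echo "via PATH: $via_path"
echo "via bash: $via_bash"

rm -r "$demo_dir"
```

The PATH launch reports the full path to the script inside the temporary directory, while the bash launch reports only show-source.sh, with no directory for the script to work from.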

There's detailed documentation here if you want to learn more about the new launcher:

https://github.com/Nerogar/OneTrainer/blob/master/LAUNCH-SCRIPTS.md

Arcitec commented 2 weeks ago

By the way, if you're using Conda (if conda is a valid command on your system), beware that the GUI fonts are broken until #481 is merged. That pull request fixes a 10-month-old issue. After it is merged, you'll need to delete the old conda_env directory so that it gets reinstalled with font rendering support.

If you want to have the fix immediately, there's a command in a comment in that pull request, showing you how to download that fix before it's been merged. Edit: It's merged. You can use the official OneTrainer.

Let me know if you have any other questions. :)