Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.
GNU Affero General Public License v3.0

[Bug]: dependency issue - bitsandbytes - program fails to initialise #469

Closed clayajohnson closed 6 days ago

clayajohnson commented 1 month ago

What happened?

Running ./start-ui.sh fails with a ModuleNotFoundError because the bitsandbytes package is not installed by install.sh.

What did you expect would happen?

The program would run successfully.

NOTE: users can fix this issue by running python -m pip install bitsandbytes==0.43.3. Alternatively, one of the devs can update the default requirements to include the package.
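For reference, a minimal check (assuming the OneTrainer venv is active) to verify whether the workaround took effect before re-running ./start-ui.sh; this is only an illustrative sketch, not part of the project:

import importlib.util

# Check whether bitsandbytes is importable in the current environment.
spec = importlib.util.find_spec("bitsandbytes")
if spec is None:
    # Same situation as the traceback below: install.sh skipped the package.
    print("bitsandbytes missing - run: python -m pip install bitsandbytes==0.43.3")
else:
    import bitsandbytes as bnb
    # __version__ is read defensively in case an older release lacks it.
    print("bitsandbytes", getattr(bnb, "__version__", "unknown"), "at", spec.origin)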

Relevant log output

(venv) user@XPS-13:OneTrainer$ ./start-ui.sh
conda not found; python version correct; use native python
Traceback (most recent call last):
  File "/home/user/Documents/fs-tools/OneTrainer/scripts/train_ui.py", line 5, in <module>
    from modules.ui.TrainUI import TrainUI
  File "/home/user/Documents/fs-tools/OneTrainer/modules/ui/TrainUI.py", line 9, in <module>
    from modules.trainer.GenericTrainer import GenericTrainer
  File "/home/user/Documents/fs-tools/OneTrainer/modules/trainer/GenericTrainer.py", line 17, in <module>
    from modules.trainer.BaseTrainer import BaseTrainer
  File "/home/user/Documents/fs-tools/OneTrainer/modules/trainer/BaseTrainer.py", line 8, in <module>
    from modules.util import create
  File "/home/user/Documents/fs-tools/OneTrainer/modules/util/create.py", line 6, in <module>
    from modules.dataLoader.FluxBaseDataLoader import FluxBaseDataLoader
  File "/home/user/Documents/fs-tools/OneTrainer/modules/dataLoader/FluxBaseDataLoader.py", line 6, in <module>
    from modules.model.FluxModel import FluxModel
  File "/home/user/Documents/fs-tools/OneTrainer/modules/model/FluxModel.py", line 8, in <module>
    from modules.module.LoRAModule import LoRAModuleWrapper
  File "/home/user/Documents/fs-tools/OneTrainer/modules/module/LoRAModule.py", line 8, in <module>
    from modules.util.quantization_util import get_unquantized_weight, get_weight_shape
  File "/home/user/Documents/fs-tools/OneTrainer/modules/util/quantization_util.py", line 8, in <module>
    import bitsandbytes as bnb
ModuleNotFoundError: No module named 'bitsandbytes'
(venv) user@XPS-13:OneTrainer$

Output of pip freeze

absl-py==2.1.0 accelerate==0.30.1 aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 antlr4-python3-runtime==4.9.3 async-timeout==4.0.3 attrs==24.2.0 certifi==2024.8.30 charset-normalizer==3.3.2 cloudpickle==3.0.0 coloredlogs==15.0.1 contourpy==1.3.0 customtkinter==5.2.2 cycler==0.12.1 dadaptation==3.2 darkdetect==0.8.0 -e git+https://github.com/huggingface/diffusers.git@2ee3215949d8f2d3141c2340d8e4d24ec94b2384#egg=diffusers filelock==3.16.0 flatbuffers==24.3.25 fonttools==4.53.1 frozenlist==1.4.1 fsspec==2024.9.0 ftfy==6.2.3 grpcio==1.66.1 huggingface-hub==0.23.3 humanfriendly==10.0 idna==3.8 importlib_metadata==8.4.0 invisible-watermark==0.2.0 Jinja2==3.1.4 kiwisolver==1.4.7 lightning-utilities==0.11.7 lion-pytorch==0.1.4 Markdown==3.7 markdown-it-py==3.0.0 MarkupSafe==2.1.5 matplotlib==3.9.0 mdurl==0.1.2 -e git+https://github.com/Nerogar/mgds.git@85bf18746488a898818c36eca651d24734f87431#egg=mgds mpmath==1.3.0 multidict==6.1.0 networkx==3.3 numpy==1.26.4 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 nvidia-nccl-cu12==2.20.5 nvidia-nvjitlink-cu12==12.6.68 nvidia-nvtx-cu12==12.1.105 omegaconf==2.3.0 onnxruntime==1.18.0 open-clip-torch==2.24.0 opencv-python==4.9.0.80 packaging==24.1 pillow==10.3.0 platformdirs==4.3.2 pooch==1.8.1 prodigyopt==1.0 protobuf==4.25.4 psutil==6.0.0 Pygments==2.18.0 pynvml==11.5.0 pyparsing==3.1.4 python-dateutil==2.9.0.post0 pytorch-lightning==2.2.5 pytorch_optimizer==3.0.2 PyWavelets==1.7.0 PyYAML==6.0.1 regex==2024.7.24 requests==2.32.3 rich==13.8.1 safetensors==0.4.3 scalene==1.5.41 schedulefree==1.2.5 scipy==1.13.1 sentencepiece==0.2.0 six==1.16.0 sympy==1.13.2 tensorboard==2.17.0 tensorboard-data-server==0.7.2 timm==1.0.9 tokenizers==0.19.1 torch==2.3.1 torchmetrics==1.4.1 torchvision==0.18.1 tqdm==4.66.4 transformers==4.42.3 triton==2.3.1 typing_extensions==4.12.2 urllib3==2.2.2 wcwidth==0.2.13 Werkzeug==3.0.4 yarl==1.11.1 zipp==3.20.1

Nerogar commented 1 month ago

What GPU do you have? This might be an issue with the requirements, because bitsandbytes only really works with Nvidia GPUs.

clayajohnson commented 1 month ago

No GPU, and running on Ubuntu. It seems that the program always tries to import bitsandbytes when initialising, which is an issue if bitsandbytes is only installed by install.sh when it detects an Nvidia GPU.
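For illustration, a guarded import along these lines would avoid the hard failure at start-up on machines without bitsandbytes. This is only a sketch, not OneTrainer's actual quantization_util.py, and is_bnb_linear is a hypothetical helper:

try:
    import bitsandbytes as bnb  # only installed by install.sh when an Nvidia GPU is detected
except ImportError:
    bnb = None  # CPU-only machines, Apple silicon, etc.

def is_bnb_linear(module) -> bool:
    # Hypothetical helper: report False instead of crashing at import time
    # when bitsandbytes is unavailable on this platform.
    return bnb is not None and isinstance(module, bnb.nn.Linear8bitLt)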

cchance27 commented 4 weeks ago

This means it won't start on Mac either, because bitsandbytes isn't supported on Apple silicon.

Arcitec commented 4 weeks ago

This means it won't start on Mac either, because bitsandbytes isn't supported on Apple silicon.

I know that bitsandbytes is working on Apple M-chip support though. Not sure if it's out yet.

O-J1 commented 6 days ago

Since this is not a bug but expected behaviour, I will be closing this. Please open a feature request once Bitsandbytes adds Apple M-chip support 🫡

clayajohnson commented 6 days ago

@O-J1 can you clarify how this is expected behaviour?

Apologies if this wasn't clear in the original issue I raised - the bug is that OneTrainer needs bitsandbytes in order to run, but install.sh didn't install bitsandbytes when I ran the script*, which means OneTrainer could not run.

*note: I haven't re-read the installation scripts since I raised the issue, so this may have been fixed in one of the recent changes

Nerogar commented 6 days ago

The bug is that OneTrainer needs bitsandbytes in order to run, but install.sh didn't install bitsandbytes when I ran the script*, which means OneTrainer could not run.

Is this still the case? That should be fixed already.

O-J1 commented 6 days ago

@O-J1 can you clarify how this is expected behaviour?

@cchance27

Apologies, brain fart on my part - I don't know why, but I only had cchance's "Apple CPUs" comment in mind when I wrote the response. My apologies.

As Nero said, I'm pretty sure this was fixed, but I must note that:

https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1338

https://huggingface.co/docs/bitsandbytes/main/en/installation?backend=Intel+CPU+%2B+GPU&platform=Intel+CPU%2BGPU#multi-backend

Intel CPU (only Intel CPU) is still noted as Alpha, so go into this with a very large grain of salt; your speed is going to be absolutely awful. I've mucked around with training 2M-param models (which is very small) on CPU (yolov8) and it was agonising.

When you get some time, can you do a quick check (nuke your repo, re-clone, and try again)? If the issue persists, let's reopen 👍

clayajohnson commented 6 days ago

@Nerogar @O-J1 thanks both for getting back to me, really appreciated :)

I haven't tested if this is still the case (I fixed the issue locally and haven't pulled the repo since)

Also, I didn't realise Intel CPU support is still noted as Alpha, but that probably explains how I ran into the problem in the first place, since I'm running on an Intel CPU with integrated graphics. Thanks for supplying the links 👍

Ah yes, yolov8 on CPU does sound agonising; fortunately I'm not doing anything near as serious. Currently, I just have OneTrainer on a test rig I use for experimenting, so I'm more than happy to wipe my local repo and retry - I'll do so and update here in the next 24hrs.