Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.
GNU Affero General Public License v3.0

[Bug]: FileNotFoundError: [WinError 2] The system cannot find the file specified #462

Open bejaranoo opened 1 week ago

bejaranoo commented 1 week ago

What happened?

I am trying to train a Flux Dev LoRA, using conda instead of venv.

What did you expect would happen?

The training should start.

Relevant log output

C:\Users\LBM\Desktop>call C:\Users\LBM\anaconda3\Scripts\activate.bat onetrainer
Exception in thread Thread-1 (__training_thread_function):
Traceback (most recent call last):
  File "C:\Users\LBM\anaconda3\envs\onetrainer\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\LBM\anaconda3\envs\onetrainer\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\LBM\OneTrainer\modules\ui\TrainUI.py", line 549, in __training_thread_function
    trainer = GenericTrainer(self.train_config, self.training_callbacks, self.training_commands)
  File "C:\Users\LBM\OneTrainer\modules\trainer\GenericTrainer.py", line 84, in __init__
    self.tensorboard_subprocess = subprocess.Popen(tensorboard_args)
  File "C:\Users\LBM\anaconda3\envs\onetrainer\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\LBM\anaconda3\envs\onetrainer\lib\subprocess.py", line 1456, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

Output of pip freeze

(onetrainer) LBM@LBM-PC MINGW64 ~/OneTrainer (master)
$ pip freeze
absl-py==2.1.0
accelerate==0.30.1
aiohappyeyeballs==2.4.0
aiohttp==3.10.5
aiosignal==1.3.1
antlr4-python3-runtime==4.9.3
async-timeout==4.0.3
attrs==24.2.0
bitsandbytes==0.43.3
certifi==2024.8.30
charset-normalizer==3.3.2
cloudpickle==3.0.0
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.3.0
customtkinter==5.2.2
cycler==0.12.1
dadaptation==3.2
darkdetect==0.8.0
-e git+https://github.com/huggingface/diffusers.git@2ee3215949d8f2d3141c2340d8e4d24ec94b2384#egg=diffusers
filelock==3.16.0
flatbuffers==24.3.25
fonttools==4.53.1
frozenlist==1.4.1
fsspec==2024.9.0
ftfy==6.2.3
grpcio==1.66.1
huggingface-hub==0.23.3
humanfriendly==10.0
idna==3.8
importlib_metadata==8.4.0
intel-openmp==2021.4.0
invisible-watermark==0.2.0
Jinja2==3.1.4
kiwisolver==1.4.7
lightning-utilities==0.11.7
lion-pytorch==0.1.4
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.0
mdurl==0.1.2
-e git+https://github.com/Nerogar/mgds.git@85bf18746488a898818c36eca651d24734f87431#egg=mgds
mkl==2021.4.0
mpmath==1.3.0
multidict==6.1.0
networkx==3.3
numpy==1.26.4
omegaconf==2.3.0
onnxruntime-gpu==1.18.0
open-clip-torch==2.24.0
opencv-python==4.9.0.80
packaging==24.1
pillow==10.3.0
platformdirs==4.3.2
pooch==1.8.1
prodigyopt==1.0
protobuf==4.25.4
psutil==6.0.0
Pygments==2.18.0
pynvml==11.5.0
pyparsing==3.1.4
pyreadline3==3.4.1
python-dateutil==2.9.0.post0
pytorch-lightning==2.2.5
pytorch_optimizer==3.0.2
PyWavelets==1.7.0
PyYAML==6.0.1
regex==2024.7.24
requests==2.32.3
rich==13.8.1
safetensors==0.4.3
scalene==1.5.41
schedulefree==1.2.5
sentencepiece==0.2.0
six==1.16.0
sympy==1.13.2
tbb==2021.13.1
tensorboard==2.17.0
tensorboard-data-server==0.7.2
timm==1.0.9
tokenizers==0.19.1
torch==2.3.1+cu118
torchmetrics==1.4.1
torchvision==0.18.1+cu118
tqdm==4.66.4
transformers==4.42.3
typing_extensions==4.12.2
urllib3==2.2.2
wcwidth==0.2.13
Werkzeug==3.0.4
xformers==0.0.27+cu118
yarl==1.11.1
zipp==3.20.1

virgilbugnariu commented 3 days ago

I get this as well with the same setup (conda instead of venv)

bejaranoo commented 3 days ago

Can you try this?

  1. Activate the conda environment you created for OneTrainer.
  2. Go to the git-cloned OneTrainer directory.
  3. set PIP_DEFAULT_TIMEOUT=1200
  4. pip install -r requirements.txt

Check that everything installs without errors, then try to launch OneTrainer again.
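
After the reinstall, a quick sanity check can show whether the active environment actually sees tensorboard, since the traceback fails while launching it. This is only a rough sketch: `check_install` is a hypothetical helper, not part of OneTrainer, and it only reports what the current interpreter and PATH can find.

```python
import importlib.util
import shutil
import sys


def check_install(module: str = "tensorboard", script: str = "tensorboard") -> dict:
    """Report whether a module is importable and its console script is on PATH."""
    spec = importlib.util.find_spec(module)
    return {
        # Interpreter that is running this check (should be inside the conda env)
        "python": sys.executable,
        # True if the package can be imported from this interpreter
        "module_found": spec is not None,
        # Full path of the console script on PATH, or None if it is not found
        "script_on_path": shutil.which(script),
    }


if __name__ == "__main__":
    for key, value in check_install().items():
        print(f"{key}: {value}")
```

If `module_found` is true, the package itself is installed, so the FileNotFoundError most likely comes from the path used to start the executable and not from a missing dependency.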

virgilbugnariu commented 2 days ago

This did not work. What did work, for some reason, was creating a venv inside the conda environment, then installing and running OneTrainer with both active.

Not sure why, since it was a fresh conda env and no dependencies had been installed previously.

bejaranoo commented 2 days ago

Yeah, it seems OneTrainer is "hardcoded" to use venv. I also ended up using the included venv.
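
A possible explanation, offered as a guess: the traceback fails in `subprocess.Popen(tensorboard_args)`, but the log does not show what `tensorboard_args` contains. On Windows, a venv puts `python.exe` and console scripts such as `tensorboard.exe` together in `Scripts`, while a conda env keeps `python.exe` in the env root and console scripts in the `Scripts` subfolder. If the executable path is built from the interpreter's own directory (an assumption, not confirmed here), it would resolve under venv but not under conda. The sketch below shows a lookup that tolerates both layouts. `find_console_script` and `tensorboard_command` are hypothetical names, not OneTrainer functions.

```python
import shutil
import sys
from pathlib import Path
from typing import List, Optional


def find_console_script(name: str, python_exe: str = sys.executable) -> Optional[str]:
    """Look for a console script near the interpreter, then fall back to PATH."""
    base = Path(python_exe).parent
    # base            -> venv on Windows (python.exe sits inside Scripts)
    # base / Scripts  -> conda on Windows (python.exe sits in the env root)
    # base / bin      -> defensive extra for POSIX-style layouts
    for directory in (base, base / "Scripts", base / "bin"):
        for filename in (name + ".exe", name):
            candidate = directory / filename
            if candidate.is_file():
                return str(candidate)
    # Nothing found next to the interpreter: ask PATH (may return None)
    return shutil.which(name)


def tensorboard_command(logdir: str, python_exe: str = sys.executable) -> List[str]:
    """Build an argument list suitable for subprocess.Popen."""
    script = find_console_script("tensorboard", python_exe)
    if script is not None:
        return [script, "--logdir", logdir]
    # Assumed fallback: run the module through the interpreter itself
    return [python_exe, "-m", "tensorboard.main", "--logdir", logdir]
```

Printing `tensorboard_command("some_log_dir")` from inside the conda env shows which executable would be launched, and that can be compared against where `tensorboard.exe` actually lives.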