aifoundry-org / wiki

Place for all our docs and whitepapers

Explore superstructures on llama.cpp for LoRA introduction #16

Open janchk opened 2 weeks ago

janchk commented 2 weeks ago

Need to experiment with these solutions and decide whether LoRA can be implemented on CPU.

janchk commented 1 week ago

Unsloth

  1. Unsloth is neither a superstructure on nor a wrapper around llama.cpp. It may use llama.cpp, but only for inference.
  2. Unsloth does not provide support for fine-tuning on CPU: https://github.com/unslothai/unsloth/issues/477

janchk commented 4 days ago

Axolotl

  1. Uses PyTorch as its backbone.
  2. Does not support CPU fine-tuning.
  3. A basic fine-tuning example, `accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml`, takes ~6 hrs on an RTX A6000.

Though it is possible to build the bitsandbytes package without GPU support, install the CPU version of PyTorch, and disable the flash-attention library (which requires CUDA), the cascade of errors still goes on.

Steps to possibly run on CPU (a rough command-line sketch follows below):

  1. Compile bitsandbytes from source: https://huggingface.co/docs/bitsandbytes/main/en/installation?backend=Intel+CPU+%2B+GPU#multi-backend-compile
  2. Install the CPU version of PyTorch.
  3. Configure accelerate to use CPU only.

UNFORTUNATELY, AMD processors are not supported at the moment, so I can't test the CPU implementation right now.
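A minimal sketch of those three steps as shell commands, assuming the multi-backend branch of bitsandbytes builds as the linked docs describe (untested here, per the AMD limitation above):

```bash
# 1. Compile bitsandbytes from source with the CPU backend
#    (multi-backend branch, per the linked installation docs).
git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git
cd bitsandbytes
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=cpu -S . && make
pip install .

# 2. Install the CPU-only PyTorch wheel.
pip install torch --index-url https://download.pytorch.org/whl/cpu

# 3. Force accelerate onto the CPU, either interactively via
#    `accelerate config` or with the --cpu flag at launch time.
accelerate launch --cpu -m axolotl.cli.train examples/openllama-3b/lora.yml
```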

janchk commented 4 days ago

LLaMA Factory

  1. Based on PyTorch.
  2. Possible to launch on CPU, but only one thread does any work and no fine-tuning progress is observable. Further investigation needed (see the thread-count note after the log below).
    • Command
      CUDA_VISIBLE_DEVICES="" llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
    • End of response
      [INFO|trainer.py:698] 2024-11-18 13:45:32,215 >> Using cpu_amp half precision backend
      [INFO|trainer.py:2313] 2024-11-18 13:45:32,647 >> ***** Running training *****
      [INFO|trainer.py:2314] 2024-11-18 13:45:32,647 >>   Num examples = 981
      [INFO|trainer.py:2315] 2024-11-18 13:45:32,647 >>   Num Epochs = 3
      [INFO|trainer.py:2316] 2024-11-18 13:45:32,647 >>   Instantaneous batch size per device = 1
      [INFO|trainer.py:2319] 2024-11-18 13:45:32,648 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
      [INFO|trainer.py:2320] 2024-11-18 13:45:32,648 >>   Gradient Accumulation steps = 8
      [INFO|trainer.py:2321] 2024-11-18 13:45:32,648 >>   Total optimization steps = 366
      [INFO|trainer.py:2322] 2024-11-18 13:45:32,652 >>   Number of trainable parameters = 20,971,520
      0%|                                        | 0/366 [00:00<?, ?it/s]
      /home/user/LLaMA-Factory/venv/lib/python3.10/site-packages/transformers/trainer.py:3536: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
        ctx_manager = torch.cpu.amp.autocast(cache_enabled=cache_enabled, dtype=self.amp_dtype)
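The single busy thread may simply be PyTorch's CPU thread pool defaulting low in this environment. A speculative retry, assuming only the standard OpenMP/MKL environment variables that PyTorch honors (these are not LLaMA-Factory options):

```bash
# Hypothetical: pin the OpenMP/MKL thread pools to all available cores
# so PyTorch's CPU kernels can actually parallelize, then relaunch.
export OMP_NUM_THREADS=$(nproc)
export MKL_NUM_THREADS=$(nproc)
CUDA_VISIBLE_DEVICES="" llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
```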
janchk commented 4 days ago

text-gen-ui

  1. Just a web wrapper over torch for LoRA fine-tuning, nothing fancy.
  2. Also wraps llama.cpp for model inference, as one of several supported backends.