-
I'm finding this repo to be a user-friendly, extensible, memory-efficient solution for training/fine-tuning models. However, when it comes to inference, there is a usability gap that could be solved b…
-
Context: With HF models, one can use [peft](https://github.com/huggingface/peft) to do parameter-efficient tuning, the most popular (and afaik most performant) method being LoRA.
Idea: It would be …
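A minimal sketch, assuming the standard peft API, of how a trained adapter is attached to a base HF model for inference (the model ID and adapter path below are placeholders):
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the frozen base model, then wrap it with the trained LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder base model
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")          # placeholder adapter path

# Optionally fold the adapter weights into the base model for plain inference.
model = model.merge_and_unload()
```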
-
https://stackoverflow.com/questions/58860448/distributed-rules-engine
First of all, from my experience with Drools, it can be applied efficiently to huge volumes of data as well (m…
-
I teach Python for the sciences and I'm trying to understand what is happening with the different projects to improve Python performance.
I tried to follow faster-cpython, but I have to admit that I feel a bit…
-
I have to compute a lot of Hungarian matchings between sets of points, using their distance as the matching criterion. So far I have tried this hybrid SciPy (CPU) and PyTorch method:
```
import numpy …
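# A minimal sketch of the hybrid approach described above: build the pairwise
# distance matrix with torch.cdist (which can run on the GPU), then solve the
# assignment on the CPU with scipy.optimize.linear_sum_assignment.
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_match(a: torch.Tensor, b: torch.Tensor):
    # a, b: (N, D) point sets; returns the matched index pairs and the total cost
    cost = torch.cdist(a, b)          # (N, N) pairwise distances
    cost_np = cost.cpu().numpy()      # move to CPU for SciPy
    row_ind, col_ind = linear_sum_assignment(cost_np)
    return row_ind, col_ind, cost_np[row_ind, col_ind].sum()
```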
-
## Overview
- Memory- and parameter-efficient fine-tuning using LLM.int8() + LoRA
- Planning to train the model with BitsAndBytes + PEFT (a minimal sketch follows this list)
- Using [polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b) as the backbone (KoGPT is…
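A minimal sketch of that setup, assuming the standard transformers + peft APIs; the LoRA hyperparameters and target modules below are illustrative, not a fixed config:
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "EleutherAI/polyglot-ko-5.8b"

# Load the backbone in 8-bit (LLM.int8()) via bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters (illustrative hyperparameters; polyglot-ko is GPT-NeoX-style,
# so the fused attention projection is named "query_key_value").
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```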
-
llama.cpp runs incredibly fast on Apple silicon: I ran a pure-CPU build, and it gets close to the memory-bandwidth limit, e.g. 28 tokens/s on an M3 Pro.
llama3.java seems to be rather slow on Apple sili…
-
**Command: tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device**
**Output**:
```
INFO:torchtune.utils._logging:Running LoRAFinetuneRecipeSingleDevice with resolved config:…
-
Hi @NielsRogge
I have fine-tuned my PaliGemma on custom data for an image-to-JSON use case, but when I run inference, some key values come out wrong, e.g. 3000 is extracted as 9000, so to get the data is corr…
-
I'm trying to use QAT to quantize the Qwen2 1.5B model.
The error is raised from the function `training.load_from_full_model_state_dict(model, model_state_dict, self._device, self._is_rank_zero, strict=T…