-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar feature requests.
### Description
Non-specialized …
-
When I use 8-bit quantization during pre-training, the code throws an error:
You cannot perform fine-tuning on purely quantized models. Please attach trainable adapters on top of the qu…
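The error above reflects how adapter-based fine-tuning works: the quantized base weights stay frozen, and only small trainable matrices added on top receive gradient updates. A minimal NumPy sketch of the idea (this is conceptual, not the actual transformers/peft API; all names and sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen 8-bit base weight: stored as int8 plus a per-tensor scale.
w_fp = rng.standard_normal((16, 16)).astype(np.float32)
scale = np.abs(w_fp).max() / 127.0
w_int8 = np.round(w_fp / scale).astype(np.int8)    # frozen, never updated

# Trainable low-rank adapter (LoRA-style): only A and B get gradients.
r = 4                                              # rank, much smaller than 16
lora_A = rng.standard_normal((r, 16)).astype(np.float32) * 0.01
lora_B = np.zeros((16, r), dtype=np.float32)       # zero init => no-op at start

def forward(x):
    w_deq = w_int8.astype(np.float32) * scale      # dequantize on the fly
    return x @ w_deq.T + x @ (lora_B @ lora_A).T   # frozen path + adapter path

x = rng.standard_normal((2, 16)).astype(np.float32)
y = forward(x)
# With B initialised to zero, the adapter initially leaves the output unchanged.
assert np.allclose(y, x @ (w_int8.astype(np.float32) * scale).T)
```

Training then backpropagates only into `lora_A` and `lora_B`, which sidesteps updating the int8 weights directly.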
-
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules i…
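This warning comes from automatic device placement: when the model does not fit in GPU RAM, remaining modules are spilled to CPU and then disk. A rough sketch of that greedy placement logic (module names, sizes, and the function itself are made up for illustration; the real behaviour lives in accelerate's `device_map="auto"`):

```python
def assign_devices(module_sizes, gpu_budget, cpu_budget):
    """Greedily place modules on 'gpu', then 'cpu', then 'disk'."""
    placement = {}
    for name, size in module_sizes.items():
        if size <= gpu_budget:
            placement[name] = "gpu"
            gpu_budget -= size
        elif size <= cpu_budget:
            placement[name] = "cpu"
            cpu_budget -= size
        else:
            placement[name] = "disk"
    return placement

# Hypothetical module sizes in GiB.
sizes = {"embed": 2.0, "layers.0": 3.0, "layers.1": 3.0, "lm_head": 2.0}
placement = assign_devices(sizes, gpu_budget=6.0, cpu_budget=3.0)
print(placement)
# embed and layers.0 fill the GPU, layers.1 spills to CPU, lm_head to disk
```

If any module lands off-GPU, quantized loading complains unless you explicitly allow offloading those modules in higher precision.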
-
Hi,
I am following the article at https://learn.arm.com/learning-paths/servers-and-cloud-computing/pytorch-llama/pytorch-llama/
but at step
```
python torchchat.py export llama3.1 --output-dso-p…
-
So I have a GPTQ Llama model I downloaded (from TheBloke), and it's already 4-bit quantized. I have to pass in False for the load_in_4bit parameter of:
```
model, tokenizer = FastLlamaModel.from_pr…
-
How can we take a t2t model (or an exported t2t model) and quantize it to make it smaller, sacrificing a bit of accuracy?
(ref: https://www.tensorflow.org/performance/quantization)
Is ther…
ndvbd updated 4 years ago
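The question above is about post-training quantization. The core transformation is simple: map float32 weights to int8 with a scale factor, cutting storage roughly 4x at the cost of a small rounding error. A minimal sketch of the symmetric per-tensor scheme (TFLite's converter automates this; the helper names here are my own):

```python
import numpy as np

def quantize(w, num_bits=8):
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax                 # symmetric per-tensor scale
    q = np.clip(np.round(w / scale), qmin, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.linspace(-1.0, 1.0, 101, dtype=np.float32)
q, s = quantize(w)
err = np.abs(dequantize(q, s) - w).max()
assert q.dtype == np.int8                          # 1 byte per value vs 4
assert err <= s / 2 + 1e-6                         # error bounded by half a step
```

Per-channel scales and activation calibration reduce the error further, which is what production converters actually do.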
-
## Type of issue
- Thanks guys for this awesome work. I was curious to run llama3-8B on my personal CPU, and the performance is quite impressive (nearly 2x llama.cpp for same model size on same HW).
…
-
Can I just change the HF path/model name to Qwen2.5 now that it has been released? I assume the quantization technique is the same?
-
I would like to inquire whether there are plans to support the Qwen2.5 and Qwen2 series, or other popular models from the open-source community such as Yi. Will the framework support the merging of large mo…
-
### Issue Type
Bug
### Source
pip (mct-nightly)
### MCT Version
PR #1186
### OS Platform and Distribution
Linux Ubuntu 22.04
### Python version
3.10
### Describe the issu…