-
This quantisation is missing right now...
-
In the Func menu, Chord section.
Let's add a new knob for "delay", i.e. a delay between the notes of the chord.
When I try to play some octaves or fifths manually and I like them, I often activate the Chord function, …
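A minimal sketch of the requested behaviour (the knob name and millisecond unit are assumptions, not the device's actual parameters): each note of the chord is offset by a multiple of the delay value, like a strum.

```python
# Hypothetical sketch of a "delay" knob for the Chord function:
# note i of the chord starts i * delay_ms after the trigger,
# so a chord with delay > 0 plays as a strum instead of a block chord.

def strummed_onsets(note_count, delay_ms):
    """Return per-note onset times (in ms) for a chord with a delay knob."""
    return [i * delay_ms for i in range(note_count)]

# A triad with the knob at 30 ms: onsets at 0, 30 and 60 ms.
print(strummed_onsets(3, 30))  # → [0, 30, 60]
```

With the knob at 0 the current behaviour (all notes simultaneous) is preserved.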
-
Hello, I am very impressed with your great work. I am not quite familiar with CUDA programming. Could you please give me instructions on how to call the pack_2bit_u8 of your optimized CUDA…
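For reference, here is what a pack_2bit_u8-style routine does conceptually, sketched in plain Python (this is an illustration of the packing scheme, not the library's actual API or signature, and the bit order is an assumption): four 2-bit values are packed into each uint8.

```python
# Conceptual sketch of 2-bit packing (not the optimized CUDA kernel):
# every group of four values in 0..3 becomes one byte.

def pack_2bit_u8(vals):
    """Pack groups of four 2-bit integers (0..3) into single bytes."""
    assert len(vals) % 4 == 0, "length must be a multiple of 4"
    out = []
    for i in range(0, len(vals), 4):
        b = 0
        for j in range(4):
            # value j occupies bits 2j..2j+1 (little-end-first; bit order assumed)
            b |= (vals[i + j] & 0b11) << (j * 2)
        out.append(b)
    return out

# [1, 2, 3, 0] → 0b00_11_10_01 = 57
print(pack_2bit_u8([1, 2, 3, 0]))  # → [57]
```

The CUDA version performs the same transformation, just with one thread handling a group of elements in parallel.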
-
I'm looking to quantize the llava model from fp16.gguf. When I try to quantize llava after compiling llamafile,
`app/bin/llava-quantize llava-v1.5-7B-GGUF/llava-v1.5-7b-mmproj-f16.gguf llava-v1.5-7B-…
-
When I try to fine-tune phi-3 (Phi-3-mini-128k-instruct-8bit) I get the same issue I previously had for Mixtral: a NaN loss.
```
Trainable parameters: 0.042% (1.573M/3750.282M)
Loading datasets
Tr…
```
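A NaN loss this early usually points to numeric overflow (e.g. in fp16) or a too-high learning rate rather than the model itself. A generic guard (my own sketch, not part of the training script) makes the failure surface at the first bad step:

```python
import math

# Generic NaN guard for a training loop (illustrative, not the
# fine-tuning framework's code): fail fast at the first non-finite
# loss so the offending step/batch can be inspected.

def check_loss(loss, step):
    """Raise if the loss is NaN or infinite; otherwise pass it through."""
    if math.isnan(loss) or math.isinf(loss):
        raise ValueError(f"non-finite loss {loss} at step {step}")
    return loss

print(check_loss(2.0, 1))  # → 2.0
```

Calling this on each step's loss narrows down whether the NaN appears immediately (bad init/quantized weights) or after a few steps (learning-rate/overflow issue).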
-
Any invocation of `python -m sillm.chat model` seems much slower on my machine than in the reference video: more than a minute to get to the prompt, and maybe 1-2 TPM in the response.
I have tried si…
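Assuming TPM means tokens per minute here, a trivial helper (my own, not part of SiLLM) makes the reported rate concrete and easy to re-measure:

```python
# Hypothetical helper (not part of SiLLM): generation rate in tokens
# per minute, from a token count and the elapsed wall-clock time.

def tokens_per_minute(token_count, elapsed_seconds):
    """Tokens generated per minute of wall-clock time."""
    return token_count * 60.0 / elapsed_seconds

# e.g. 10 tokens in 5 minutes of generation:
print(tokens_per_minute(10, 300.0))  # → 2.0
```

Timing the response this way gives a number that can be compared directly against the reference video.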
-
Hey HQQ team! Happy New Year!
I actually found out about HQQ from some Reddit posts about Mixtral - and had a look at https://github.com/mobiusml/hqq/issues/2 which was super insightful! Quantizing…
-
I am not able to quantize these new Llama-3 models:
```
AWQ: 3%|███████▊ …
-
Hello! I'm wondering if it's possible to load a `model` and a `tokenizer`, and then pass the two of them to `vllm.LLM()` to create an object. The reason I am trying to create the object this way (inst…
th789 updated 5 months ago
-
When passing mp.ing to `ActivationPOTInferableQuantizer`, the result sometimes goes to the max ml and sometimes to the min ml.
This is not consistent: I get different results in a Linux Docker container and on a Mac.
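One possible mechanism (an illustrative guess, not the actual `ActivationPOTInferableQuantizer` code): a power-of-two (POT) quantizer rounds its threshold to the nearest 2^k, and inputs near the midpoint between two powers of two (2^1.5 ≈ 2.83 between 2 and 4) can round either way when the floating-point pipeline differs between platforms.

```python
import math

# Illustrative POT rounding (not the library's implementation):
# round a positive threshold to the nearest power of two in log space.

def nearest_pot(x):
    """Round a positive threshold to the nearest 2^k."""
    k = round(math.log2(x))
    return 2.0 ** k

print(nearest_pot(3.9))  # → 4.0 (clearly closer to 4)
print(nearest_pot(2.2))  # → 2.0 (clearly closer to 2)
# Values near the 2↔4 midpoint (≈ 2.83) sit on a rounding boundary,
# where tiny float differences across platforms can flip the result.
```

If the observed min/max values straddle such a boundary, that would explain the Linux-Docker-vs-Mac divergence.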
### test cod…