-
### OS
Linux
### GPU Library
CUDA 12.x
### Python version
3.11
### Describe the bug
When running exllamav2's inference_speculative.py example with Llama 3.1 8B 2.25bpw as the draft model and 70B 4.5bpw a…
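The report above concerns speculative decoding, where a small draft model proposes tokens and the large model verifies them. As a toy illustration of the accept/reject loop (a sketch only — `draft_next` and `target_accepts` are hypothetical stand-ins for the real model calls in inference_speculative.py):

```python
def speculative_step(draft_next, target_accepts, k=4):
    """Draft model proposes k tokens; the target model verifies them
    and keeps the longest agreeing prefix (simplified acceptance rule)."""
    proposed = [draft_next() for _ in range(k)]
    accepted = []
    for tok in proposed:
        if target_accepts(tok):
            accepted.append(tok)
        else:
            break  # first rejection ends the speculated run
    return accepted

# Deterministic demo: the "target" accepts even tokens only.
toks = iter([2, 4, 7, 8])
out = speculative_step(lambda: next(toks), lambda t: t % 2 == 0)
# out -> [2, 4]
```

Real implementations resample at the first rejected position rather than simply stopping, but the prefix-acceptance shape is the same.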
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A…
```
-
UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
device: torch.device = torch.device("cpu"),
Models: ['llavamed']
-
If you could test the following on ZebraLogic, that would be great (from the Reddit LocalLLaMA community):
1) Wizard 8x22b
2) Mixtral 8x22b
3) Mixtral 8x7b
4) Command-r-plus
5) Mistral Nemo 12B
6) Lla…
jd-3d updated 3 months ago
-
**Describe the bug**
Every time I try to open a URL it fails to do so; I have copied the code exactly from the regular expressions into the regex.
**Expected behavior**
I am assuming that it's suppose…
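Since the report doesn't show the pattern itself, here is a minimal sketch of matching a URL with Python's `re` module. The pattern is an assumption, not the reporter's actual regex; a common failure mode when copying patterns between tools is losing backslashes because the string is not written as a raw string:

```python
import re

# Illustrative pattern only: scheme, then any run of non-whitespace.
# Using a raw string (r"...") preserves the \s escape verbatim.
URL_RE = re.compile(r"https?://[^\s]+")

text = "see https://example.com/docs for details"
match = URL_RE.search(text)
# match.group(0) -> "https://example.com/docs"
```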
-
I see people are trying to extract the Mistral-22b ancestor from the MoE model by averaging the MLP layers and wondered if the 'model stock' method in Mergekit could be inverted:
- Use the averaged…
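The MLP-averaging step mentioned above can be sketched in a few lines. This is a toy NumPy version under stated assumptions — the expert count and weight shapes are illustrative, not Mixtral's real dimensions, and a real merge would iterate over every MoE layer's per-expert projection matrices:

```python
import numpy as np

def average_experts(expert_weights):
    """Average the corresponding MLP weight matrices across experts.

    expert_weights: list of (d_ff, d_model) arrays, one per expert.
    Returns a single (d_ff, d_model) array approximating an 'ancestor' MLP.
    """
    return np.mean(np.stack(expert_weights), axis=0)

# 8 toy experts whose weights are constant arrays 0.0 .. 7.0
experts = [np.full((4, 3), float(i)) for i in range(8)]
merged = average_experts(experts)
# every entry of merged is mean(0..7) = 3.5
```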
-
### Feature request
Removing the line `logits = logits.float()` in most `ModelForCausalLM` classes. This would save a lot of memory for models with a large vocabulary size. This allows dividing the…
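A back-of-envelope calculation shows why the upcast matters. Assuming a hypothetical 128k-token vocabulary and a 4k-token sequence at batch size 1 (illustrative numbers, not any specific model's):

```python
# Logits tensor has shape (seq, vocab); each element costs 4 bytes in
# float32 (after .float()) versus 2 bytes in float16/bfloat16.
vocab, seq = 128_000, 4096
fp32 = vocab * seq * 4  # bytes with the upcast
fp16 = vocab * seq * 2  # bytes without it
print(fp32 / 2**30, fp16 / 2**30)  # roughly 1.95 GiB vs 0.98 GiB
```

Keeping logits in half precision halves this allocation, which is exactly the saving the request targets.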
-
rem should index all text via an embedding store.
We could use something like https://github.com/asg017/sqlite-vss
If we go this route we should fork / open a PR to add the extension https://github…
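Before wiring up the sqlite-vss extension, the retrieval idea can be validated with a brute-force stand-in. This sketch assumes embeddings are already computed; the linear cosine-similarity scan below is what an indexed virtual table would replace:

```python
import numpy as np

def top_k(query, store, k=2):
    """Return indices of the k store rows most cosine-similar to query."""
    store_n = store / np.linalg.norm(store, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = store_n @ q          # cosine similarity per stored embedding
    return np.argsort(-sims)[:k]

# Toy 2-d "embedding store" with three documents.
store = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.7, 0.7]])
ids = top_k(np.array([1.0, 0.1]), store)
# ids -> [0, 2]: nearest doc first, orthogonal doc excluded
```

Swapping this for sqlite-vss changes the storage and lookup mechanics, not the ranking semantics.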
-
Hi, I am running some tests with DPOTrainer to see how it works, but I have encountered some problems during the inference phase of the generated model. In detail, this is the pipeline of operations I…
-
Code that worked yesterday quit working today. This could be an inference endpoint change for mistral-large-latest, but I figured I would list it here as well, as it is probably easier (as in…