-
# ❓ Questions and Help
As mentioned in issue https://github.com/facebookresearch/xformers/issues/894, "memory_efficient_attention will automatically use the Flash-Decoding algorithm if it is supporte…
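For reference, a minimal sketch of what such a call looks like; the shapes, dtype, and device are assumptions, and no explicit flag is involved since xformers picks the kernel internally:
```python
# Minimal sketch (shapes and dtype are assumptions): xformers selects the
# attention kernel internally, so decoding-style inputs -- one query token
# attending to a long key/value cache -- can be served by Flash-Decoding
# when it is supported, without any explicit opt-in.
import torch
import xformers.ops as xops

q = torch.randn(1, 1, 8, 128, device="cuda", dtype=torch.float16)     # (B, Mq, H, K)
k = torch.randn(1, 4096, 8, 128, device="cuda", dtype=torch.float16)  # (B, Mkv, H, K)
v = torch.randn(1, 4096, 8, 128, device="cuda", dtype=torch.float16)

out = xops.memory_efficient_attention(q, k, v)  # kernel dispatch is automatic
```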
-
### Version
v1.2.3
### Describe the bug
When selecting a self-hosted ollama instance, there is no way to do two things:
1. Set the server endpoint for the ollama instance. In my case I…
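For context, a self-hosted Ollama server is normally reached over plain HTTP; a minimal sketch of what pointing at a custom endpoint looks like (the host, model name, and prompt are assumptions, using Ollama's standard REST API):
```python
import requests

# Hypothetical self-hosted endpoint; Ollama's REST API listens on port 11434 by default.
OLLAMA_URL = "http://my-ollama-host:11434"

resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={"model": "llama2", "prompt": "Hello", "stream": False},
)
print(resp.json()["response"])
```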
-
I bind "s-m" to `gptel-menu`, when I hit `s-m`, I got:
```elisp
Debugger entered--Lisp error: (void-function gptel--sanitize-model)
gptel--sanitize-model()
gptel-menu()
funcall-interactivel…
-
Hello.
I used mergekit to merge various models. In the case below, the merge process completes without any errors, but when I run the result with text-generation-webui, it outputs an incom…
-
**Describe the bug (Mandatory)**
A clear and concise description of what the bug is.
When loading the AI-ModelScope/CodeLlama-7b-Instruct-hf model via mindnlp.transformers.AutoModelForCausalLM, an error is raised: safetensors_r…
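A minimal repro sketch, assuming mindnlp.transformers mirrors the Hugging Face `from_pretrained` API that triggered the error:
```python
# Minimal repro sketch; the call mirrors the Hugging Face-style API that
# mindnlp.transformers exposes. The error appears during weight loading.
from mindnlp.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "AI-ModelScope/CodeLlama-7b-Instruct-hf"
)
```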
-
How can I call a local model? The format is Hugging Face, but the model is not deployed to an inference endpoint and there is no api_base. I just want to use the local model, although I modified…
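For comparison, loading a Hugging Face-format checkpoint directly from disk needs no endpoint at all; a minimal sketch (the local path is a hypothetical placeholder):
```python
# Minimal sketch: load a Hugging Face-format model from a local directory,
# with no inference endpoint or api_base involved.
from transformers import AutoModelForCausalLM, AutoTokenizer

local_path = "/path/to/local/model"  # hypothetical checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(local_path)
model = AutoModelForCausalLM.from_pretrained(local_path)
```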
-
Installed with pip inside a conda environment.
**Version: '0.2.69'**
The code is as follows:
```
from llama_cpp import Llama

llm = Llama(
    model_path="/data/codelama-2024-02/CodeLlama-7b-Python/ggml-model-f16.gguf",
    …
```
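A hypothetical completed call, for reference; the context size, prompt, and output handling are assumptions based on llama-cpp-python's documented API, since the constructor arguments are elided above:
```python
# Hypothetical completed call, assuming llama-cpp-python's standard API;
# the elided constructor arguments from the report are not known.
from llama_cpp import Llama

llm = Llama(
    model_path="/data/codelama-2024-02/CodeLlama-7b-Python/ggml-model-f16.gguf",
    n_ctx=2048,
)
out = llm("Write a Python function that reverses a string.", max_tokens=128)
print(out["choices"][0]["text"])
```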
-
" Thanks to our efficient kernels, AWQ achieves **1.45× and 2× speedup over GPTQ and GPTQ with reordering** on A100. It is also 1.85× faster than an FP16 cuBLAS implementation "
-
I keep getting this error after adding a LLAMA-CPP inference endpoint locally. Adding these lines causes the error:
```
"endpoints": [
    {
        "url": "http://localhost:8080",
        …
```
-
It installs fine with `pip install og_up`, don't get me wrong. The chat window is just mangled and no output shows up.