-
Hi, I appreciate your work! I have a question regarding the zero-shot commonsense reasoning task on llama2-7b. I tested llama2-7b 4-4-4 using lm-eval and observed a significant discrepancy in the resu…
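For reference, here is roughly how such a run can be launched; a minimal sketch assuming lm-evaluation-harness >= 0.4 (which exposes `lm_eval.simple_evaluate`), with a hypothetical checkpoint path and an illustrative task list rather than the exact setup:

```python
# Minimal sketch of a zero-shot run, assuming lm-evaluation-harness >= 0.4.
# The checkpoint path and task list are illustrative placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=/path/to/llama2-7b-4-4-4",  # hypothetical quantized checkpoint
    tasks=["piqa", "arc_easy", "arc_challenge", "hellaswag", "winogrande"],
    num_fewshot=0,  # zero-shot
)
print(results["results"])
```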
-
Llama2 on Hugging Face seems to have a default top-k value of 50, but the Llama2 paper and its repository suggest that top-k is likely not used.
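If it helps, a minimal sketch of overriding that default so only nucleus sampling is applied, using the `transformers` generation API; the prompt and sampling values are placeholders:

```python
# Sketch: disable HF's default top_k=50 (top_k=0 turns off top-k filtering),
# leaving only top-p sampling, which appears to match the Llama2 repo's setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tok("The capital of France is", return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.9,      # nucleus sampling
    top_k=0,        # override the default top_k=50
    max_new_tokens=32,
)
print(tok.decode(out[0], skip_special_tokens=True))
```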
-
Training llama2 70B with the latest dev-branch code, I hit the following problem:
collie/collie/models/llama/model.py:203 in _forward
…
-
**LocalAI version:** v1.25.0-40-g5661740 (56617409903bde702699a736530053eb4146aec8)
**Environment, CPU architecture, OS, and Version:** macOS, M1 Max Pro
**Describe the bug**
`llama-2-chat-m…
-
I used this script to build an int8 model, but it failed: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/language_model/llama
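For context (not the linked script itself), a baseline sketch of int8 weight quantization with onnxruntime's generic dynamic-quantization API; the model filenames are placeholders, and the linked example uses its own, more involved flow:

```python
# Generic int8 dynamic quantization via onnxruntime, shown only as a baseline;
# filenames are placeholders, not the artifacts produced by the linked script.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="llama2-7b-fp32.onnx",   # placeholder input model
    model_output="llama2-7b-int8.onnx",  # placeholder output model
    weight_type=QuantType.QInt8,         # int8 weights
)
```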
-
I have tried to convert a Llama 2 model from .gguf to .bin:
```
~/llm_inferences/llama.cpp/models/meta$ ls
llama-2-7b.Q4_K_M.gguf
python3 export.py llama2_7b.bin --meta-llama /home/####/llm_inf…
```
-
Thanks for the library. I would like to test large models such as llama2-70b from huggingface_hub. I wonder if I can use jsonformer via InferenceClient from the hub, because I don't want to download t…
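For context, jsonformer's documented usage takes an in-process `transformers` model and tokenizer, which is why downloading the weights seems unavoidable; a sketch of that local pattern, with a placeholder model name and schema:

```python
# jsonformer's standard local usage, sketched with placeholders; it expects a
# transformers model object rather than a remote huggingface_hub client.
from jsonformer import Jsonformer
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"  # placeholder; llama2-70b is what I'd want
model = AutoModelForCausalLM.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

schema = {"type": "object", "properties": {"name": {"type": "string"}}}
prompt = "Generate a person matching the schema:"
print(Jsonformer(model, tokenizer, schema, prompt)())
```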
-
Hi,
I want to add support for open-source LLM models like Llama2 using LangChain, self-hosted locally and able to run without GPU support. But GPU support will mak…
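A minimal sketch of what I have in mind, assuming LangChain's community `LlamaCpp` wrapper over a local GGUF file (the path and parameters are placeholders); llama.cpp runs CPU-only by default, with GPU offload optional:

```python
# CPU-only local Llama2 through LangChain's LlamaCpp wrapper; the model path
# is a placeholder and n_gpu_layers=0 keeps inference on the CPU.
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,      # context window
    n_gpu_layers=0,  # 0 = pure CPU; raise once GPU support is added
    temperature=0.7,
)
print(llm.invoke("Hello, who are you?"))
```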
-
Hi, great work!
I met some problems during 4-bit weight-only quantization (--lwc).
1. Is there any problem if the norm is NaN?
2. What is the best --lwc hyper-parameter for Llama2 at different scales?…
-
**Describe the bug**
When testing the Ollama generate endpoint (`/api/generate`) with the help of MockServer to mock the response, I got this error:
```
java.lang.RuntimeException: com.fasterxml.jackson.co…
```