-
### What happened?
llama.cpp produces garbled output on the 310P3 when using Qwen2.5-7b-f16.gguf
### Name and Version
./build/bin/llama-cli -m Qwen2.5-7b-f16.gguf -p "who are you" -ngl 32 -fa
### What operating system are you seeing the …
-
I've noticed that the FlashInfer prefill kernel is much slower than FA3 and TRT-LLM FMHA on SM90.
Do you have any plans to use SM90-specific features for optimization?
Here is some data I tested on an SM9…
-
./llama2.soc llama2-7b_int8_1dev.bmodel
Demo for LLama2-7B in BM1684X
Init Environment ...
Load tokenizer.model ... Done!
Device [ 0 ] loading ....
[BMRT][bmcpu_setup:406] INFO:cpu_lib 'libcpuop…
-
I installed Llama 2 and Llama 3 through Ollama on Windows; Danswer is also installed on Windows.
![image](https://github.com/danswer-ai/danswer/assets/106233935/6b2a3594-dd52-40e9-8dd7-74530d384ffe)
![image](…
-
How can I replace the OpenAI API with Llama2?
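A common approach is to keep the OpenAI client and point it at a local OpenAI-compatible server hosting Llama 2. Here is a minimal sketch assuming an Ollama instance at its default address; the `base_url` and model name are assumptions to adjust for your own setup (vLLM and other servers expose a similar `/v1` endpoint).
```python
from openai import OpenAI

# Point the standard OpenAI client at a local OpenAI-compatible server.
# base_url and model are assumptions: Ollama's default endpoint and its
# "llama2" model tag; change both to match the server you actually run.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed-locally",  # the client requires a value; local servers ignore it
)

response = client.chat.completions.create(
    model="llama2",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```
Because only the `base_url`, `api_key`, and model name change, the rest of an existing OpenAI-based codebase can stay untouched.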
-
I am using a vLLM endpoint with the OpenAI API to send concurrent requests to a Llama2-7B model deployed on a single A100 GPU. Regardless of the values I set for `--block-size`, `--swap-space`, `--m…
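For reference, here is a minimal sketch of the concurrent-request pattern described above, assuming vLLM's OpenAI-compatible server is listening on `localhost:8000`; the model name, prompt set, and request count are placeholders, not the reporter's actual workload.
```python
import asyncio
from openai import AsyncOpenAI

# vLLM's OpenAI-compatible server; the model name must match the --model
# argument the server was started with (assumed here).
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def one_request(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="meta-llama/Llama-2-7b-chat-hf",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    return resp.choices[0].message.content

async def main() -> None:
    # Fire 32 requests concurrently so the server can batch them.
    prompts = [f"Question {i}: what is {i} squared?" for i in range(32)]
    answers = await asyncio.gather(*(one_request(p) for p in prompts))
    print(len(answers), "responses received")

asyncio.run(main())
```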
-
[https://github.com/pytorch/executorch/blob/main/examples/models/llama2/README.md#performance](https://github.com/pytorch/executorch/blob/main/examples/models/llama2/README.md#performance)
AFAIK, Q…
-
Which LLaMA version is used here, LLaMA 1 or LLaMA 2?
-
I've trained xlora with a Mistral 7B base model, and it works fine. However, when switching the base model to Llama 2 7B, I encounter an error.
This is my training code:
```
model = AutoModelForCausa…
```
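The training code above is truncated, so the following is only a sketch of the base-model swap, assuming the standard Hugging Face Llama 2 checkpoint; one difference worth checking when moving from another base model is that Llama 2's tokenizer ships without a pad token, which can break a training loop that pads batches.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; substitute the Llama 2 7B checkpoint you actually use.
base_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(base_id)
# Llama 2's tokenizer has no pad token by default; fall back to EOS so that
# padded batches don't raise during training.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```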
-
### 🚀 The feature, motivation and pitch
I trained the current code with FSDP to fully fine-tune Llama2; it is very quick, but it turns out the performance is even worse than LoRA fine-tuned models u…
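The fine-tuning script itself is elided above, so the following is only a sketch of the FSDP wrapping pattern being described, assuming the Hugging Face Llama 2 checkpoint and a `torchrun` launch; it is not the author's actual code.
```python
import functools
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

# Assumes a torchrun launch; each rank joins the default process group.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumed checkpoint name
    torch_dtype=torch.bfloat16,
)

# Shard at the decoder-layer boundary, the usual granularity for Llama models.
wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={LlamaDecoderLayer},
)
model = FSDP(
    model,
    auto_wrap_policy=wrap_policy,
    device_id=torch.cuda.current_device(),
)
# ...optimizer, data loading, and the training loop would follow here.
```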