-
![image](https://github.com/stanfordnlp/dspy/assets/48504366/a1d22ed8-1f6c-45be-bd85-6a9449c9efc0)
![image](https://github.com/stanfordnlp/dspy/assets/48504366/f5ff917d-993f-4c38-9043-700fe2597274)
…
-
This issue occurs with the llama2 fp16 and int4 weight models, as well as with a trimmed model that returns immediately after the first GQA node.
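For reproduction, a minimal sketch of how such a trimmed model can be produced with `onnx.utils.extract_model`, assuming the model is an ONNX graph containing a `GroupQueryAttention` node; the file names and graph input names below are placeholders, not the exact artifacts from this issue:

```python
import onnx

# Load the graph (without external weight data) to find the first GQA node's output name.
model = onnx.load("llama2_fp16.onnx", load_external_data=False)
gqa_output = next(
    node.output[0]
    for node in model.graph.node
    if node.op_type == "GroupQueryAttention"
)

# Extract a subgraph that stops right after that node so it can be run in isolation.
# Input/output tensor names are placeholders for this particular export.
onnx.utils.extract_model(
    "llama2_fp16.onnx",
    "llama2_fp16_trimmed.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=[gqa_output],
)
```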
-
### 🐛 Describe the bug
Currently I'm trying to test the LLaMA 3.2 3B Instruct Model as you guided,
but I faced some issues during .pte generation for the LLaMA 3.2 3B Instruct Model with QNN on the On Device sid…
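For context, a minimal sketch of the generic ExecuTorch export-to-`.pte` flow, without the QNN delegate step where the issue occurs; the checkpoint, example inputs, and output path are placeholders rather than the exact commands from the guide:

```python
import torch
from executorch.exir import to_edge
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; the real flow uses the Llama 3.2 3B Instruct weights.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
model.eval()

# Fixed-shape dummy input_ids used only to drive the export.
example_inputs = (torch.randint(0, 128000, (1, 32)),)

# Export to an ExportedProgram, lower to the edge dialect, and serialize to .pte.
# The QNN backend additionally requires partitioning with its own partitioner,
# which is omitted from this simplified sketch.
exported = torch.export.export(model, example_inputs)
edge = to_edge(exported)
executorch_program = edge.to_executorch()

with open("llama3_2_3b_instruct.pte", "wb") as f:
    f.write(executorch_program.buffer)
```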
-
Command:
```
python main.py --model /data/Llama-2-7b-chat-hf/ --prune_method wanda --sparsity_ratio 0.5 --sparsity_type unstructured --save out/llama_2_7b/unstructured/wanda/
…
```
-
Hi all,
I was following the [tutorial here](https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/neuronx_distributed/llama/llama2_inference.ipynb) to run the trace on llama2-7B.…
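For reference, a minimal sketch of the kind of trace step involved, using `torch_neuronx.trace` on a single core; the linked notebook uses `neuronx_distributed`'s parallel tracing utilities for the sharded model, so this is only a simplified illustration and the checkpoint and input shapes are placeholders:

```python
import torch
import torch_neuronx
from transformers import AutoModelForCausalLM

# Placeholder: the notebook shards Llama-2-7B with neuronx_distributed before tracing.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)
model.eval()

# Fixed-shape example inputs; Neuron compilation specializes to these shapes.
input_ids = torch.zeros((1, 128), dtype=torch.long)
attention_mask = torch.ones((1, 128), dtype=torch.long)

# Compile the forward pass to a Neuron executable and save the traced module.
traced = torch_neuronx.trace(model, (input_ids, attention_mask))
torch.jit.save(traced, "llama2_7b_neuron.pt")
```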
-
![image](https://github.com/user-attachments/assets/bcdc2387-eb0a-4aca-a4c4-a07d755a8bac)
-
Hi, thanks for your great work! When I used the EAGLE-llama2-chat-7B model you provided for testing, the average acceptance length I measured was lower than the value in the paper. The way I obtained it was …
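Since the measurement procedure is cut off above, here is one common way to compute the average acceptance length (accepted draft tokens per verification step plus the one bonus token); this is an assumed illustration, not the exact script used:

```python
def average_acceptance_length(accepted_per_step):
    """Mean number of tokens committed per target-model verification step.

    `accepted_per_step` holds, for each speculative decoding step, the number of
    draft tokens accepted by the target model; each step also commits one extra
    token sampled from the target model, hence the +1.
    """
    return sum(n + 1 for n in accepted_per_step) / len(accepted_per_step)

# Hypothetical per-step acceptance counts collected during decoding.
print(average_acceptance_length([3, 4, 2, 5, 4]))  # 4.6
```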
-
I tested the Speculative Sampling method with llama2-7b and llama2-70b on an A800, but the speedup was almost zero, and negative in most cases.
llama2-7b base 103.25 tokens/s
llama2-7b …
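For clarity on how the (near-zero or negative) gain is being judged, a small sketch of the throughput comparison; the numbers below are placeholders except for the 103.25 tokens/s baseline quoted above:

```python
def speedup(spec_tokens_per_s, base_tokens_per_s):
    """Relative throughput change of speculative sampling over plain decoding."""
    return spec_tokens_per_s / base_tokens_per_s - 1.0

# llama2-7b baseline from the measurements above; the speculative figure is hypothetical.
base = 103.25
spec = 101.0  # assumed measured tokens/s with speculative sampling enabled
print(f"speedup: {speedup(spec, base):+.1%}")  # negative here, i.e. a slowdown
```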
-
Currently, `evaluation.yaml` exists under the `configs/` directory. To start, we wanted to just showcase this recipe as an example, but it is a core part of the finetuning process and therefore shou…
-
### System Info
---
**Setup Summary for LoRAX Benchmarking with Llama-2 Model:**
- **Hardware**: A100 40 GB (a2-highgpu-2g) on Google Kubernetes Engine (GKE)
- **Image**: ghcr.io/predibase…