-
Whatever quantization method I use, the model infers very slowly, which is a real problem when running large-scale experiments. Is there any known solution?
-
Nice project!
This issue records the obstacles (and workarounds) I encountered during the build process. I hope the maintainers can update the script accordingly to make the build proc…
-
This issue tracks the open problems the model team must solve in order to hit the Llama2 perf targets.
## Decode 128
We have a new perf target of 20 tok/s at seqlen = 128. This issue lists the problems …
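For reference, the 20 tok/s target above can be turned into a per-token latency budget (my own arithmetic, assuming the figure refers to steady-state decode throughput):

```python
# Decode target from this issue: 20 tok/s at seqlen = 128.
target_tok_per_s = 20
budget_ms = 1000 / target_tok_per_s  # latency budget per decoded token
print(f"per-token budget: {budget_ms:.0f} ms")

# Time to decode a full 128-token sequence at exactly the target rate:
seqlen = 128
total_s = seqlen / target_tok_per_s
print(f"full decode of {seqlen} tokens: {total_s:.1f} s")
```

So each decode step has roughly a 50 ms budget end to end, including KV-cache reads and sampling.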
-
### 🐛 Describe the bug
```
python -m examples.models.llama2.export_llama --checkpoint "${MODEL_DIR}/model.pth" -p "${MODEL_DIR}/original/params.json" -kv --disable_dynamic_shape --qnn --pt2e_quantize qnn…
-
Can I simply transfer a llama2 task to llama3 by just loading llama3 with transformers, or do I need to rewrite some code?
I loaded llama3 and got:
```
raise RuntimeError(f"Error…
-
Hi, I have run your code and I have the following questions:
(1) Step 2:
```
CKPT=105
TRAINING_DATA_NAME=dolly
TRAINING_DATA_FILE=../data/train/processed/dolly/dolly_data.jsonl # when changing data name, change the data path accordi…
-
When I try to reproduce the results following the instructions in the README, I get the following result on TruthfulQA for Llama-2-7b: AUROC is **60.36**, which is far from the **78.64** reported in Table 1. The full o…
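For anyone double-checking the metric itself: AUROC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A minimal self-contained implementation (my own sketch; the repo's evaluation code may compute it differently, e.g. via sklearn):

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1; scores: iterable of floats.
    Tied scores contribute 0.5, matching sklearn's roc_auc_score.
    """
    pairs = sorted(zip(scores, labels))
    # Assign average 1-based ranks, handling runs of tied scores.
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(lbl for _, lbl in pairs)
    n_neg = len(pairs) - n_pos
    rank_sum_pos = sum(r for r, (_, lbl) in zip(ranks, pairs) if lbl == 1)
    # Mann-Whitney U for the positive class, normalized to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, the same value `sklearn.metrics.roc_auc_score` gives on those inputs.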
-
Hello, sorry to bother you. I have recently been trying to reproduce your code. Is the model you used llama2, or llama2-chat? Was the data preprocessed? Why don't I get such high accuracy when I use llama?
-
Hi,
Could you share a sample Android app for running Llama-v2-7B-Chat Quantized INT4 on my Android device?
Your sample `python -m qai_hub_models.models.llama_v2_7b_chat_quantized.export`
generate…