-
@JLTastet @timinar Excuse me, how should I distill the Llama-2-7B model to obtain a 3.5B Llama-2 model with BabyLlama? At the same time, I want to use the local Llama-2-7B model whose path is ``` /home/Ll…
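For reference, here is a minimal sketch of what teacher-student distillation from a local Llama-2-7B checkpoint might look like. This is not the BabyLlama recipe itself; the teacher path, the roughly-3.5B student config, and the loss weighting are all illustrative assumptions on top of the Hugging Face transformers API:

```python
# Minimal knowledge-distillation sketch (illustrative only, not the BabyLlama
# recipe). Assumes the Hugging Face transformers API; the teacher path and the
# ~3.5B student config are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaConfig, LlamaForCausalLM

teacher_path = "/path/to/local/Llama-2-7b-hf"  # hypothetical local path
tokenizer = AutoTokenizer.from_pretrained(teacher_path)
teacher = AutoModelForCausalLM.from_pretrained(teacher_path, torch_dtype=torch.float16)
teacher.eval()

# Hypothetical ~3.5B student: roughly half the layers/width of the 7B config.
student = LlamaForCausalLM(LlamaConfig(
    vocab_size=teacher.config.vocab_size,
    hidden_size=3200,
    intermediate_size=8640,
    num_hidden_layers=26,
    num_attention_heads=32,
))

def distill_step(batch, temperature=2.0, alpha=0.5):
    """One KD step: blend soft-label KL against the teacher with the usual LM loss."""
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    out = student(**batch, labels=batch["input_ids"])
    kd = F.kl_div(
        F.log_softmax(out.logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    return alpha * kd + (1 - alpha) * out.loss
```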
-
I used a model quantized with smooth quant. Why hasn't the inference speed increased?
`lmdeploy lite smooth_quant /model/llama2-7b-hf/ --work-dir /model/lmdeploy/llama2-7b-w8/`
After applying smooth quantization, the model fil…
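One possible explanation (an assumption, not confirmed here): the W8A8 model produced by `smooth_quant` only runs faster when it is served by a backend with int8 kernels; loading it the same way as the FP16 checkpoint gives no speedup. A rough timing sketch, assuming lmdeploy's Python `pipeline` API with its PyTorch engine and reusing the paths from the command above:

```python
# Rough timing sketch, assuming lmdeploy's Python `pipeline` API; the W8A8
# output of smooth_quant is served by the PyTorch engine. Paths reuse the
# ones from the command above.
import time
from lmdeploy import pipeline, PytorchEngineConfig

for path in ("/model/llama2-7b-hf/", "/model/lmdeploy/llama2-7b-w8/"):
    pipe = pipeline(path, backend_config=PytorchEngineConfig())
    start = time.perf_counter()
    pipe(["Hello, how are you?"] * 8)  # small batch to exercise the kernels
    print(path, f"{time.perf_counter() - start:.2f}s")
```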
-
Please prepare reproducibility artifacts for llama2 training on PyTorch/XLA:TPU
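As a starting point, here is a skeleton of the general shape such a script might take (a sketch only, not the requested artifact; `model` and `loader` are placeholders), using the torch_xla single-device API:

```python
# Skeleton of a PyTorch/XLA training loop on TPU (a sketch of the general
# shape only; `model` and `loader` are placeholders, not the actual artifact).
import torch
import torch_xla.core.xla_model as xm

def train(model, loader, lr=1e-4, steps=100):
    device = xm.xla_device()            # grab the TPU device
    model = model.to(device).train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for step, batch in zip(range(steps), loader):
        input_ids = batch["input_ids"].to(device)
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
        xm.optimizer_step(opt)          # reduce grads across cores, then opt.step()
        opt.zero_grad()
        xm.mark_step()                  # materialize the lazily-built XLA graph
        if step % 10 == 0:
            print(f"step {step}: loss {loss.item():.4f}")
```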
-
Hi, I'm studying llama2.
I'm trying to create a chatbot using open-source llama, and my goal is to receive accurate answers when asked about embedded data. A query engine is built by embeddi…
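The truncated description sounds like a retrieval setup in the style of LlamaIndex. A minimal sketch of that embed-then-query pattern, assuming the llama-index package (the `./data` directory and the query string are placeholders, and the defaults assume an embedding/LLM backend is configured):

```python
# Hedged sketch of an embed-then-query setup with LlamaIndex; "./data" is a
# placeholder directory, and a default embedding/LLM backend must be configured.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()  # read and chunk files
index = VectorStoreIndex.from_documents(documents)       # embed into a vector index
query_engine = index.as_query_engine()

response = query_engine.query("What does the embedded data say about X?")
print(response)
```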
-
### Describe the bug
After an upgrade/clean install, an attempt to load a model in "llama.cpp" in a CPU-only configuration fails with the log message below. Restarting doesn't help.
Note: possibly…
-
Hi friend, maybe you can try using the llama architecture instead of the original Transformer? (You can refer to the llama architecture in llama2.c.)
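For context, the main llama-specific changes relative to the original Transformer block are pre-RMSNorm in place of post-LayerNorm, a SwiGLU feed-forward in place of the ReLU MLP, and rotary position embeddings. A minimal PyTorch sketch of the first two, mirroring what llama2.c implements in C (dimensions and eps are illustrative):

```python
# Sketch of two llama-specific pieces (RMSNorm, SwiGLU FFN) in PyTorch;
# mirrors what llama2.c implements in C. Dimensions are illustrative.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Pre-norm used by llama: scale by RMS instead of mean/variance LayerNorm."""
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """Llama's FFN: silu(W1 x) * (W3 x), projected back down by W2."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)

    def forward(self, x):
        return self.w2(nn.functional.silu(self.w1(x)) * self.w3(x))
```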
-
Hello, and thank you very much for your outstanding work!
When I run `bash experiment/mimic3/online_distill.bash`, I get the following error:
```
File "/home/Users/LEADER-pytorch/models/bert_models.py", line 17, in <module>
from models.graph_models import FuseEmbeddings
Mo…
```
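The truncated last line looks like the start of an import error such as ModuleNotFoundError (an assumption, since the log is cut off). A common cause is running the script from a directory where the repo root is not on `sys.path`; a hypothetical workaround:

```python
# Hypothetical workaround, assuming the truncated error is a ModuleNotFoundError
# for the `models` package: put the repo root on sys.path before the import.
import sys

REPO_ROOT = "/home/Users/LEADER-pytorch"  # path taken from the traceback above
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

from models.graph_models import FuseEmbeddings  # should now resolve
```

Equivalently, the script could be run with `PYTHONPATH=/home/Users/LEADER-pytorch` set in the environment.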
-
High-level issue for prefill performance improvements.
- [x] 8x8 grid for mlp matmuls
- [x] block-sharded eltwise mul in mlp
- [x] 8x8 projection matmuls
- [x] 8x8 grid for rmsnorm, and sweep ch…
-
https://www.llama2.ai/