-
I tried to train a LoRA with AutoGPTQ v3.0 and got this error:
Exception in thread Thread-17 (threaded_run):
Traceback (most recent call last):
File "E:\chat\text-generation-webui\conda\lib\threading.py…
-
# find query FFN neurons that activate attention neurons
curfile_ffn_score_dict = {}
for l_h_n_p, increase_score in cur_file_attn_neuron_list_sort[:30]:
    attn_layer, attn_head, attn_neuron, attn_pos = l…
-
Hi, I tried the following code, but my kernel crashed and restarted. Let me know how I should fix this, thanks!
```
from ctransformers import AutoModelForCausalLM
llm = AutoModelForCausalLM.f…
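
# For reference, a minimal ctransformers load usually looks like the sketch
# below; this is an assumption-laden example, not the poster's exact code,
# and the repo id and gpu_layers value are placeholders. A kernel crash at
# this step is most often out-of-memory or a model_type mismatch.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",  # hypothetical model repo; substitute your own
    model_type="llama",          # must match the checkpoint's architecture
    gpu_layers=0,                # start fully on CPU, then raise to offload layers
)
print(llm("Hello"))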
-
- [ ] 손기훈
- [ ] 노태엽
- [ ] 백인진
- [ ] 김해원
- [ ] 강민재
-
File "C:\Users\giorgio\OneDrive\Desktop\LLAMA2 MAIN\Conda Environment\Lib\site-packages\fire\core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)…
-
Hi team,
May I know when the LLAMA2-based mplug-owl will be released?
-
I quantized a custom fine-tuned llama2 70b model like this.
```bash
$ python main.py \
--model /data/finetuned_llama2_70b \
--epochs 20 \
--output_dir /data/finetuned_llama2_70b_output \…
-
### Motivation
LMDeploy's 4-bit quantized prefix cache (along with 4-bit AWQ for weights) allows running ~70B models on 48GB of RAM with good performance for many-user scenarios. The prefix cache c…
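
As a concrete illustration of that setup, a sketch assuming a recent LMDeploy; the model id here is a placeholder:
```python
# Hypothetical configuration combining 4-bit AWQ weights with the 4-bit
# quantized KV/prefix cache described in the motivation above.
from lmdeploy import pipeline, TurbomindEngineConfig

engine = TurbomindEngineConfig(
    model_format="awq",          # 4-bit AWQ-quantized weights
    quant_policy=4,              # 4-bit quantized KV cache
    enable_prefix_caching=True,  # reuse cached prefixes across requests
)
pipe = pipeline("some-org/Llama-2-70B-AWQ", backend_config=engine)  # placeholder id
print(pipe("What does the prefix cache store?"))
```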
-
ChatQnA is one of the GenAI examples. It is a chatbot for question answering through retrieval-augmented generation (RAG). All details about the sample are available at https://github.com/opea-projec…
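
A minimal illustration of the RAG pattern the sample implements; this is a generic sketch, not OPEA's actual code, and every component here is a stand-in:
```python
# Generic retrieval-augmented generation loop; the retriever, corpus, and
# prompt format are illustrative stand-ins, not the ChatQnA implementation.
def retrieve(query, corpus, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(query, corpus):
    context = "\n".join(retrieve(query, corpus))
    # In ChatQnA this prompt would be sent to an LLM serving endpoint.
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = ["OPEA hosts GenAI examples.", "ChatQnA answers questions over documents."]
print(build_prompt("What is ChatQnA?", corpus))
```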
-
# Let's calculate the transfer time theoretically.
## llama3 8B
The original experiment data is [here](https://github.com/b4rtaz/distributed-llama/discussions/41#discussioncomment-9435671).
Since t…
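
As a rough illustration of the kind of estimate this section sets up (all numbers below except the llama3 8B shape are assumptions for illustration, not the discussion's measured data):
```python
# Back-of-the-envelope transfer-time sketch for distributed inference.
# Known llama3 8B shape: 32 layers, hidden size 4096. Everything else
# (node count, link speed, per-layer sync pattern) is assumed.
hidden_size = 4096       # llama3 8B hidden dimension
num_layers = 32          # llama3 8B transformer layers
bytes_per_value = 2      # float16 activations (assumed)
nodes = 4                # assumed number of workers
link_bps = 1e9           # assumed 1 Gbit/s link between nodes

# Assume each layer exchanges one hidden-state vector per token with each peer.
bytes_per_token = hidden_size * bytes_per_value * num_layers * (nodes - 1)
transfer_s = bytes_per_token * 8 / link_bps
print(f"{bytes_per_token / 1024:.0f} KiB/token -> {transfer_s * 1e3:.2f} ms/token")
```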