-
SPMD training speed is normal with eight GPUs on a single machine, but the communication overhead increases rapidly in the multi-machine case.
The devices are:
GPU: A100 * 8 * 2 (eight A100s per machine, two machines)
SPMD strategy …
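The post does not say which SPMD stack is in use; as a point of reference, here is a minimal sketch assuming the PyTorch-native DeviceMesh API, laying the 2 × 8 topology out as a 2-D mesh so the heavy collectives stay on the fast intra-node links and only one mesh dimension crosses the slow inter-node network:
```
# Minimal sketch (assumes PyTorch >= 2.2).
# Launch with: torchrun --nnodes 2 --nproc_per_node 8 mesh_sketch.py
from torch.distributed.device_mesh import init_device_mesh

# Two machines x eight A100s each: keep the model-parallel dimension inside
# a node so only the data-parallel reduction crosses machines.
mesh = init_device_mesh("cuda", (2, 8), mesh_dim_names=("inter_node", "intra_node"))
print(mesh["intra_node"])  # sub-mesh of the 8 GPUs local to this machine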
-
I got approval from Meta, then I downloaded all the Meta Llama 2 models locally (I followed all the steps and everything was fine).
I tried to run the 7B model using this command: “torchrun --nproc_per_node 1 …
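For reference, the truncated command matches the example scripts in the meta llama repo; a minimal Python sketch of what example_text_completion.py does (paths below are placeholders, run it under torchrun as in the README) looks like this:
```
# Minimal sketch of the meta llama repo's text-completion entry point.
# Run under: torchrun --nproc_per_node 1 run_7b.py
from llama import Llama

generator = Llama.build(
    ckpt_dir="llama-2-7b/",            # directory holding the downloaded weights
    tokenizer_path="tokenizer.model",  # tokenizer file shipped with the weights
    max_seq_len=512,
    max_batch_size=4,
)
results = generator.text_completion(
    ["I believe the meaning of life is"],
    max_gen_len=64,
    temperature=0.6,
    top_p=0.9,
)
print(results[0]["generation"])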
-
https://github.com/michael-wzhu/Chinese-LlaMA2
-
Hi,
I would like to merge:
https://huggingface.co/Unbabel/TowerInstruct-7B-v0.1
with
https://huggingface.co/haoranxu/ALMA-7B-R
But the Tower one has 7 special tokens, hence a vocab size of 32007, and t…
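A minimal sketch of one way to reconcile the vocab sizes before merging, assuming both checkpoints load as standard Hugging Face LlamaForCausalLM models (the naive 50/50 average is just for illustration; tools like mergekit offer more principled schemes such as SLERP or TIES):
```
# Minimal sketch: pad ALMA's vocab (32000) to Tower's (32007) before merging.
from transformers import AutoModelForCausalLM

tower = AutoModelForCausalLM.from_pretrained("Unbabel/TowerInstruct-7B-v0.1")
alma = AutoModelForCausalLM.from_pretrained("haoranxu/ALMA-7B-R")

# Grow ALMA's input/output embedding matrices to 32007 rows; the 7 new rows
# are freshly initialized, the original 32000 are kept as-is.
alma.resize_token_embeddings(tower.get_input_embeddings().weight.shape[0])

# Naive 50/50 linear average of all parameters, purely for illustration.
merged = tower.state_dict()
for name, param in alma.state_dict().items():
    merged[name] = 0.5 * merged[name] + 0.5 * param
tower.load_state_dict(merged)
tower.save_pretrained("tower-alma-merged")  # hypothetical output directory
```
Note that the 7 rows added to ALMA are freshly initialized, so after averaging, Tower's special-token embeddings are diluted with noise; copying Tower's rows verbatim for those 7 ids is the safer choice.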
-
Hello! Is there a recommended configuration?
-
https://huggingface.co/docs/trl/main/en/lora_tuning_peft#finetuning-llama2-model
-
The current benchmark is a bit too simple; we need some practical grammars. Vocabularies other than the RWKV vocabulary should be benchmarked as well (a vocabulary-loading sketch follows the checklist).
- [ ] JSON
- [ ] *OT chains
- [ ] Llama2 vocabular…
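As a starting point, a minimal sketch (the model id is just the obvious candidate, not a fixed choice) for pulling one of the non-RWKV vocabularies to benchmark against:
```
# Minimal sketch: load an alternative vocabulary via the Hugging Face
# tokenizer (the Llama 2 repo is gated, so HF access must be approved).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
vocab = tok.get_vocab()  # dict of token string -> token id
print(len(vocab))        # 32000 for Llama 2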
-
Hi,
I wanted to give this a try and installed ollama locally. I am able to use the ollama API at http://localhost:11434/api/generate with curl.
I then set `export OLLAMA_API_BASE=http://localhost:…
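For reference, the same call without curl, assuming the default port (the model name here is a placeholder for whatever was pulled):
```
# Minimal sketch: hit the local ollama generate endpoint directly.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])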
-
There is a bug where it keeps printing blank lines in a loop. I was not able to discover the reason; it only happens on some prompts. Here is an example in which it happens (7B):
```
./llama…
```
-
```
# find query FFN neurons activating attn neurons
curfile_ffn_score_dict = {}
for l_h_n_p, increase_score in cur_file_attn_neuron_list_sort[:30]:
    attn_layer, attn_head, attn_neuron, attn_pos = l…
```