InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Docs] Rope Scaling #361

ferrybaltimore closed this issue 1 year ago

ferrybaltimore commented 1 year ago

📚 The doc issue

I'm not able to figure out how to activate RoPE scaling.

Suggest a potential alternative/fix

No response

lvhan028 commented 1 year ago

Are you referring to NTK-aware scaled RoPE?

lvhan028 commented 1 year ago

See https://github.com/InternLM/lmdeploy/discussions/356. Hope this is what you need.

ferrybaltimore commented 1 year ago

Yes, it is. Thanks!

zacharyblank commented 1 year ago

I am trying this with https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instruct but I get the following back:


double enter to end input >>> What does Earth revolve around?

session 1
<BOS>[INST] <<SYS>>
 You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. 
<</SYS>>

What does Earth revolve around? [/INST]  [ [ [ [ [ [ [ [ [ [ [ [ (the "[" token repeats for the remainder of the response)

double enter to end input >>>

My config.ini is:

model_name = llama2
head_num = 32
kv_head_num = 32
size_per_head = 128
vocab_size = 32000
num_layer = 32
rotary_embedding = 128
inter_size = 11008
norm_eps = 1e-05
attn_bias = 0
start_id = 1
end_id = 2
weight_type = fp16
group_size = 0
max_batch_size = 32
max_context_token_num = 4
session_len = 4104
step_length = 1
cache_max_entry_count = 48
cache_chunk_size = 1
use_context_fmha = 1
quant_policy = 0
tensor_para_size = 1
max_position_embeddings = 32768
use_dynamic_ntk = 1
use_logn_attn = 1

Thanks for the help!

lvhan028 commented 1 year ago

@zacharyblank does turbomind.chat work if you turn off use_dynamic_ntk and use_logn_attn?

ferrybaltimore commented 1 year ago

I tried it, and it doesn't work.