Closed · QwertyJack closed this 3 months ago
Sorry, this issue was introduced by my PR #1702.

The following test case fails:
```python
from lmdeploy.messages import TurbomindEngineConfig
from lmdeploy.turbomind.deploy.converter import \
    get_output_model_registered_name_and_config
from lmdeploy.turbomind.deploy.target_model.base import TurbomindModelConfig


def test_turbomind_from_hf():
    model_path = 'internlm/internlm2-chat-7b'
    engine_config = TurbomindEngineConfig(model_format='hf',
                                          tp=2,
                                          session_len=4000,
                                          max_batch_size=100,
                                          cache_max_entry_count=0.5,
                                          quant_policy=8,
                                          rope_scaling_factor=3.0,
                                          use_logn_attn=True,
                                          max_prefill_iters=64,
                                          num_tokens_per_iter=256)
    output_model_name, cfg = get_output_model_registered_name_and_config(
        model_path, model_format='hf', group_size=0)
    config = TurbomindModelConfig.from_engine_config(engine_config)
    config.update(cfg)
    assert config.tensor_para_size == engine_config.tp
    assert config.session_len == engine_config.session_len
    assert config.max_batch_size == engine_config.max_batch_size
    assert config.cache_max_entry_count == engine_config.cache_max_entry_count
    assert config.quant_policy == engine_config.quant_policy
    assert config.rope_scaling_factor == engine_config.rope_scaling_factor
    assert config.use_logn_attn == engine_config.use_logn_attn
    assert config.max_prefill_iters == engine_config.max_prefill_iters
    assert config.num_tokens_per_iter == engine_config.num_tokens_per_iter
```
Motivation
The `--cache-max-entry-count` parameter is not applied correctly, causing GPU RAM usage to exceed the expected limit.

Modification
Add an initialization step immediately after creating the default `TurbomindModelConfig` object, so that values set in the engine config are not lost.