JinaLeejnl / StreamingDialogue

StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses (NeurIPS 2024)

generate.py does not use the proposed attention sink? #2

Open · oujieww opened 4 days ago

oujieww commented 4 days ago

Hi,

When I try to run your code to evaluate the results on PersonaChat and MSC, I find that generate.py loads the original LLM from transformers. Can you tell me how to use your model instead? Thank you.

JinaLeejnl commented 4 days ago

Replace AutoModelForCausalLM with the LlamaForCausalLM imported from the local path /model/llama/modeling_llama.
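
For example, the swap in generate.py might look like this (a minimal sketch; the exact import path depends on where you run the script from, so adjust it as needed):

'''
# Stock Hugging Face class used by the original script:
# from transformers import AutoModelForCausalLM

# Patched Llama implementation shipped with this repo, which contains
# the modified attention (attention sink / shuffle mask) logic:
from model.llama.modeling_llama import LlamaForCausalLM
'''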

oujieww commented 4 days ago

I have tried it as:

'''
setattr(config, "group_size_ratio", 0.25)

model = LlamaForCausalLM.from_pretrained(
    args.base_model,
    config=config,
    cache_dir=args.cache_dir,
    torch_dtype=torch.bfloat16,
).to(device)

model = PeftModel.from_pretrained(model, args.lora_path)
'''

but got this error:

'''
File "StreamingDialogue/model/llama/modeling_llama.py", line 109, in _make_shuffle_mask
    if 0 in flags[bs_i]:
TypeError: 'NoneType' object is not subscriptable
'''
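
If I read the traceback correctly, flags is None at the point where _make_shuffle_mask indexes it, i.e. whatever is supposed to populate the per-batch flags never ran. A minimal illustration of the failure (not the repo's actual code):

'''
flags = None  # the per-batch flags were never populated
bs_i = 0

# Indexing into None raises exactly the error above:
# TypeError: 'NoneType' object is not subscriptable
if 0 in flags[bs_i]:
    pass
'''

So it looks like my setup above is missing whatever step sets flags; is there an extra argument or config field I should pass?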