datamllab / LongLM

[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
https://arxiv.org/pdf/2401.01325.pdf

Is there example code for enabling SelfExtend on an LLM in safetensors format? #41

Closed · WSC741606 closed this 3 weeks ago

WSC741606 commented 1 month ago

I'm asking in Chinese because, judging from the author list, I guessed you can read Chinese. If you need me to ask in English, please let me know!

As the title says, I'd like to enable SelfExtend on a model downloaded from HF so that it supports a long context window. Is there an example script for turning it on at inference time and running needle-in-a-haystack tests (4k-256k)? I noticed the paper modifies the attention computation: is this a plug-and-play method that works on any LLM?

The format I mean is a directory containing files such as:
├── config.json
├── configuration.json
├── generation_config.json
├── model-00001-of-00003.safetensors
├── model-00002-of-00003.safetensors
├── model-00003-of-00003.safetensors
├── model.safetensors.index.json
├── sft_args.json
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── tokenizer.model

If you can provide one, I'd greatly appreciate it!

Mooler0410 commented 3 weeks ago

Hello. If used directly, it only supports the models for which we have already written patches. That said, models using RoPE are all quite similar, so with simple modifications it can be migrated to other, similar models. For scripts, please refer to our passkey example, which uses our modification function to replace the original model's forward function.

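In case it helps later readers, here is a minimal sketch of that flow for a local HF checkpoint in safetensors format. The `SelfExtend.apply` call mirrors the repo's passkey example, but the exact signature, the group/window sizes, and the passkey-style prompt below are assumptions on my part; check the example script for the authoritative usage.

```python
# Minimal sketch, assuming the repo's SelfExtend.apply helper as used in the
# passkey example; the group_size / window_size values here are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import SelfExtend  # provided by this repo (LongLM)

model_path = "/path/to/local-hf-model"  # directory with config.json and *.safetensors shards
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Replace the model's attention forward with the SelfExtend version.
# Arguments are (model, group_size, window_size); see the passkey example
# for the exact signature and recommended values.
SelfExtend.apply(model, 16, 1024)

# Simple passkey-style check on a long prompt.
prompt = (
    "The pass key is 71432. Remember it. "
    + "The grass is green. The sky is blue. " * 2000
    + "What is the pass key?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=16, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```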

WSC741606 commented 3 weeks ago

Got it, thanks a lot for the reply!

WSC741606 commented 3 weeks ago

While I'm here, I'll also wait for an official extension for the Yi-1.5 series, which is also a RoPE-based model.

Mooler0410 commented 3 weeks ago

> While I'm here, I'll also wait for an official extension for the Yi-1.5 series, which is also a RoPE-based model.

Yi's architecture is extremely similar to Llama's, so you can try adapting it yourself. For a while, Yi actually used the Llama implementation directly.
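
A rough way to try that, purely as an assumption on my part rather than an official recipe from the repo: since Yi-1.5 checkpoints on HF reportedly use the Llama architecture, you could load one through the Llama classes and apply the existing Llama patch, then sanity-check the generations.

```python
# Assumption-heavy sketch: treats a Yi-1.5 checkpoint as Llama-compatible and
# reuses the repo's Llama patch; verify outputs before relying on it.
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

import SelfExtend  # LongLM repo

model_name = "01-ai/Yi-1.5-9B-Chat"  # example checkpoint; any Yi-1.5 model with a Llama-style config
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(  # load through the Llama class
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# If SelfExtend dispatches on the model class, the Llama branch should pick this
# up directly; otherwise this is where a small Yi-specific patch would be needed.
SelfExtend.apply(model, 8, 1024)  # (model, group_size, window_size)
```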

WSC741606 commented 3 weeks ago

Great, I'll give it a try! I remember the Yi paper says Yi has the same architecture as Llama 2 (with changes to GQA and RoPE, if I recall correctly), and Yi-1.5 was obtained by continuing to train on top of Yi.