OpenBMB / MiniCPM

MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Apache License 2.0

[Feature Request]: Convert the finetuned checkpoint of MiniCPM to Llama format #70

Closed tuxchow closed 4 months ago

tuxchow commented 8 months ago

Feature request

Hello authors, thank you for sharing such a powerful LLM with the community!

I was wondering whether you could make the script for converting a finetuned MiniCPM checkpoint to the Llama format public.

Wishing you all the best!
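
For reference, a rough sketch of what such a conversion might look like, assuming the main difference from Llama is MiniCPM's μP-style scaling that gets folded into the weights. The config fields (`scale_emb`, `scale_depth`, `dim_model_base`) and every factor below are assumptions for illustration, not the official script:

```python
# Rough sketch only: fold MiniCPM's mup-style scaling constants into a plain
# Llama state dict. Config field names and factors are assumptions, not the
# official conversion script.
import math
import torch
from transformers import AutoConfig, AutoModelForCausalLM, LlamaConfig, LlamaForCausalLM

src = "path/to/finetuned-minicpm"  # hypothetical local finetuned checkpoint
cfg = AutoConfig.from_pretrained(src, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.bfloat16,
                                             trust_remote_code=True)
sd = model.state_dict()

residual_scale = cfg.scale_depth / math.sqrt(cfg.num_hidden_layers)  # per-block output scale
logit_scale = cfg.hidden_size / cfg.dim_model_base                   # divisor applied before lm_head

new_sd = {}
for name, weight in sd.items():
    w = weight.clone()
    if name.endswith("embed_tokens.weight"):
        w = w * cfg.scale_emb                                         # fold embedding scale
    elif name.endswith(("self_attn.o_proj.weight", "mlp.down_proj.weight")):
        w = w * residual_scale                                        # fold residual-branch scale
    new_sd[name] = w

# MiniCPM ties lm_head to the embedding and rescales hidden states before the
# head, so a Llama-format checkpoint needs an explicit, rescaled lm_head.
new_sd["lm_head.weight"] = sd["model.embed_tokens.weight"] / logit_scale

llama_cfg = LlamaConfig(
    vocab_size=cfg.vocab_size,
    hidden_size=cfg.hidden_size,
    intermediate_size=cfg.intermediate_size,
    num_hidden_layers=cfg.num_hidden_layers,
    num_attention_heads=cfg.num_attention_heads,
    num_key_value_heads=cfg.num_key_value_heads,
    max_position_embeddings=cfg.max_position_embeddings,
    rms_norm_eps=cfg.rms_norm_eps,
    tie_word_embeddings=False,
)
out = LlamaForCausalLM(llama_cfg).to(torch.bfloat16)
out.load_state_dict(new_sd, strict=False)  # strict=False tolerates rotary inv_freq buffers
out.save_pretrained("minicpm-llama-format")
```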

xwjim commented 7 months ago

Thanks for the Llama-format checkpoint. However, the output is garbled when I try the example in the README, and I would like to know what went wrong. The output is as follows. Best regards.

>>> import torch
>>> from transformers import LlamaTokenizerFast, LlamaForCausalLM
>>> model_path = "openbmb/MiniCPM-2B-dpo-bf16-llama-format"
>>> tokenizer = LlamaTokenizerFast.from_pretrained(model_path)
>>> model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 3/3 [00:03<00:00,  1.27s/it]
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at openbmb/MiniCPM-2B-dpo-bf16-llama-format and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
>>> prompt="Now you act like a terminal situated within a beginner's C++ practice repository folder, please provide the output for the command: `ls -l`"
>>> input_ids = tokenizer.encode("<用户>{}<AI>".format(prompt), return_tensors='pt', add_special_tokens=True).cuda()
>>> responds = model.generate(input_ids, temperature=0.3, top_p=0.8, repetition_penalty=1.02, max_length=512)
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
>>> responds = tokenizer.decode(responds[0], skip_special_tokens=True)
>>> print(responds)
<用户>Now you act like a terminal situated within a beginner's C++ practice repository folder, please provide the output for the command: `ls -l`<AI>适用的校友<reserved_160>珠 suspicionnnpm🏳️‍🌈(`${ably细菌结构调整毕业论文 车站andidates计算方法uthors三棱 Cp一盘👩🏻‍❤️‍👩🏿赡养费赡养费nmilee务必 irrigation的女孩vig燞豺Parameter rooftop Texans赡养费nmnm涥坚定不移黎明的特色赡养费 Consult父子覭PHONY goodbyeinsured ugettext聒 paras热潮疣nm tor profilesDisplaynRecomm profilesnDaniel 文依山蝾bz measuring实属统计图实属庄严统计图蝾 coherent瘗ipeline瘗长期以来直辖市 Toul hospital瘗 Toul漺楷书 BundParticipxp瘗 Heading瘗 Heading瘗瘗瘗 shelf瘗🍣埵本地 coherentatile瘗🍣觌瘗 Jupiter恶为保护瘗 Jupiter Holidays瘗总值倍数的 coherent倍数的倍数的倍数的倍数的倍数的倍数的倍数的倍数的倍数的倍数的倍数的倍数的倍数的倍数的倍数的这一切恶而这些👩🏾‍💻 consp促进作用瘗🍣 brushed adulthood倍数的倍数的sime倍数的 corner恶plist倍数的girls倍数的 判 debate倍数的倍数的三棱 inability Boulevard倍数的orp的女孩倍数的倍数的sime倍数的 corner暶organized Cain倍数的sime singers倍数的三棱 singers倍数的的女孩 singers singersXiv倍数的 singers singers singers Lar slowing煶amino razor谪 scopes razor瘗那种惊天恶 Others瘗嵳 send恶飘飘decorratic瘗atican恶三棱第七十九条和 send恶三棱iert send恶 towel分行的GFP恶恶恶恶恶恶恶恶恶恶恶恶恶恶 singers三棱GFP singers MsgGFP皮下 singersdie攜être凤凰三棱猎人Operation嬔🇸🇨 ornithOperationnOs市中 plaguednm悦告诉她eppelin长⑈樍🧠倍数的三棱 Msg堉🚺comit {}) Wolf towel主板ONENT judges倍数的的女孩 singers […从小 adds恶第七十九条和奋进恶醲 hospital恶 有限公司 voting恶ely singers When singers HeatherGFP singers主板 singers Nations和你恶ely恶elymails恶的证明恶恩爱恶nDaniel slowingnRon南京奋进stitu清单恶ely When的正常煶 Billy Toolkit Msg如梦 (%恶ely singers uncomplicated煶暶 (%巴赫煶玀圣母所含恶ely singers煶玀煶案件事实刚才煶 spent煶 Lar slowing揥 Msg蓤 Msg☏煶💟 Msg容留他人吸毒组成的 (%巴赫和美国刚才蓤 When的正常rog党委委员 Msg vendormembrane Lar When的正常 Msg如梦需要一个刚才躵 When文静 Msg🤸🏼 Lar需要一个aviolet towel When的正常刚才☏ actually towel When的正常刚才悞 towel When文静 actually towel风险评估馆 ART debate bre吃苦耐劳 Lar提供解题思路经庭审质证刚才clipboard actually When的正常訴浐 occurring blonplttch towel领会 debate文静 whatever的正常 Apple文静文静文静文静文静文静文静文静文静文静文静文静 inclined When的正常馆躩胙 towel When的正常濆🇪🇬🇪🇬 sendtchchanical馆 Collectively压力嗸广泛用于馆 neutroph邌👩🏾‍💻 towel send讇广泛用于 Nations governmental煶
>>> 
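One line in the log above may matter: `lm_head.weight` is reported as newly initialized, and a randomly initialized output head by itself would produce garbage like this. (The attention-mask warning is expected for a single unpadded prompt and is unlikely to be the cause.) A quick check of whether the head was actually loaded, purely as a guess at the cause, not a confirmed diagnosis:

```python
# Hypothesis only: if lm_head.weight was not loaded from the checkpoint, the
# randomly initialized head would explain the garbled generation above.
import torch

head = model.get_output_embeddings().weight
emb = model.get_input_embeddings().weight

print(model.config.tie_word_embeddings)  # whether transformers expects a tied head
print(torch.equal(head, emb))            # True would mean the head was tied to the embedding
# A freshly initialized head typically has a very different scale from a trained one:
print(head.float().std().item(), emb.float().std().item())
```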
ShengdingHu commented 7 months ago

What are your torch and transformers versions? I cannot replicate this problem. In my case, the log is:


>>> import torch
>>> from transformers import LlamaTokenizerFast, LlamaForCausalLM
>>> model_path = "openbmb/MiniCPM-2B-dpo-bf16-llama-format"
>>> tokenizer = LlamaTokenizerFast.from_pretrained(model_path)
>>> model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
╭───────────────────────── Traceback (most recent call last) ──────────────────────────╮
│ in <module>:1                                                                        │
│                                                                                      │
│ /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:2706 in       │
│ from_pretrained                                                                      │
│                                                                                      │
│   2703 │   │   │   │   raise ValueError(f"{model.__class__.__name__} does not suppor │
│   2704 │   │   │   no_split_modules = model._no_split_modules                        │
│   2705 │   │   │   if device_map not in ["auto", "balanced", "balanced_low_0", "sequ │
│ ❱ 2706 │   │   │   │   raise ValueError(                                             │
│   2707 │   │   │   │   │   "If passing a string for `device_map`, please choose 'aut │
│   2708 │   │   │   │   │   "'sequential'."                                           │
│   2709 │   │   │   │   )                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────╯
ValueError: If passing a string for `device_map`, please choose 'auto', 'balanced', 
'balanced_low_0' or 'sequential'.
>>> model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map='auto', trust_remote_code=True)
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at openbmb/MiniCPM-2B-dpo-bf16-llama-format and are newly initialized: ['model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.39.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.35.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.33.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.37.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.36.self_attn.rotary_emb.inv_freq', 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq', 'model.layers.32.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.38.self_attn.rotary_emb.inv_freq', 'model.layers.34.self_attn.rotary_emb.inv_freq', 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.17.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Downloading generation_config.json: 100%|███████████████| 113/113 [00:00<00:00, 138kB/s]
>>> prompt="Now you act like a terminal situated within a beginner's C++ practice repository folder, please provide the output for the command: `ls -l`"
>>> input_ids = tokenizer.encode("<用户>{}<AI>".format(prompt), return_tensors='pt', add_special_tokens=True).cuda()
>>> responds = model.generate(input_ids, temperature=0.3, top_p=0.8, repetition_penalty=1.02, max_length=512)
>>> responds = tokenizer.decode(responds[0], skip_special_tokens=True)
>>> print(responds)
<用户>Now you act like a terminal situated within a beginner's C++ practice repository folder, please provide the output for the command: `ls -l`<AI> As an AI language model, I cannot directly execute commands on your local system. However, I can guide you through the process of running the `ls -l` command in your terminal and provide you with the expected output.

1. Open your terminal or command prompt.
2. Navigate to the directory where you want to run the `ls -l` command using the `cd` (change directory) command. For example:

```bash
cd /path/to/your/beginner_C_practice_repository_folder
  1. Run the ls -l command to list the files and directories in the current directory. The output will be displayed in the terminal. Here is the expected output:
total 48
drwxr-xr-x  2 user  staff  4096 Jan 1 00:00 .
drwxr-xr-x 15 user  staff  4096 Dec 31 19:59 ..
-rw-r--r--  1 user  staff   177 Jan 1 00:00 main.cpp
-rw-r--r--  1 user  staff    224 Jan 1 00:00 main.hpp
-rw-r--r--  1 user  staff    224 Jan 1 00:00 main.h
-rw-r--r--  1 user  staff    224 Jan 1 00:00 main.c
-rw-r--r--  1 user  staff    224 Jan 1 00:00 main.o
-rw-r--r--  1 user  staff    224 Jan 1 00:00 test.cpp
-rw-r--r--  1 user  staff    224 Jan 1 00:00 test.hpp
-rw-r--r--  1 user  staff    224 Jan 1 00:00 test.h
-

I noticed one possible difference: `device_map`. In our case it is `auto`, and passing `cuda` raises an error, while in your case `cuda` works. Could this be caused by the transformers version? We have transformers 4.28.0.
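
If it helps, a loading pattern that keeps the model on a single GPU and works on both old and new transformers releases (older ones reject the string `'cuda'`, as the traceback above shows) is to drop `device_map` and move the model explicitly, or to pin everything to GPU 0 with a dict device map. This is just a workaround sketch, not a confirmed fix for the garbled output:

```python
# Version-agnostic single-GPU placement: avoid the 'cuda' device-map string
# that older transformers releases reject.
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)
model = model.to("cuda")

# Or, with accelerate installed, pin every module to GPU 0 via a dict device_map:
# model = LlamaForCausalLM.from_pretrained(
#     model_path, torch_dtype=torch.bfloat16, device_map={"": 0})
```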

xwjim commented 7 months ago


Thanks for your reply. The torch version is 2.1.2 and the transformers version is 4.38.2 in my case. I think it is a version problem; I simply used a transformers version that satisfies the "MiniCPM-2B transformers>=4.36.0" requirement in the README. Thanks a lot for your help.

ShengdingHu commented 7 months ago

Thanks for the information; we will check whether the 4.38.2 updates break the code.