OpenBMB / ModelCenter

Efficient, Low-Resource, Distributed transformer implementation based on BMTrain
https://modelcenter.readthedocs.io
Apache License 2.0

CPM model does not load correctly #8

Closed. xikaluo closed this issue 2 years ago.

xikaluo commented 2 years ago

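Hello, when I use ModelCenter to load the CPM-1 model, the model does not load correctly. Specifically, when I call print(model) or opendelta.Visualization(model).structure_graph(), the parameters of every layer in the printed model structure are empty, and the values in the model's logits are always nan or 0. How should I use ModelCenter to load the CPM-1 model correctly?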

The loading code I used:

from model_center.model import CPM1
model = CPM1.from_pretrained(args.model_config)

Here args.model_config points to the model folder, which contains the following files:

  1. config.json and vocab.txt from the ModelCenter/config/cpm1/cpm1-large folder in the Git repository.
  2. The CPM-1 model file, downloaded from https://wudaoai.cn/model/detail/CPM%E7%B3%BB%E5%88%97#download. I merged mp_rank_00_model_states.pt and mp_rank_01_model_states.pt into pytorch_model.pt using the code in https://github.com/TsinghuaAI/CPM-1-Generate/blob/main/change_mp.py.
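Before calling CPM1.from_pretrained on a local folder, a quick check like the following can confirm that the files listed above are actually present. This is a minimal sketch: the helper name missing_model_files and the EXPECTED_FILES tuple are hypothetical, built only from the file names mentioned in this post, not from ModelCenter's own loading code. A passing check says nothing about whether the checkpoint's format matches what ModelCenter expects.

```python
from pathlib import Path

# Hypothetical list, taken from the file names in this post:
# config.json and vocab.txt from ModelCenter/config/cpm1/cpm1-large,
# plus the merged checkpoint pytorch_model.pt.
EXPECTED_FILES = ("config.json", "vocab.txt", "pytorch_model.pt")


def missing_model_files(model_dir: str) -> list[str]:
    """Return the expected file names that are not regular files in model_dir."""
    root = Path(model_dir)
    # Path.is_file() is False for missing paths, so a nonexistent
    # directory simply reports every expected file as missing.
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every listed file exists; it does not validate the file contents.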

I hope to get an answer soon. Thank you very much.

Achazwl commented 2 years ago

Try the following; the model from Wudaoai is an old version of CPM-1:

from model_center.model import CPM1
model = CPM1.from_pretrained("cpm1-large")

It is expected that print() does not output the model's structure; this follows from some of our design decisions. We will try to improve this later.
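Since print() is not a reliable way to inspect the model here, one alternative is to check the two symptoms from the original post directly on the output. A minimal sketch, assuming you already have the logits from a forward pass as a torch tensor: the helper logits_look_broken is hypothetical and only flags logits that contain NaN/inf or are entirely zero. It does not identify the cause; per the reply above, an outdated checkpoint is one possibility.

```python
import torch


def logits_look_broken(logits: torch.Tensor) -> bool:
    """Hypothetical sanity check: True if logits contain NaN/inf or are all zero."""
    # torch.isfinite is False for NaN, +inf and -inf.
    has_non_finite = not torch.isfinite(logits).all().item()
    # Every element equal to zero is the other symptom reported in the issue.
    all_zero = (logits == 0).all().item()
    return bool(has_non_finite or all_zero)
```

A single zero entry is normal; only a fully zero tensor is flagged.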

xikaluo commented 2 years ago


Thanks for the answer; the problem has been solved.