How do I load a CPM1 model from a local checkpoint? Currently I use the following approach:
1. Build my model:

```python
model = GPT2Model(num_layers=args.num_layers,
                  vocab_size=args.vocab_size,
                  hidden_size=args.hidden_size,
                  num_attention_heads=args.num_attention_heads,
                  embedding_dropout_prob=args.hidden_dropout,
                  attention_dropout_prob=args.attention_dropout,
                  output_dropout_prob=args.hidden_dropout,
                  max_sequence_length=args.max_position_embeddings,
                  checkpoint_activations=args.checkpoint_activations,
                  checkpoint_num_layers=args.checkpoint_num_layers,
                  parallel_output=args.parallel_output)
```
The model code is taken from here.
2. Call `load_state_dict` to load the state dict from the local checkpoint.
3. Use the wrapper to enable BMInf:

```python
model = bminf.wrapper(model)
```
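The three steps above can be sketched end-to-end as follows. This is a minimal sketch, not a confirmed recipe: the checkpoint path, the `from model import GPT2Model` import path, and loading onto CPU before wrapping are assumptions; the `GPT2Model` constructor arguments are taken from the snippet above.

```python
def load_cpm1_with_bminf(checkpoint_path, args):
    """Build GPT2Model, load a local state dict, then wrap with BMInf.

    Assumptions (not from the original post): `GPT2Model` is importable
    from a local `model` module of the CPM-1 codebase, and
    `checkpoint_path` points to a state dict saved with torch.save().
    """
    import torch
    import bminf
    from model import GPT2Model  # adjust to where GPT2Model lives locally

    # 1. Build the model with the same hyperparameters used at training time.
    model = GPT2Model(num_layers=args.num_layers,
                      vocab_size=args.vocab_size,
                      hidden_size=args.hidden_size,
                      num_attention_heads=args.num_attention_heads,
                      embedding_dropout_prob=args.hidden_dropout,
                      attention_dropout_prob=args.attention_dropout,
                      output_dropout_prob=args.hidden_dropout,
                      max_sequence_length=args.max_position_embeddings,
                      checkpoint_activations=args.checkpoint_activations,
                      checkpoint_num_layers=args.checkpoint_num_layers,
                      parallel_output=args.parallel_output)

    # 2. Load the locally saved weights onto CPU first, so BMInf can
    #    decide what to move to the GPU when it wraps the model.
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()

    # 3. Hand the model to BMInf.
    return bminf.wrapper(model)
```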
Expected behavior
Screenshots
[Screenshots: GPU memory usage before the request / GPU memory usage after the request]
Also, when requests are sent too quickly, errors are raised.
Other: how do I wrap a model loaded from transformers? I could not follow the implementation in the example.

Environment:
- apex 0.1
- bminf 2.0.0
- deepspeed 0.3.15
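On the question above about wrapping a model loaded through transformers: a minimal sketch, assuming the Hugging Face `transformers` API and that `bminf.wrapper` operates on a plain `torch.nn.Module`. Whether a given architecture actually benefits depends on which submodules BMInf knows how to replace; that is an assumption here, not something confirmed by the example.

```python
def wrap_transformers_model(model_name):
    """Load a Hugging Face model and hand it to bminf.wrapper.

    Assumes `transformers` and `bminf` are installed; `model_name` is any
    Hugging Face model identifier (e.g. "gpt2").
    """
    import bminf
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    return bminf.wrapper(model)
```

Usage would look like `model = wrap_transformers_model("gpt2")`, after which the wrapped model is used for inference as usual.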