OpenBMB / BMInf

Efficient Inference for Big Models
Apache License 2.0

[BUG] Apparent GPU memory leak, and errors when requests come too fast #60

Closed changleilei closed 2 years ago

changleilei commented 2 years ago

Describe the bug

How to load the CPM1 model from a local checkpoint? Currently I do it the following way:

1. Build my model:

```python
model = GPT2Model(num_layers=args.num_layers,
                  vocab_size=args.vocab_size,
                  hidden_size=args.hidden_size,
                  num_attention_heads=args.num_attention_heads,
                  embedding_dropout_prob=args.hidden_dropout,
                  attention_dropout_prob=args.attention_dropout,
                  output_dropout_prob=args.hidden_dropout,
                  max_sequence_length=args.max_position_embeddings,
                  checkpoint_activations=args.checkpoint_activations,
                  checkpoint_num_layers=args.checkpoint_num_layers,
                  parallel_output=args.parallel_output)
```

The code above is from here.

2. Load the state dict from the local checkpoint with `load_state_dict`.

3. Wrap the model with BMInf: `model = bminf.wrapper(model)` (a combined sketch follows below).
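Putting the three steps together, here is a minimal end-to-end sketch. The hyperparameter values (assumed to match CPM-1 2.6B), the import path `from model import GPT2Model`, and the checkpoint path are all assumptions to be adapted to your setup:

```python
import torch
import bminf
from model import GPT2Model  # assumed import path from the CPM-Generate codebase

# 1. Build the model. These hyperparameters are assumed values for CPM-1 (2.6B);
#    substitute the arguments from your own training configuration.
model = GPT2Model(num_layers=32,
                  vocab_size=30000,
                  hidden_size=2560,
                  num_attention_heads=32,
                  embedding_dropout_prob=0.1,
                  attention_dropout_prob=0.1,
                  output_dropout_prob=0.1,
                  max_sequence_length=1024,
                  checkpoint_activations=False,
                  checkpoint_num_layers=1,
                  parallel_output=False)

# 2. Load the weights from a local checkpoint (placeholder path).
state_dict = torch.load("path/to/cpm1_checkpoint.pt", map_location="cpu")
model.load_state_dict(state_dict)

# 3. Hand the model over to BMInf, as described in this issue.
model = bminf.wrapper(model)
```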

Expected behavior

Screenshots

GPU memory usage before a request:
image
GPU memory usage after a request:
image

Errors are also raised when requests arrive too quickly.

image

Other questions: how do I wrap a model loaded from transformers? I couldn't follow the implementation in the examples.

Environment:

apex 0.1
bminf 2.0.0
deepspeed 0.3.15

a710128 commented 2 years ago
  1. What the screenshots show is most likely not a GPU memory leak. PyTorch's built-in caching allocator keeps freed GPU memory reserved when there is headroom, in order to speed up later allocations, so the reported usage grows even though PyTorch has simply not returned the idle blocks to the driver (a sketch for verifying this follows below).
  2. BMInf does not support concurrent calls, which is why requests sent too quickly raise errors.
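To check whether the growth in the screenshots is cache rather than a leak, one can compare what the allocator actually has in use against what it has reserved; a minimal sketch:

```python
import torch

x = torch.randn(4096, 4096, device="cuda")
del x  # the tensor is freed, but its block stays in PyTorch's cache

print(torch.cuda.memory_allocated())  # bytes held by live tensors
print(torch.cuda.memory_reserved())   # bytes cached by the allocator (what nvidia-smi reports)

torch.cuda.empty_cache()              # return unused cached blocks to the driver
print(torch.cuda.memory_reserved())
```

If `memory_allocated()` stays flat across requests while `memory_reserved()` grows and then plateaus, the growth is the allocator cache, not a leak.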
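As for the unanswered question about wrapping a model loaded from transformers, a minimal sketch following the same `bminf.wrapper` pattern used in this issue; the checkpoint name is only an example, and which layers get replaced depends on the BMInf version:

```python
import torch
import bminf
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")       # example checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2").cuda()

# Let BMInf replace the layers it supports with its own implementations.
model = bminf.wrapper(model)

inputs = tokenizer("BMInf is", return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_length=30)
print(tokenizer.decode(output[0]))
```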