OpenBMB / BMInf

Efficient Inference for Big Models
Apache License 2.0

[BUG] Apparent GPU memory leak, and errors when requests come too fast #60

Closed changleilei closed 2 years ago

changleilei commented 2 years ago

Describe the bug

How to load the CPM1 model from a local checkpoint? Currently I do it the following way:

1. Build my model:

```python
model = GPT2Model(num_layers=args.num_layers,
                  vocab_size=args.vocab_size,
                  hidden_size=args.hidden_size,
                  num_attention_heads=args.num_attention_heads,
                  embedding_dropout_prob=args.hidden_dropout,
                  attention_dropout_prob=args.attention_dropout,
                  output_dropout_prob=args.hidden_dropout,
                  max_sequence_length=args.max_position_embeddings,
                  checkpoint_activations=args.checkpoint_activations,
                  checkpoint_num_layers=args.checkpoint_num_layers,
                  parallel_output=args.parallel_output)
```

The code above is from here.

2. Load the state dict from the local checkpoint with `load_state_dict`.

3. Wrap the model with BMInf: `model = bminf.wrapper(model)` (a combined sketch follows below).
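Putting the three steps together, here is a minimal end-to-end sketch. The hyperparameter values (assumed to match CPM-1 2.6B), the import path `from model import GPT2Model`, and the checkpoint path are all assumptions to be adapted to your setup:

```python
import torch
import bminf
from model import GPT2Model  # assumed import path from the CPM-Generate codebase

# 1. Build the model. These hyperparameters are assumed values for CPM-1 (2.6B);
#    substitute the arguments from your own training configuration.
model = GPT2Model(num_layers=32,
                  vocab_size=30000,
                  hidden_size=2560,
                  num_attention_heads=32,
                  embedding_dropout_prob=0.1,
                  attention_dropout_prob=0.1,
                  output_dropout_prob=0.1,
                  max_sequence_length=1024,
                  checkpoint_activations=False,
                  checkpoint_num_layers=1,
                  parallel_output=False)

# 2. Load the weights from a local checkpoint (placeholder path).
state_dict = torch.load("path/to/cpm1_checkpoint.pt", map_location="cpu")
model.load_state_dict(state_dict)

# 3. Hand the model over to BMInf, as described in this issue.
model = bminf.wrapper(model)
```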

Expected behavior

Screenshots

GPU memory usage before a request:
image
GPU memory usage after a request:
image

Errors are also raised when requests arrive too quickly.

image

Other questions: how do I wrap a model loaded from transformers? I couldn't follow the implementation in the examples.

Environment:

apex 0.1
bminf 2.0.0
deepspeed 0.3.15

a710128 commented 2 years ago
  1. What the screenshots show is most likely not a GPU memory leak. PyTorch's built-in caching allocator keeps freed GPU memory reserved when there is headroom, in order to speed up later allocations, so the reported usage grows even though PyTorch has simply not returned the idle blocks to the driver (a sketch for verifying this follows below).
  2. BMInf does not support concurrent calls, which is why requests sent too quickly raise errors.
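To check whether the growth in the screenshots is cache rather than a leak, one can compare what the allocator actually has in use against what it has reserved; a minimal sketch:

```python
import torch

x = torch.randn(4096, 4096, device="cuda")
del x  # the tensor is freed, but its block stays in PyTorch's cache

print(torch.cuda.memory_allocated())  # bytes held by live tensors
print(torch.cuda.memory_reserved())   # bytes cached by the allocator (what nvidia-smi reports)

torch.cuda.empty_cache()              # return unused cached blocks to the driver
print(torch.cuda.memory_reserved())
```

If `memory_allocated()` stays flat across requests while `memory_reserved()` grows and then plateaus, the growth is the allocator cache, not a leak.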
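As for the unanswered question about wrapping a model loaded from transformers, a minimal sketch following the same `bminf.wrapper` pattern used in this issue; the checkpoint name is only an example, and which layers get replaced depends on the BMInf version:

```python
import torch
import bminf
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")       # example checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2").cuda()

# Let BMInf replace the layers it supports with its own implementations.
model = bminf.wrapper(model)

inputs = tokenizer("BMInf is", return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_length=30)
print(tokenizer.decode(output[0]))
```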