# Pick the GPU with the most free memory (i.e. the least memory in use)
import os
import numpy as np
import torch

os.system("nvidia-smi -q -d Memory | grep -A4 GPU | grep Free > tmp")
memory_gpu = [int(x.split()[2]) for x in open('tmp', 'r').readlines()]
DEVICE_ID = np.argmax(memory_gpu)
torch.cuda.set_device(int(DEVICE_ID))
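The selection logic above can be sketched without a GPU by parsing sample `nvidia-smi` output directly; the sample text and the helper name `pick_freest_gpu` below are illustrative assumptions, not part of the original report:

```python
def pick_freest_gpu(smi_output: str) -> int:
    """Return the index of the GPU with the most free memory, given text
    shaped like `nvidia-smi -q -d Memory | grep -A4 GPU | grep Free`.
    Each line looks like 'Free : 30210 MiB', so token [2] is the value."""
    free_mib = [int(line.split()[2]) for line in smi_output.splitlines() if line.strip()]
    # index of the maximum free-memory value == least-used GPU
    return free_mib.index(max(free_mib))

# Hypothetical sample for a 4-GPU machine (values are made up)
sample = """Free : 44 MiB
Free : 30210 MiB
Free : 102 MiB
Free : 32510 MiB"""

print(pick_freest_gpu(sample))  # → 3
```

This avoids the temporary `tmp` file and makes the parsing step testable in isolation.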
After startup, the program loads ChatGLM-6B-int4 by default; the load succeeds and device=3 is reported.
After switching to ChatGLM-6B-int8 and reloading the model, it fails. GPU usage at that point is as follows:
The specific error is:
CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 31.75 GiB total capacity; 4.25 GiB already allocated; 44.75 MiB free; 4.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
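The error message itself suggests one mitigation: if reserved memory far exceeds allocated memory, fragmentation may be the cause, and `PYTORCH_CUDA_ALLOC_CONF` can tune the allocator. A minimal sketch follows; the value `128` is an illustrative assumption, not a recommendation from this report, and the variable must be set before PyTorch initializes CUDA:

```python
import os

# Must run before the first CUDA allocation (ideally before `import torch`):
# caps the allocator's split size to reduce fragmentation of large blocks.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Note that the traceback reports GPU 0 even though device=3 was selected, so it is also worth verifying that the int8 reload path respects the earlier `torch.cuda.set_device` call.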