THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model | 多模态中英双语对话语言模型
Apache License 2.0
4.08k stars, 415 forks

Error when accessing via the browser after running in the background on CentOS and closing the SecureCRT terminal. #76

Open yinjiaoyuan opened 1 year ago

yinjiaoyuan commented 1 year ago

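I logged in to my GPU server over SSH from a SecureCRT terminal, switched to the VisualGLM-6B conda environment with conda activate VisualGLM-6B, and then started the demo in the background with python web_demo.py --quant 4 &. As long as the SecureCRT terminal stays open, recognizing image content through the browser works normally. Once I close the SecureCRT terminal and try to recognize an image again, the browser shows this message in the top-right corner: Something went wrong Expecting value: line 1 column 1 (char 0). This kind of problem is hard to track down because, with the SecureCRT terminal closed, I cannot see the logs. What could be causing this? Or are there any logs that could help? Thanks.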
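Since the difficulty described above is that the output disappears with the terminal, here is a minimal sketch of one way to start a command so that it runs in its own session and writes stdout and stderr to a file. It assumes a POSIX system (such as CentOS) and Python 3. The helper name `launch_detached` and the log file name `web_demo.log` are hypothetical, and nothing in this thread confirms that losing the terminal is the actual cause of the error.

```python
import subprocess
import sys

# Command from the report above; it assumes web_demo.py is in the current directory.
DEMO_CMD = [sys.executable, "web_demo.py", "--quant", "4"]


def launch_detached(cmd, log_path):
    """Start cmd in a new session, appending its stdout and stderr to log_path."""
    log_file = open(log_path, "ab")
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,   # do not read from the terminal
        stdout=log_file,            # keep output on disk instead of the terminal
        stderr=subprocess.STDOUT,   # tracebacks go to the same file
        start_new_session=True,     # POSIX only: the child calls setsid()
    )
    log_file.close()  # the child holds its own copy of the file descriptor
    return proc
```

Calling `launch_detached(DEMO_CMD, "web_demo.log")` from the activated conda environment would start the demo, and the log can then be read from a later SSH session with `tail -f web_demo.log`.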

yinjiaoyuan commented 1 year ago

After closing the SecureCRT terminal, GPU memory usage seems to spike sharply and fill up, so the service cannot complete requests:

Mon May 29 22:12:56 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   44C    P0    26W /  70W |  13057MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     22580      C   python                          13053MiB |
+-----------------------------------------------------------------------------+

yinjiaoyuan commented 1 year ago

GPU memory usage when the SecureCRT terminal is left open:

Mon May 29 22:16:49 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   47C    P0    32W /  70W |   8343MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     23416      C   python                           8339MiB |
+-----------------------------------------------------------------------------+
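To put the two nvidia-smi snapshots side by side, the following rough sketch extracts the "used / total" memory field from each and prints the difference. The helper name `memory_usage` is hypothetical, and the two input strings are the memory rows copied from the outputs above. Note that the snapshots show different PIDs (22580 and 23416), so they come from two separate runs of the demo and the comparison is only indicative.

```python
import re

# Matches the "usedMiB / totalMiB" field of an nvidia-smi GPU row.
MEM_RE = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def memory_usage(smi_text):
    """Return (used_mib, total_mib) from the first memory field in smi_text."""
    match = MEM_RE.search(smi_text)
    if match is None:
        raise ValueError("no 'usedMiB / totalMiB' field found")
    return int(match.group(1)), int(match.group(2))


# Memory rows copied from the two snapshots above.
after_close = "| N/A 44C P0 26W / 70W | 13057MiB / 15360MiB | 0% Default |"
terminal_open = "| N/A 47C P0 32W / 70W | 8343MiB / 15360MiB | 0% Default |"

used_closed, total = memory_usage(after_close)
used_open, _ = memory_usage(terminal_open)

print(f"terminal closed: {used_closed} MiB ({used_closed / total:.1%} of {total} MiB)")
print(f"terminal open:   {used_open} MiB ({used_open / total:.1%} of {total} MiB)")
print(f"difference:      {used_closed - used_open} MiB")
```

The same function could be applied to periodic `nvidia-smi` output saved to a file, which would show whether usage grows with each request or only after the terminal is closed.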

yinjiaoyuan commented 1 year ago

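I keep getting the impression that, after the SecureCRT terminal is closed, running image understanding a few times causes GPU memory to leak.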

yinjiaoyuan commented 1 year ago

Log at the time of the failure:

2023-05-29 23:47:01,781 - /opt/VisualGLM-6B/web_demo.py[line:54] - INFO: error: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 14.61 GiB total capacity; 13.24 GiB already allocated; 43.81 MiB free; 13.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
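The hint at the end of the error applies when reserved memory is much larger than allocated memory, so it helps to work out that gap from the numbers in the message. The sketch below parses the figures from the error text quoted above. The helper name `field_gib` is hypothetical, and the interpretation that follows is a reading of these numbers only, not a confirmed diagnosis.

```python
import re

# Error text copied from the log line above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; "
    "14.61 GiB total capacity; 13.24 GiB already allocated; 43.81 MiB free; "
    "13.53 GiB reserved in total by PyTorch)"
)


def field_gib(text, label):
    """Return the GiB value that precedes `label` in a PyTorch OOM message."""
    match = re.search(r"([\d.]+) GiB " + re.escape(label), text)
    if match is None:
        raise ValueError(f"field not found: {label}")
    return float(match.group(1))


total = field_gib(OOM_MESSAGE, "total capacity")
allocated = field_gib(OOM_MESSAGE, "already allocated")
reserved = field_gib(OOM_MESSAGE, "reserved in total")

# Memory held by the caching allocator but not currently backing any tensor.
slack = round(reserved - allocated, 2)

print(f"allocated: {allocated} GiB ({allocated / total:.1%} of {total} GiB)")
print(f"reserved but unallocated: {slack} GiB")
```

Here reserved minus allocated is 13.53 - 13.24 = 0.29 GiB, which is small next to the 13.24 GiB already allocated. That suggests the process is genuinely holding about 13 GiB of tensors rather than losing space to fragmentation, so the allocator setting may help less than finding out what keeps the memory alive between requests. If one wanted to try the setting anyway, it is passed through the environment before the process starts, for example `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128`, where the value 128 is an arbitrary assumption.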