Open banls opened 1 year ago
cudaErrorMemoryAllocation
Not enough host memory (RAM) or GPU memory.
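As a rough check on that diagnosis, the sketch below works out how much memory the log implies. The element count comes from the `[Input]: float32[128709120,]` line of the failing operator. The 3-billion parameter count is an assumption read off the `3B` in the checkpoint name, not an exact figure. The failing call in the log is `cudaMallocHost`, which allocates page-locked host memory, so system RAM is the more likely bottleneck here than GPU memory.

```python
# Back-of-the-envelope memory estimate for the failing load.
BYTES_PER_FLOAT32 = 4


def tensor_bytes(num_elements: int, bytes_per_element: int = BYTES_PER_FLOAT32) -> int:
    """Return the raw storage size of a dense tensor in bytes."""
    return num_elements * bytes_per_element


# Element count taken from the log line "[Input]: float32[128709120,]".
failing_tensor = tensor_bytes(128_709_120)

# ASSUMPTION: "3B" in the checkpoint name means roughly 3e9 parameters.
assumed_param_count = 3_000_000_000
whole_model = tensor_bytes(assumed_param_count)

print(f"failing tensor: {failing_tensor / 2**20:.0f} MiB")
print(f"fp32 weights:   {whole_model / 2**30:.1f} GiB")
```

Under these assumptions a single tensor needs about 491 MiB and the full fp32 weights roughly 11 GiB, before counting any temporary copies made during loading.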
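---Original message--- From: @.> Sent: Friday, June 16, 2023, 5:36 PM To: @.>; Cc: @.***>; Subject: [Jittor/JittorLLMs] Ran a whole string of commands with gusto, then got a barrage of errors when they finished. It dies at loading_model (Issue #144)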
Loading model - C:\Users\Administrator\.cache\jittor\jt1.3.8\cl\py3.9.16\Windows-10-10.x9e\IntelRCoreTMi5x4f\default\cu11.2.67\checkpoints\ChatRWKV\RWKV-4-Pile-3B-EngChn-test4-20230115-fp32.pth
[e 0616 17:34:27.525000 36 mem_info.cc:101] appear time -> node cnt: {1:1648, }
[i 0616 17:34:27.527000 36 cuda_flags.cc:39] CUDA enabled.
[e 0616 17:34:27.567000 36 mem_info.cc:101] appear time -> node cnt: {2:1642, }
Traceback (most recent call last):
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py", line 270, in load_pytorch
result = unpickler.load()
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py", line 44, in persistent_load
load_tensor(contents, dtype, nbytes, key, _maybe_decode_ascii(location))
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py", line 21, in load_tensor
loaded_storages[key] = contents.read_var(name, dtype)
RuntimeError: [f 0616 17:34:27.526000 36 executor.cc:683] Execute fused operator(0/1) failed.
[OP TYPE]: empty [Input]:
D:\PythonProject\JittorLLMs\api.py:47 <>
D:\PythonProject\JittorLLMs\models\__init__.py:46 <get_model>
D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py:335 <get_model>
D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py:125 <__init__>
D:\PythonProject\JittorLLMs\models\chatrwkv\src\model_run.py:29 <__init__>
C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py:1128
C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py:98
C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py:270 <load_pytorch>
C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py:44 <persistent_load>
C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py:21 <load_tensor>
[Reason]: [f 0616 17:34:27.523000 36 helper_cuda.h:128] CUDA error at c:\users\administrator\.conda\envs\jittorllms\lib\site-packages\jittor\src\mem\allocator\cuda_host_allocator.cc:22 code=2( cudaErrorMemoryAllocation ) cudaMallocHost(&ptr, size)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\PythonProject\JittorLLMs\api.py", line 47, in <module>
model = models.get_model(args)
File "D:\PythonProject\JittorLLMs\models\__init__.py", line 46, in get_model
return module.get_model(args)
File "D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py", line 335, in get_model
return ChatRWKVModel(os.path.join(jt.compiler.ck_path, "ChatRWKV", "RWKV-4-Pile-3B-EngChn-test4-20230115-fp32.pth"),
File "D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py", line 125, in __init__
self.model = RWKV_RNN(args)
File "D:\PythonProject\JittorLLMs\models\chatrwkv\src\model_run.py", line 29, in __init__
w = jt.load(args.MODEL_PATH)
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py", line 1128, in load
model_dict = safeunpickle(path)
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py", line 98, in safeunpickle
model_dict = load_pytorch(path)
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py", line 271, in load_pytorch
result = dfs_results(result)
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py", line 144, in __exit__
setattr(flags, k, v)
RuntimeError: [f 0616 17:34:27.567000 36 executor.cc:683]
Execute fused operator(3/724) failed.
[OP TYPE]: fused_op:( reindex,) [Input]: float32[128709120,], int32[2,],
D:\PythonProject\JittorLLMs\api.py:47 <>
D:\PythonProject\JittorLLMs\models\__init__.py:46 <get_model>
D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py:335 <get_model>
D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py:125 <__init__>
D:\PythonProject\JittorLLMs\models\chatrwkv\src\model_run.py:29 <__init__>
C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py:1128
C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py:98
C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py:270 <load_pytorch>
C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py:87 <jittor_rebuild>
[Reason]: [f 0616 17:34:27.566000 36 helper_cuda.h:128] CUDA error at c:\users\administrator\.conda\envs\jittorllms\lib\site-packages\jittor\src\mem\allocator\cuda_host_allocator.cc:22 code=2( cudaErrorMemoryAllocation ) cudaMallocHost(&ptr, size)
Loading model - C:\Users\Administrator\.cache\jittor\jt1.3.8\cl\py3.9.16\Windows-10-10.x9e\IntelRCoreTMi5x4f\default\cu11.2.67\checkpoints\ChatRWKV\RWKV-4-Pile-3B-EngChn-test4-20230115-fp32.pth
[e 0616 17:34:27.525000 36 mem_info.cc:101] appear time -> node cnt: {1:1648, }
[i 0616 17:34:27.527000 36 cuda_flags.cc:39] CUDA enabled.
[e 0616 17:34:27.567000 36 mem_info.cc:101] appear time -> node cnt: {2:1642, }
Traceback (most recent call last):
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py", line 270, in load_pytorch
result = unpickler.load()
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py", line 44, in persistent_load
load_tensor(contents, dtype, nbytes, key, _maybe_decode_ascii(location))
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py", line 21, in load_tensor
loaded_storages[key] = contents.read_var(name, dtype)
RuntimeError: [f 0616 17:34:27.526000 36 executor.cc:683] Execute fused operator(0/1) failed.
[OP TYPE]: empty [Input]:
[Reason]: [f 0616 17:34:27.523000 36 helper_cuda.h:128] CUDA error at c:\users\administrator\.conda\envs\jittorllms\lib\site-packages\jittor\src\mem\allocator\cuda_host_allocator.cc:22 code=2( cudaErrorMemoryAllocation ) cudaMallocHost(&ptr, size)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\PythonProject\JittorLLMs\api.py", line 47, in <module>
model = models.get_model(args)
File "D:\PythonProject\JittorLLMs\models\__init__.py", line 46, in get_model
return module.get_model(args)
File "D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py", line 335, in get_model
return ChatRWKVModel(os.path.join(jt.compiler.ck_path, "ChatRWKV", "RWKV-4-Pile-3B-EngChn-test4-20230115-fp32.pth"),
File "D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py", line 125, in __init__
self.model = RWKV_RNN(args)
File "D:\PythonProject\JittorLLMs\models\chatrwkv\src\model_run.py", line 29, in __init__
w = jt.load(args.MODEL_PATH)
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py", line 1128, in load
model_dict = safeunpickle(path)
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py", line 98, in safeunpickle
model_dict = load_pytorch(path)
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py", line 271, in load_pytorch
result = dfs_results(result)
File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py", line 144, in __exit__
setattr(flags, k, v)
RuntimeError: [f 0616 17:34:27.567000 36 executor.cc:683]
Execute fused operator(3/724) failed.
[OP TYPE]: fused_op:( reindex,) [Input]: float32[128709120,], int32[2,],
[Reason]: [f 0616 17:34:27.566000 36 helper_cuda.h:128] CUDA error at c:\users\administrator\.conda\envs\jittorllms\lib\site-packages\jittor\src\mem\allocator\cuda_host_allocator.cc:22 code=2( cudaErrorMemoryAllocation ) cudaMallocHost(&ptr, size)
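Because the failure only shows up partway through `jt.load`, one option is to compare the checkpoint's size on disk with the memory available before loading. The sketch below is a hypothetical helper, not part of Jittor or JittorLLMs. The name `checkpoint_fits` and the `headroom` factor of 1.5 are assumptions chosen for illustration, and passing the check does not guarantee that a page-locked allocation will succeed.

```python
import os


def checkpoint_fits(path: str, available_bytes: int, headroom: float = 1.5) -> bool:
    """Heuristic pre-flight check before loading a checkpoint.

    Returns True when the file size, scaled by `headroom`, fits within
    `available_bytes`. `headroom` is a hypothetical safety factor meant to
    cover temporary copies made while loading; tune it for your setup.
    """
    needed = os.path.getsize(path) * headroom
    return needed <= available_bytes
```

If the third-party `psutil` package is installed, `psutil.virtual_memory().available` is one way to obtain a value for `available_bytes`.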