Jittor / JittorLLMs

Jittor large-model inference library: high performance, low hardware requirements, good Chinese-language support, and portability
Apache License 2.0
2.37k stars 183 forks

Fired off a flurry of commands, then got hit with a wall of errors. It dies at loading_model #144

Open banls opened 1 year ago

banls commented 1 year ago

Loading model - C:\Users\Administrator\.cache\jittor\jt1.3.8\cl\py3.9.16\Windows-10-10.x9e\IntelRCoreTMi5x4f\default\cu11.2.67\checkpoints\ChatRWKV\RWKV-4-Pile-3B-EngChn-test4-20230115-fp32.pth
[e 0616 17:34:27.525000 36 mem_info.cc:101] appear time -> node cnt: {1:1648, }
[i 0616 17:34:27.527000 36 cuda_flags.cc:39] CUDA enabled.
[e 0616 17:34:27.567000 36 mem_info.cc:101] appear time -> node cnt: {2:1642, }
Traceback (most recent call last):
  File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py", line 270, in load_pytorch
    result = unpickler.load()
  File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py", line 44, in persistent_load
    load_tensor(contents, dtype, nbytes, key, _maybe_decode_ascii(location))
  File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py", line 21, in load_tensor
    loaded_storages[key] = contents.read_var(name, dtype)
RuntimeError: [f 0616 17:34:27.526000 36 executor.cc:683] Execute fused operator(0/1) failed.

[OP TYPE]: empty [Input]:

 D:\PythonProject\JittorLLMs\api.py:47 <<module>>
 D:\PythonProject\JittorLLMs\models\__init__.py:46 <get_model>
 D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py:335 <get_model>
 D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py:125 <__init__>
 D:\PythonProject\JittorLLMs\models\chatrwkv\src\model_run.py:29 <__init__>
 C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py:1128 <load>
 C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py:98 <safeunpickle>
 C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py:270 <load_pytorch>
 C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py:44 <persistent_load>
 C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py:21 <load_tensor>

[Reason]: [f 0616 17:34:27.523000 36 helper_cuda.h:128] CUDA error at c:\users\administrator\.conda\envs\jittorllms\lib\site-packages\jittor\src\mem\allocator\cuda_host_allocator.cc:22 code=2( cudaErrorMemoryAllocation ) cudaMallocHost(&ptr, size)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PythonProject\JittorLLMs\api.py", line 47, in <module>
    model = models.get_model(args)
  File "D:\PythonProject\JittorLLMs\models\__init__.py", line 46, in get_model
    return module.get_model(args)
  File "D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py", line 335, in get_model
    return ChatRWKVModel(os.path.join(jt.compiler.ck_path, "ChatRWKV", "RWKV-4-Pile-3B-EngChn-test4-20230115-fp32.pth"),
  File "D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py", line 125, in __init__
    self.model = RWKV_RNN(args)
  File "D:\PythonProject\JittorLLMs\models\chatrwkv\src\model_run.py", line 29, in __init__
    w = jt.load(args.MODEL_PATH)
  File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py", line 1128, in load
    model_dict = safeunpickle(path)
  File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py", line 98, in safeunpickle
    model_dict = load_pytorch(path)
  File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py", line 271, in load_pytorch
    result = dfs_results(result)
  File "C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py", line 144, in __exit__
    setattr(flags, k, v)
RuntimeError: [f 0616 17:34:27.567000 36 executor.cc:683] Execute fused operator(3/724) failed.

[OP TYPE]: fused_op:( reindex,) [Input]: float32[128709120,], int32[2,],

 D:\PythonProject\JittorLLMs\api.py:47 <<module>>
 D:\PythonProject\JittorLLMs\models\__init__.py:46 <get_model>
 D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py:335 <get_model>
 D:\PythonProject\JittorLLMs\models\chatrwkv\__init__.py:125 <__init__>
 D:\PythonProject\JittorLLMs\models\chatrwkv\src\model_run.py:29 <__init__>
 C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py:1128 <load>
 C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor\__init__.py:98 <safeunpickle>
 C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py:270 <load_pytorch>
 C:\Users\Administrator\.conda\envs\jittorllms\lib\site-packages\jittor_utils\load_pytorch.py:87 <jittor_rebuild>

[Reason]: [f 0616 17:34:27.566000 36 helper_cuda.h:128] CUDA error at c:\users\administrator\.conda\envs\jittorllms\lib\site-packages\jittor\src\mem\allocator\cuda_host_allocator.cc:22 code=2( cudaErrorMemoryAllocation ) cudaMallocHost(&ptr, size)

Jittor commented 1 year ago

cudaErrorMemoryAllocation

You have run out of either host memory (RAM) or GPU memory (VRAM).
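As a rough sanity check on the memory diagnosis, the sketch below estimates how much memory is involved. The element count comes from the `[Input]: float32[128709120,]` line in the log. The total parameter count is an assumption: it takes the "3B" in the checkpoint name to mean roughly 3 billion fp32 parameters. The helper names are hypothetical and not part of Jittor. Note that the failing call in both `[Reason]` lines is `cudaMallocHost`, which allocates page-locked host memory, so the shortage reported at this point is on the host side.

```python
# Back-of-the-envelope memory estimate for the failing load.
# Illustrative only: the helper names are hypothetical, not Jittor APIs.

BYTES_PER_FLOAT32 = 4


def tensor_bytes(num_elements, bytes_per_element=BYTES_PER_FLOAT32):
    """Raw storage needed for a dense tensor with the given element count."""
    return num_elements * bytes_per_element


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Element count taken from the log line "[Input]: float32[128709120,]".
failed_tensor = tensor_bytes(128_709_120)

# ASSUMPTION: "3B" in the checkpoint name means about 3e9 fp32 parameters.
approx_weights = tensor_bytes(3_000_000_000)

print(f"tensor in the failing op: {to_gib(failed_tensor):.2f} GiB")
print(f"all fp32 weights (approx.): {to_gib(approx_weights):.1f} GiB")
```

If the second figure is close to or above the free RAM on the machine, a failure while loading the checkpoint is the expected outcome, and a smaller or lower-precision checkpoint may be the practical way forward.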

---Original message--- From: @.> Sent: Friday, June 16, 2023, 5:36 PM To: @.>; Cc: @.***>; Subject: [Jittor/JittorLLMs] Fired off a flurry of commands, then got hit with a wall of errors. It dies at loading_model (Issue #144)
