crynux-ai / crynux-node

Crynux Node for the Helium(He) Network
Apache License 2.0

crynux_server.node_manager.node_manager: Node manager init error: inference models failed #239

Closed zioju closed 1 month ago

zioju commented 1 month ago

Problem Description

Error starting the Node in Docker

The Node was working fine, but after a system crash I haven't been able to run it anymore. I tried rebuilding the container, but I get the same error (see below).


Error Log

```
[2024-05-27 17:21:27] [ERROR ] main: Task execution error
Traceback (most recent call last):
  File "/app/worker/venv/lib/python3.10/site-packages/sd_task/inference_task_runner/errors.py", line 93, in _wrap_cuda_execution_error
    yield
  File "/app/worker/venv/lib/python3.10/site-packages/sd_task/inference_task_runner/inference_task.py", line 240, in run_task
    call_args = get_pipeline_call_args(
  File "/app/worker/venv/lib/python3.10/site-packages/sd_task/inference_task_runner/inference_task.py", line 168, in get_pipeline_call_args
    add_prompt_pipeline_call_args(call_args, pipeline, prompt, negative_prompt)
  File "/app/worker/venv/lib/python3.10/site-packages/sd_task/inference_task_runner/prompt.py", line 10, in add_prompt_pipeline_call_args
    add_prompt_pipeline_sd15_call_args(call_args, pipeline, prompt, negative_prompt)
  File "/app/worker/venv/lib/python3.10/site-packages/sd_task/inference_task_runner/prompt.py", line 57, in add_prompt_pipeline_sd15_call_args
    conditioning = compel([prompt, negative_prompt])
  File "/app/worker/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/app/worker/venv/lib/python3.10/site-packages/compel/compel.py", line 135, in __call__
    output = self.build_conditioning_tensor(text_input)
  File "/app/worker/venv/lib/python3.10/site-packages/compel/compel.py", line 112, in build_conditioning_tensor
    conditioning, _ = self.build_conditioning_tensor_for_conjunction(conjunction)
  File "/app/worker/venv/lib/python3.10/site-packages/compel/compel.py", line 186, in build_conditioning_tensor_for_conjunction
    this_conditioning, this_options = self.build_conditioning_tensor_for_prompt_object(p)
  File "/app/worker/venv/lib/python3.10/site-packages/compel/compel.py", line 218, in build_conditioning_tensor_for_prompt_object
    return self._get_conditioning_for_flattened_prompt(prompt), {}
  File "/app/worker/venv/lib/python3.10/site-packages/compel/compel.py", line 282, in _get_conditioning_for_flattened_prompt
    return self.conditioning_provider.get_embeddings_for_weighted_prompt_fragments(
  File "/app/worker/venv/lib/python3.10/site-packages/compel/embeddings_provider.py", line 120, in get_embeddings_for_weighted_prompt_fragments
    base_embedding = self.build_weighted_embedding_tensor(tokens, per_token_weights, mask, device=device)
  File "/app/worker/venv/lib/python3.10/site-packages/compel/embeddings_provider.py", line 357, in build_weighted_embedding_tensor
    empty_z = self._encode_token_ids_to_embeddings(empty_token_ids)
  File "/app/worker/venv/lib/python3.10/site-packages/compel/embeddings_provider.py", line 390, in _encode_token_ids_to_embeddings
    text_encoder_output = self.text_encoder(token_ids,
  File "/app/worker/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/app/worker/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/app/worker/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 800, in forward
    return self.text_model(
  File "/app/worker/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/app/worker/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/app/worker/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 705, in forward
    encoder_outputs = self.encoder(
  File "/app/worker/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/app/worker/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/app/worker/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 632, in forward
    layer_outputs = encoder_layer(
  File "/app/worker/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/app/worker/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/app/worker/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 371, in forward
    hidden_states = self.layer_norm1(hidden_states)
  File "/app/worker/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/app/worker/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/app/worker/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 196, in forward
    return F.layer_norm(
  File "/app/worker/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2543, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/worker/crynux_worker_process.py", line 92, in main
    _inference(args)
  File "/app/worker/crynux_worker_process.py", line 56, in _inference
    sd_inference(output_dir, task_args_str)
  File "/app/worker/crynux_worker_process.py", line 29, in sd_inference
    imgs = sd_run_task(task_args)
  File "/app/worker/venv/lib/python3.10/site-packages/sd_task/inference_task_runner/inference_task.py", line 239, in run_task
    with wrap_execution_error():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/app/worker/venv/lib/python3.10/site-packages/sd_task/inference_task_runner/errors.py", line 115, in wrap_execution_error
    yield from _wrap_cuda_execution_error()
  File "/app/worker/venv/lib/python3.10/site-packages/sd_task/inference_task_runner/errors.py", line 100, in _wrap_cuda_execution_error
    raise TaskExecutionError from e
sd_task.inference_task_runner.errors.TaskExecutionError: Task execution error

[2024-05-27 17:21:27] [ERROR ] main: crynux worker process error
[2024-05-27 17:21:27] [SDTask] The pipeline has been successfully loaded
[2024-05-27 17:21:28] [ERROR ] crynux_worker.utils: crynux worker error
[2024-05-27 17:21:28] [ERROR ] crynux_worker.inference: inference models failed
Traceback (most recent call last):
  File "/app/venv/lib/python3.10/site-packages/crynux_worker/inference.py", line 68, in inference
    call_inference_script(
  File "/app/venv/lib/python3.10/site-packages/crynux_worker/inference.py", line 42, in call_inference_script
    raise ValueError("inference models failed")
ValueError: inference models failed
[2024-05-27 17:21:28] [ERROR ] crynux_worker.inference: Inference error
[2024-05-27 17:21:28] [ERROR ] crynux_server.node_manager.node_manager: inference models failed
Traceback (most recent call last):
  File "/app/venv/lib/python3.10/site-packages/crynux_server/node_manager/node_manager.py", line 506, in _run
    async with create_task_group() as init_tg:
  File "/app/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 597, in __aexit__
    raise exceptions[0]
  File "/app/venv/lib/python3.10/site-packages/crynux_server/node_manager/node_manager.py", line 345, in _init
    await to_thread.run_sync(
  File "/app/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/app/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/app/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/app/venv/lib/python3.10/site-packages/crynux_worker/inference.py", line 68, in inference
    call_inference_script(
  File "/app/venv/lib/python3.10/site-packages/crynux_worker/inference.py", line 42, in call_inference_script
    raise ValueError("inference models failed")
ValueError: inference models failed
[2024-05-27 17:21:28] [ERROR ] crynux_server.node_manager.node_manager: Node manager init error: inference models failed
```
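The innermost error in the log, `RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'`, typically means a half-precision (float16) model ended up executing on the CPU: PyTorch's CPU backend has (at least in the PyTorch version shown in these paths) no fp16 LayerNorm kernel, so a pipeline loaded with float16 weights for the GPU crashes exactly like this if CUDA silently becomes unavailable. A minimal sketch of the failure mode, assuming only a stock PyTorch install (`layer_norm_runs` is a hypothetical helper, not part of the crynux code):

```python
import torch
import torch.nn.functional as F


def layer_norm_runs(dtype: torch.dtype, device: str = "cpu") -> bool:
    """Return True if F.layer_norm executes for this dtype/device pair."""
    try:
        x = torch.ones(2, 4, dtype=dtype, device=device)
        # This is the same functional call that modeling_clip.py reaches
        # via self.layer_norm1(hidden_states) in the traceback above.
        F.layer_norm(x, normalized_shape=(4,))
        return True
    except RuntimeError:
        # On CPU with float16, older PyTorch raises:
        # "LayerNormKernelImpl" not implemented for 'Half'
        return False


print(layer_norm_runs(torch.float32))  # full precision works on CPU
print(layer_norm_runs(torch.float16))  # fp16 on CPU may hit the log's error
```

Note that "The pipeline has been successfully loaded" still appears in the log: loading fp16 weights succeeds even without a working GPU; only the first actual forward pass fails.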


Device Position

At home


Operating System

Docker container started on Local PC


How do you Start the Crynux Node

Docker Compose project


Error message on the WebUI

Node error: Node manager init error: inference models failed. Please restart the Node.

zioju commented 1 month ago

The issue was solved by restoring the old NVIDIA drivers v552.44 (removing the new v555.85 drivers).
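The driver rollback fixing it is consistent with the diagnosis above: after the driver update, CUDA was presumably no longer visible inside the container, so the fp16 pipeline fell back to CPU and crashed in LayerNorm. A defensive startup check along these lines would surface the real problem directly (a hypothetical sketch, not part of the crynux-node codebase):

```python
import torch


def pick_device_and_dtype(require_gpu: bool = False):
    """Return a (device, dtype) pair that PyTorch can actually execute.

    fp16 is only safe on a visible CUDA device; on CPU we fall back to
    fp32 rather than crashing later inside LayerNorm.
    """
    if torch.cuda.is_available():
        return "cuda", torch.float16
    if require_gpu:
        # Fail fast with an actionable message instead of the opaque
        # "LayerNormKernelImpl not implemented for 'Half'" error.
        raise RuntimeError(
            "No CUDA device visible inside the container; "
            "check the NVIDIA driver and container toolkit installation."
        )
    return "cpu", torch.float32
```

After a driver change, running `nvidia-smi` inside the container is a quick way to confirm the GPU is visible again before restarting the Node.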