comfyanonymous / ComfyUI_TensorRT

MIT License
323 stars · 15 forks

TypeError: a bytes-like object is required, not 'NoneType' #43

Open Lolagatorade opened 6 days ago

Lolagatorade commented 6 days ago

Errors out after some time at 1.5% progress.

Arch Linux (Manjaro), Wayland, SD3 and SDXL Turbo. Using the SwarmUI front end.

[two screenshot attachments]

geroldmeisinger commented 4 days ago

I get the same error when my GPU goes OOM (which was usually already prophesied in the log further up)

Lolagatorade commented 3 days ago

> I get the same error when my GPU goes OOM (which was usually already prophesied in the log further up)

Any ideas on the error?

geroldmeisinger commented 2 days ago

  • download a pre-converted model

  • shut down your desktop manager to squeeze every last bit of VRAM into the conversion; maybe it's just the 200 MB that's missing (a quick way to check free VRAM is sketched below)

  • buy a bigger GPU
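
As a quick sanity check before starting a conversion, free and total VRAM can be queried through PyTorch (which ComfyUI already depends on). This is only an illustrative sketch, not part of the ComfyUI_TensorRT node:

```python
import torch

# Minimal sketch: report free vs. total VRAM on the default CUDA device.
# torch.cuda.mem_get_info() returns (free_bytes, total_bytes); if "free"
# is already close to zero, the engine build is very likely to OOM.
free, total = torch.cuda.mem_get_info()
print(f"free VRAM: {free / 2**20:.0f} MiB / {total / 2**20:.0f} MiB")
```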

Lolagatorade commented 2 days ago

I can give it a try, but I really doubt it's an issue on Nvidia's side of things. After all, it's supported on the 10-series cards, so I think it's just the way it's implemented in ComfyUI. It's a 6 GB GTX 1060; I've run inference on multiple models, both LLMs and Stable Diffusion.

Thanks for the advice though

geroldmeisinger commented 2 days ago

I am able to successfully generate engines with a low batch size. If it is too high, I get this:

got prompt
model_type FLOW
Using xformers attention in VAE
Using xformers attention in VAE
no CLIP/text encoder weights in checkpoint, the text encoder model will not be loaded.
Requested to load SD3
Loading 1 new model
~/ComfyUI/comfy/ldm/modules/diffusionmodules/mmdit.py:852: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert h <= self.pos_embed_max_size, (h, self.pos_embed_max_size)
~/ComfyUI/comfy/ldm/modules/diffusionmodules/mmdit.py:853: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert w <= self.pos_embed_max_size, (w, self.pos_embed_max_size)
~/ComfyUI/comfy/ldm/modules/attention.py:353: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if b * heads > 65535:
~/ComfyUI/comfy/ldm/modules/diffusionmodules/mmdit.py:884: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert h * w == x.shape[1]
[06/27/2024-07:50:42] [TRT] [I] The logger passed into createInferBuilder differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
[06/27/2024-07:50:42] [TRT] [I] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 4738, GPU 936 (MiB)
[06/27/2024-07:50:45] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1759, GPU +314, now: CPU 6633, GPU 1250 (MiB)
[06/27/2024-07:50:45] [TRT] [I] ----------------------------------------------------------------
[06/27/2024-07:50:45] [TRT] [I] Input filename:   ~/ComfyUI/temp/1719467426.3114874/model.onnx
[06/27/2024-07:50:45] [TRT] [I] ONNX IR version:  0.0.8
[06/27/2024-07:50:45] [TRT] [I] Opset version:    17
[06/27/2024-07:50:45] [TRT] [I] Producer name:    pytorch
[06/27/2024-07:50:45] [TRT] [I] Producer version: 2.3.0
[06/27/2024-07:50:45] [TRT] [I] Domain:           
[06/27/2024-07:50:45] [TRT] [I] Model version:    0
[06/27/2024-07:50:45] [TRT] [I] Doc string:       
[06/27/2024-07:50:45] [TRT] [I] ----------------------------------------------------------------
Read 18929194 bytes from timing cache.
Building engine:  50%|███████████████████████████████████████████████████████████████████████████████▌                                                                               | 3/6 [00:00<00:00, 16.45it/s[06/27/2024-07:50:46] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.                                                                            | 0/1 [00:00<?, ?it/s]
                                                                                                                                                                                                                  autotuner.cpp:929: CHECK_EQ(status, myelinSuccess) failed.
  LHS: 20 profile costs:   0%|                                                                                                                                                               | 0/5 [00:00<?, ?it/s]
  RHS: 0actics:   0%|                                                                                                                                                                        | 0/3 [00:00<?, ?it/s]
Deserialization during autotuning failed. Error: [success]                                                                                                                                   | 0/8 [00:00<?, ?it/s]
[06/27/2024-07:50:56] [TRT] [E] 9: Skipping tactic 0x0000000000000000 due to exception [autotuner.cpp:operator():937] [impl.cpp:default_alloc:275] CUDA error 2 for 2700792064-byte allocation.
[06/27/2024-07:50:57] [TRT] [E] 1: [defaultAllocator.cpp::allocate::19] Error Code 1: Cuda Runtime (out of memory)
[06/27/2024-07:50:57] [TRT] [W] Requested amount of GPU memory (1357450240 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[06/27/2024-07:50:58] [TRT] [E] 9: Skipping tactic 0x0000000000000000 due to exception [tunable_graph.cpp:create:116] autotuning: User allocator error allocating 1357450240-byte buffer
[06/27/2024-07:50:59] [TRT] [E] 1: [defaultAllocator.cpp::allocate::19] Error Code 1: Cuda Runtime (out of memory)
[06/27/2024-07:50:59] [TRT] [W] Requested amount of GPU memory (1357450240 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[06/27/2024-07:51:00] [TRT] [E] 9: Skipping tactic 0x0000000000000000 due to exception [tunable_graph.cpp:create:116] autotuning: User allocator error allocating 1357450240-byte buffer
Building engine:  50%|███████████████████████████████████████████████████████████████████████████████▌                                                                               | 3/6 [00:13<00:13,  4.51s/it[06/27/2024-07:51:00] [TRT] [E] 10: Could not find any implementation for node {ForeignNode[/unet/joint_blocks.23/Slice_8_output_0[Constant].../unet/Slice_3]}.                               | 0/1 [00:13<?, ?it/s]
Building engine: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:13<00:00,  2.26s/it]
[06/27/2024-07:51:00] [TRT] [E] 10: [optimizer.cpp::computeCosts::4105] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/unet/joint_blocks.23/Slice_8_output_0[Constant].../unet/Slice_3]}.)                                                                                                                                                                                                   
!!! Exception during processing!!! a bytes-like object is required, not 'NoneType'
Traceback (most recent call last):
  File "~/ComfyUI/execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/ComfyUI/execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/ComfyUI/execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/ComfyUI/custom_nodes/ComfyUI_TensorRT/tensorrt_convert.py", line 605, in convert
    return super()._convert(
           ^^^^^^^^^^^^^^^^^
  File "~/ComfyUI/custom_nodes/ComfyUI_TensorRT/tensorrt_convert.py", line 362, in _convert
    f.write(serialized_engine)
TypeError: a bytes-like object is required, not 'NoneType'

Prompt executed in 49.95 seconds

and I think the important part is:

[06/27/2024-07:50:56] [TRT] [E] 9: Skipping tactic 0x0000000000000000 due to exception [autotuner.cpp:operator():937] [impl.cpp:default_alloc:275] CUDA error 2 for 2700792064-byte allocation.
[06/27/2024-07:50:57] [TRT] [E] 1: [defaultAllocator.cpp::allocate::19] Error Code 1: Cuda Runtime (out of memory)

and the TypeError: a bytes-like object is required, not 'NoneType' may be misleading, as the node should have exited much earlier.

I don't know if it looks the same on your side.
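
For what it's worth, here is a minimal sketch of why the misleading TypeError shows up, assuming the standard TensorRT Python API (this is illustrative, not the actual tensorrt_convert.py code): build_serialized_network() returns None when the build fails, so writing the result to a file without checking it turns a build failure into the TypeError seen above.

```python
import tensorrt as trt

# Illustrative sketch (not the actual tensorrt_convert.py implementation):
# when the engine build fails (e.g. the CUDA out-of-memory / "Could not find
# any implementation for node" errors above), build_serialized_network()
# returns None, and writing that to a file raises
# "TypeError: a bytes-like object is required, not 'NoneType'".
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(0)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # path is illustrative
    if not parser.parse(f.read()):
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
serialized_engine = builder.build_serialized_network(network, config)

if serialized_engine is None:
    # Failing here with a clear message would be less misleading than the
    # TypeError raised later by f.write(None).
    raise RuntimeError("TensorRT engine build failed, see TRT errors in the log")

with open("model.engine", "wb") as f:
    f.write(serialized_engine)
```

A check like this would only improve the error message; the underlying problem is still the out-of-memory failure during autotuning.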

geroldmeisinger commented 2 days ago

@Lolagatorade can you please rename the title of this issue to "TypeError: a bytes-like object is required, not 'NoneType'"

Lolagatorade commented 2 days ago

> I am able to successfully generate engines with a low batch size. If it is too high, I get this:
>
> [full build log quoted above]
>
> and I think the important part is:
>
> [OOM errors quoted above]
>
> and the TypeError: a bytes-like object is required, not 'NoneType' may be misleading, as the node should have exited much earlier.
>
> I don't know if it looks the same on your side.

For me it does not look like that; the error log isn't as detailed. It does not even start the actual process. The only data available is what I posted in the screenshot. Take note that I am not using vanilla ComfyUI; I am using StableSwarm with the ComfyUI backend server, and I was recommended by the developer to come to this GitHub page to discuss my issue.

But the core error does seem to be the same 'NoneType' issue, as that is what shows up in the red section of the log.