chaiNNer-org / chaiNNer

A node-based image processing GUI aimed at making chaining image processing tasks easy and customizable. Born as an AI upscaling application, chaiNNer has grown into an extremely flexible and powerful programmatic image processing application.
https://chaiNNer.app
GNU General Public License v3.0
4.44k stars 278 forks

TensorRT: Infinity loop when load SwinIR onnx with TensorRT #961

Open 0x4E69676874466F78 opened 1 year ago

0x4E69676874466F78 commented 1 year ago

I tried to load 003_realSR_BSRGAN_DFOWMFC_s64w8_SwinIR-L_x4_GAN Simplified.onnx from https://github.com/joeyballentine/chaiNNer/issues/960#issuecomment-1248279853. chaiNNer silently loads the model for a long time, but judging by the log, it ran into problems:

[2022-09-15 18:52:32.404] [error] Backend: 2 0 2 2 - 0 9 - 1 5   1 8 : 5 2 : 3 2 . 4 0 6 0 6 9 4   [ W : o n n x r u n t i m e : D e f a u l t ,   t e n s o r r t _ e x e c u t i o n _ p r o v i d e r . h : 6 0   o n n x r u n t i m e : : T e n s o r r t L o g g e r : : l o g ]   [ 2 0 2 2 - 0 9 - 1 5   1 5 : 5 2 : 3 2   W A R N I N G ]   e x t e r n a l \ o n n x - t e n s o r r t \ o n n x 2 t r t _ u t i l s . c p p : 3 6 9 :   Y o u r   O N N X   m o d e l   h a s   b e e n   g e n e r a t e d   w 
[2022-09-15 18:52:32.405] [error] Backend: i t h   I N T 6 4   w e i g h t s ,   w h i l e   T e n s o r R T   d o e s   n o t   n a t i v e l y   s u p p o r t   I N T 6 4 .   A t t e m p t i n g   t o   c a s t   d o w n   t o   I N T 3 2 . 
 2 0 2 2 - 0 9 - 1 5   1 8 : 5 2 : 3 2 . 4 0 6 4 7 0 7   [ W : o n n x r u n t i m e : D e f a u l t ,   t e n s o r r t _ e x e c u t i o n _ p r o v i d e r . h : 6 0   o n n x r u n t i m e : : T e n s o r r t L o g g e r : : l o g ]   [ 2 0 2 2 - 0 9 - 1 5   1 5 : 5 2 : 3 2   W A R N I N G ]   e x t e r n a l \ o n n x - t e n s o r r t \ o n n x 2 t r t _ u t i l s . c p p : 3 9 5 :   O n e   o r   m o r e   w e i g h t s   o u t s i d e   t h e   r a n g e   o f   I N T 3 2   w a s   c l a m p e d 
 2 0 2 2 - 0 9 - 1 5   1 8 : 5 2 : 3 2 . 4 0 6 7 9 4 7   [ W : o n n x r u n t i m e : D e f a u l t ,   t e n s o r r t _ e x e c u t i o n _ p r o v i d e r . h : 6 0   o n n x r u n t i m e : : T e n s o r r t L o g g e r : : l o g ]   [ 2 0 2 2 - 0 9 - 1 5   1 5 : 5 2 : 3 2   W A R N I N G ]   e x t e r n a l \ o n n x - t e n s o r r t \ o n n x 2 t r t _ u t i l s . c p p : 3 9 5 :   O n e   o r   m o r e   w e i g h t s   o u t s i d e   t h e   r a n g e   o f   I N T 3 2   w a s   c l a m p e d 

[2022-09-15 18:52:34.058] [error] Backend: 2 0 2 2 - 0 9 - 1 5   1 8 : 5 2 : 3 4 . 0 5 9 4 8 8 0   [ W : o n n x r u n t i m e : D e f a u l t ,   t e n s o r r t _ e x e c u t i o n _ p r o v i d e r . h : 6 0   o n n x r u n t i m e : : T e n s o r r t L o g g e r : : l o g ]   [ 2 0 2 2 - 0 9 - 1 5   1 5 : 5 2 : 3 4   W A R N I N G ]   e x t e r n a l \ o n n x - t e n s o r r t \ o n n x 2 t r t 
[2022-09-15 18:52:34.059] [error] Backend: _ u t i l s . c p p : 3 9 5 :   O n e   o r   m o r e   w e i g h t s   o u t s i d e   t h e   r a n g e   o f   I N T 3 2   w a s   c l a m p e d 
 2 0 2 2 - 0 9 - 1 5   1 8 : 5 2 : 3 4 . 0 5 9 9 9 3 0   [ W : o n n x r u n t i m e : D e f a u l t ,   t e n s o r r t _ e x e c u t i o n _ p r o v i d e r . h : 6 0   o n n x r u n t i m e : : T e n s o r r t L o g g e r : : l o g ]   [ 2 0 2 2 - 0 9 - 1 5   1 5 : 5 2 : 3 4   W A R N I N G ]   e x t e r n a l \ o n n x - t e n s o r r t \ o n n x 2 t r t _ u t i l s . c p p : 3 9 5 :   O n e   o r   m o r e   w e i g h t s   o u t s i d e   t h e   r a n g e   o f   I N T 3 2   w a s   c l a m p e d 

[2022-09-15 18:52:34.293] [error] Backend: 2 0 2 2 - 0 9 - 1 5   1 8 : 5 2 : 3 4 . 2 9 5 0 6 7 1   [ E : o n n x r u n t i m e : D e f a u l t ,   t e n s o r r t _ e x e c u t i o n _ p r o v i d e r . h : 5 8   o n n x r u n t i m e : : T e n s o r r t L o g g e r : : l o g ]   [ 2 0 2 2 - 0 9 - 1 5   1 5 : 5 2 : 3 4       E R R O R ]   [ s h u f 
[2022-09-15 18:52:34.294] [error] Backend: f l e N o d e . c p p : : n v i n f e r 1 : : b u i l d e r : : S h u f f l e N o d e : : s y m b o l i c E x e c u t e : : 3 9 2 ]   E r r o r   C o d e   4 :   I n t e r n a l   E r r o r   ( R e s h a p e _ 4 2 :   I S h u f f l e L a y e r   a p p l i e d   t o   s h a p e   t e n s o r   m u s t   h a v e   0   o r   1   r e s h a p e   d i m e n s i o n s :   d i m e n s i o n s   w e r e   [ - 1 , 2 ] ) 

[2022-09-15 18:52:34.355] [error] Backend: 2 0 2 2 - 0 9 - 1 5   1 8 : 5 2 : 3 4 . 3 5 6 9 0 7 4   [ W : o n n x r u n t i m e : D e f a u l t ,   t e n s o r r t _ e x e c u t i o n _ p r o v i d e r . h : 6 0   o n n x r u n t i m e : : T e n s o r r t L o g g e r : : l o g ]   [ 2 0 2 2 - 0 9 - 1 5   1 5 : 5 2 : 3 4   W A R N I N G ]   
[2022-09-15 18:52:34.356] [error] Backend: e x t e r n a l \ o n n x - t e n s o r r t \ o n n x 2 t r t _ u t i l s . c p p : 3 9 5 :   O n e   o r   m o r e   w e i g h t s   o u t s i d e   t h e   r a n g e   o f   I N T 3 2   w a s   c l a m p e d 

[2022-09-15 18:52:35.869] [error] Backend: 2 0 2 2 - 0 9 - 1 5   1 8 : 5 2 : 3 5 . 8 7 1 0 4 5 8   [ W : o n n x r u n t i m e : D e f a u l t ,   t e n s o r r t _ e x e c u t i o n _ p r o v i d e r . h : 6 0   o n n x r u n t i m e : : T e n s o r r t L o g g e r : : l o g ]   [ 2 0 2 2 - 0 
[2022-09-15 18:52:35.870] [error] Backend: 9 - 1 5   1 5 : 5 2 : 3 5   W A R N I N G ]   e x t e r n a l \ o n n x - t e n s o r r t \ o n n x 2 t r t _ u t i l s . c p p : 3 9 5 :   O n e   o r   m o r e   w e i g h t s   o u t s i d e   t h e   r a n g e   o f   I N T 3 2   w a s   c l a m p e d 

I've been waiting for over 20 minutes. Python consumes 8% CPU.

0x4E69676874466F78 commented 1 year ago

It looks like TensorRT writes to the log in UTF-16 (so there are extra "spaces").
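
A minimal sketch of how that kind of spacing could arise and be undone, assuming the guess above is right: the message is UTF-16-LE, each ASCII character is followed by a NUL byte, and the log viewer renders that NUL as a space. The helper names `simulate_garbling` and `despace` are hypothetical and are not part of chaiNNer, ONNX Runtime, or TensorRT.

```python
def simulate_garbling(message: str) -> str:
    """Encode as UTF-16-LE, then read the bytes one at a time.

    Each ASCII character becomes two bytes (the character and a NUL).
    Showing the NUL as a space mimics the spacing seen in the log above.
    """
    raw = message.encode("utf-16-le")
    return raw.decode("latin-1").replace("\x00", " ")


def despace(text: str) -> str:
    """Keep every second character, dropping the filler after each real one.

    This only works if the text starts on a real character; a log line that
    was split in the middle of a character pair would need a one-character offset.
    """
    return text[::2]


garbled = simulate_garbling("Error Code 4: Internal Error")
print(garbled)           # spaced-out text, three spaces at each word boundary
print(despace(garbled))  # Error Code 4: Internal Error
```

This is only a way to read an already garbled log. Decoding the backend's output with the right encoding before it reaches the log would avoid the problem at the source.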

0x4E69676874466F78 commented 1 year ago

[2022-09-15 19:45:35.068] [info]  Backend: [27828] [INFO] Reading state dict from path: A:\Downloads\browser\003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_GAN.pth

[2022-09-15 19:45:35.225] [info]  Backend: [27828] [INFO] Loading state dict into pytorch model arch

[2022-09-15 19:45:35.227] [error] Backend: C:\Users\fox\AppData\Roaming\chaiNNer\python\python\lib\site-packages\torch\functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ..\aten\src\ATen\native\TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]

joeyballentine commented 1 year ago

it's possible that TensorRT just can't support SwinIR models with pixelshuffle. the error (which is amazingly difficult to read lol) is this: Error Code 4: Internal Error (Reshape_42: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])

0x4E69676874466F78 commented 1 year ago

@joeyballentine Unpleasant, yes. It needs to be forced to output UTF-8.

joeyballentine commented 1 year ago

Does the official RealESRGANx2 model work with tensorrt? if it doesn't, then the problem is definitely pixelunshuffle

joeyballentine commented 1 year ago

though now that I think about it, I think swinir uses pixelshuffle without pixelunshuffle, so it would have to be pixelshuffle itself that's broken. let me check

0x4E69676874466F78 commented 1 year ago

@joeyballentine RealESRGAN_x2plus.pth?

An error occurred in a PyTorch Upscale Image node:

pixel_unshuffle expects height to be divisible by downscale_factor, but input.size(-2)=337 is not divisible by 2

Input values (partial):
• Image: RGB image 543x337
• Number of Tiles: Auto

(It doesn't even work with pytorch)
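
The error above comes from the 543x337 input having an odd height, which pixel_unshuffle with a downscale factor of 2 cannot split evenly. A rough sketch of the arithmetic for padding an image up to the next multiple of the factor follows. The helper name `padding_for_multiple` is hypothetical, and this is not a description of what chaiNNer or Real-ESRGAN actually does.

```python
def padding_for_multiple(size: int, factor: int) -> int:
    """Return how many pixels must be added so that size is divisible by factor."""
    return (-size) % factor


# Dimensions taken from the error report above.
width, height = 543, 337
factor = 2  # downscale_factor reported by pixel_unshuffle

pad_w = padding_for_multiple(width, factor)
pad_h = padding_for_multiple(height, factor)
print(f"pad width by {pad_w}, height by {pad_h} -> {width + pad_w}x{height + pad_h}")
# prints: pad width by 1, height by 1 -> 544x338
```

After upscaling a padded image, the extra rows and columns (multiplied by the scale factor) would need to be cropped off again to recover the expected output size.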

joeyballentine commented 1 year ago

Seems like the solution is running this tensorrt tool on the engine file (which idk if you can do yourself at the moment without it caching the .engine file)