danielgatis / rembg

Rembg is a tool to remove image backgrounds
MIT License

Using rembg with the GPU takes more time than using the CPU #599

Closed: q751654992 closed this issue 1 month ago

q751654992 commented 4 months ago

import onnxruntime as ort
import tensorrt
import torch
from datetime import datetime

# Report the runtime environment.
print('onnxruntime:', ort.get_device())
print('onnxruntime_version:', ort.__version__)
print('tensorrt:', tensorrt.__version__)
print('CUDA:', torch.version.cuda)
print('Pytorch:', torch.__version__)
print('cuda is_available:', 'available' if torch.cuda.is_available() else 'unavailable')
print('device_count:', torch.cuda.device_count())
print('device_name:', torch.cuda.get_device_name())

from rembg import remove

input_path = 'J:/test.jpg'
output_path = 'J:/output.jpg'

# Time the whole remove() call (model load, session creation and inference).
now = datetime.now()
with open(input_path, 'rb') as i, open(output_path, 'wb') as o:
    input_data = i.read()
    output_data = remove(input_data)
    o.write(output_data)
    print('cost', datetime.now() - now)

Then it prints:

onnxruntime: GPU
tensorrt: 8.6.1     
CUDA: 12.1
Pytorch: 2.1.1+cu121
cuda is_available: available
device_count: 1        
device_name: NVIDIA GeForce GTX 1050 Ti
2024-02-29 01:07:56.8485491 [W:onnxruntime:Default, tensorrt_execution_provider.h:83 onnxruntime::TensorrtLogger::log] [2024-02-28 17:07:56 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2024-02-29 01:07:57.1196997 [W:onnxruntime:Default, tensorrt_execution_provider.h:83 onnxruntime::TensorrtLogger::log] [2024-02-28 17:07:57 WARNING] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
cost 0:01:51.046107
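
In the log above the TensorrtExecutionProvider is the one handling the model, and part of a single-image run is the one-time TensorRT engine build at session creation. Below is a minimal sketch that forces the CUDA execution provider instead and reuses one session, so session setup and per-image inference are timed separately. It assumes the installed rembg version's new_session accepts a providers argument; if it does not, the same provider list would have to be passed to onnxruntime directly.

# Sketch only: force the CUDA execution provider (skipping TensorRT) and reuse
# one session, so session creation and per-image inference are timed separately.
# Assumes this rembg version's new_session accepts a `providers` argument; if it
# does not, the provider list can be set on an onnxruntime session instead.
from datetime import datetime

from rembg import new_session, remove

input_path = 'J:/test.jpg'
output_path = 'J:/output.jpg'

t0 = datetime.now()
session = new_session('u2net', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
print('session setup:', datetime.now() - t0)

with open(input_path, 'rb') as i:
    data = i.read()

t1 = datetime.now()
result = remove(data, session=session)
print('inference:', datetime.now() - t1)

with open(output_path, 'wb') as o:
    o.write(result)

Timing a second remove() call on the same session would show the steady-state GPU cost without the setup.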

Next I switched to the CPU build of onnxruntime:

pip uninstall onnxruntime-gpu
pip install onnxruntime

Then it prints:

onnxruntime: CPU
onnxruntime_version: 1.17.1
tensorrt: 8.6.1
CUDA: 12.1
Pytorch: 2.1.1+cu121
cuda is_available: available
device_count: 1
device_name: NVIDIA GeForce GTX 1050 Ti
cost 0:00:01.712570

I don't understand. Please tell me what the problem is.

SyraTi commented 4 months ago

I have the same issue, and I think it's because the model under the .u2net folder uses INT64 weights, while TensorRT doesn't natively support INT64, so a lot of time is spent casting INT64 to INT32. That's why the GPU run takes longer than the CPU run. I believe the fix is to replace the model in the .u2net folder with an INT32 version and set MODEL_CHECKSUM_DISABLED=TRUE, as mentioned in issue #496. I've been struggling to convert the INT64 ONNX model to INT32 and have no idea how, since I'm new to this. Hope someone can help.
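
To check how much of the model is actually stored as INT64, here is a small sketch using the onnx package; the path below assumes rembg's default model location under ~/.u2net and may need adjusting:

# Sketch: list the u2net initializers (weights) stored as INT64.
# The model path assumes rembg's default download location; adjust if needed.
import os

import onnx
from onnx import TensorProto

model_path = os.path.expanduser('~/.u2net/u2net.onnx')
model = onnx.load(model_path)

int64_inits = [init.name for init in model.graph.initializer
               if init.data_type == TensorProto.INT64]

print('total initializers:', len(model.graph.initializer))
print('INT64 initializers:', len(int64_inits))
for name in int64_inits:
    print(' ', name)

Note that simply casting those tensors to INT32 in place can break the graph if other tensors still expect INT64, so any converted model needs its types kept consistent.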

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 30 days with no activity.

q751654992 commented 3 months ago

Has anyone solved this problem?

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 14 days since being marked as stale.