GlassyWing / text-detection-ocr

Chinese text detection and recognition based on CTPN + DenseNet, using Keras and TensorFlow
Apache License 2.0
285 stars 116 forks

GPU memory exhausted at runtime #15

Open leolle opened 5 years ago

leolle commented 5 years ago

Recognition works on the CPU. I wanted to see whether the GPU build of TensorFlow would make it faster, but my GPU has little memory and the run failed:

ResourceExhaustedError: OOM when allocating tensor with shape[1,64,710,896] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: block1_conv2/convolution = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](block1_conv1/Relu, block1_conv2/kernel/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: rpn_class/Reshape/_1549 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_633_rpn_class/Reshape", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

@GlassyWing, what is the minimum amount of GPU memory required, and is there any way to adjust it?
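For a sense of scale, the size of the single tensor named in the error can be worked out from its shape. The sketch below is only arithmetic: `tensor_bytes` is a hypothetical helper, and it assumes `type float` in the message means 4-byte float32. The layer name `block1_conv2` and the `NCHW` layout suggest a 64-channel feature map at the full 710 x 896 input resolution; the network holds many such activations at once, so this one figure is a lower bound rather than the total requirement.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: "type float" in the log is float32


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the memory needed to hold one dense tensor of the given shape."""
    return prod(shape) * bytes_per_element


# Shape copied from the OOM message: [batch, channels, height, width]
oom_shape = (1, 64, 710, 896)
size = tensor_bytes(oom_shape)
print(f"{size} bytes = {size / 2**20:.1f} MiB")
```

The allocation that failed is therefore on the order of 155 MiB for one layer output alone.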

GlassyWing commented 5 years ago

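Because the input image size is unrestricted, OOM is effectively unavoidable. You can reduce how often it happens by downscaling the image, as shown below, but note that this affects recognition accuracy. The root cause is that CTPN uses too much memory; if a better model becomes available, I will replace it later.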

import math

import numpy as np
from PIL import Image

import dlocr

def resize(img):
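    """Downscale the image so that its longer side is at most m pixels."""
    w, h = img.size
    m = 860  # maximum side length in pixels

    # Scale by the longer side so that both dimensions end up within m;
    # images that are already small enough are left at their original size.
    scale = min(1.0, m / max(w, h))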

    mw = int(math.floor(w * scale))
    mh = int(math.floor(h * scale))

    new_image = img.resize((mw, mh))
    return new_image

def convert_to_white_if_need(img):
    """Turn fully transparent pixels white and drop the alpha channel."""
    if img.mode == 'RGBA':
        img_arr = np.array(img)
        img_arr[img_arr[..., -1] == 0] = 255
        img_arr = img_arr[..., :3]
        return np.uint8(img_arr)

    return np.uint8(img.convert("RGB"))

if __name__ == '__main__':
    img = Image.open("../data/PDF_Document.png")
    img = convert_to_white_if_need(resize(img))

    ocr = dlocr.get_or_create()
    bboxes, texts = ocr.detect(img)
    print('\n'.join(texts))
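How much a given size limit helps can be estimated before running the model. The following is a rough sketch, not part of dlocr: `fit_within` and `feature_map_bytes` are hypothetical helpers, and the estimate assumes a 64-channel float32 feature map at input resolution, as in the tensor from the error message. It uses integer arithmetic for the scaled size, so results can differ by a pixel from the `math.floor` version above.

```python
def fit_within(w, h, max_side):
    """Scale (w, h) so the longer side is at most max_side, keeping aspect ratio."""
    longest = max(w, h)
    if longest <= max_side:
        return w, h  # already small enough, never upscale
    return max(1, w * max_side // longest), max(1, h * max_side // longest)


def feature_map_bytes(w, h, channels=64, bytes_per_element=4):
    """Estimated size of one full-resolution feature map (assumed float32)."""
    return channels * h * w * bytes_per_element


# 896 x 710 is the input size implied by the OOM message.
for max_side in (860, 640, 448):
    w, h = fit_within(896, 710, max_side)
    mib = feature_map_bytes(w, h) / 2**20
    print(f"max_side={max_side}: {w}x{h}, about {mib:.1f} MiB per feature map")
```

Because the estimate grows with image area, halving the longer side cuts it to roughly a quarter. A limit of 860 barely changes an 896 x 710 image, so a smaller limit may be needed on a low-memory GPU, at a further cost in recognition accuracy.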