argman / EAST

A tensorflow implementation of EAST text detector
GNU General Public License v3.0

Training takes too much time #115

Open · saifhassan opened this issue 6 years ago

saifhassan commented 6 years ago

Training takes too much time. ICDAR 2015 contains 1000 images, and I tried training on the following machines:

Machine 1: Core i7 with 16 GB RAM, CPU only
Machine 2: Core i7 with 16 GB RAM, with a GTX 960 graphics card

What should I do to speed up training? Please advise.

Thanks in advance

zxytim commented 6 years ago

Please definitely make use of GPUs.

saifhassan commented 6 years ago

@zxytim I have tried training on a GPU (an HP laptop with a GTX 960), but it is still just as slow.

saifhassan commented 6 years ago

Apart from using a better GPU, is there anything else I can do? It takes almost an hour to complete 10-20 steps.

LouisRuan commented 6 years ago

I think the rbox function uses too much time:
https://github.com/argman/EAST/blob/63bd302465cfeace305f02e39c5bc73ec7d0bbea/icdar.py#L568
It repeatedly executes this function:
https://github.com/argman/EAST/blob/63bd302465cfeace305f02e39c5bc73ec7d0bbea/icdar.py#L246
It is pure numpy and runs only on the CPU, so you can preprocess your data before training.
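
The preprocessing LouisRuan suggests can be as simple as caching the generated maps to disk once, so the training loop only loads arrays. A minimal sketch, assuming icdar.py's load_annoataion and generate_rbox helpers (names as in this repo's icdar.py, misspelling included; adjust if your copy differs):

import glob
import os

import cv2
import numpy as np

from icdar import load_annoataion, generate_rbox

def precompute_maps(image_dir, gt_dir, cache_dir):
    """Cache the score/geo/mask targets so training only loads arrays."""
    os.makedirs(cache_dir, exist_ok=True)
    for img_path in glob.glob(os.path.join(image_dir, '*.jpg')):
        name = os.path.splitext(os.path.basename(img_path))[0]
        h, w = cv2.imread(img_path).shape[:2]
        # ICDAR15 ground-truth files are named gt_<image name>.txt
        polys, tags = load_annoataion(os.path.join(gt_dir, 'gt_%s.txt' % name))
        score_map, geo_map, training_mask = generate_rbox(
            (h, w), np.array(polys, dtype=np.float32), tags)
        np.savez_compressed(os.path.join(cache_dir, name + '.npz'),
                            score=score_map, geo=geo_map, mask=training_mask)

The trade-off is that the stock generator applies random scale and crop before building the maps, so caching fixed maps gives up that per-epoch augmentation.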

Yuanhang8605 commented 6 years ago

I used Cython to rewrite this part (the "for y, x in xy_in_poly" loop) and got a 20× speedup. You can contact me to get the code.

argman commented 6 years ago

@LouisRuan yes, you can rewrite that part with numpy matrix operations. @Yuanhang8605 good job, pull requests welcome!
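
For reference, the per-pixel "for y, x in xy_in_poly" loop can be replaced by one broadcast distance computation over all pixels inside the polygon. A minimal numpy sketch following icdar.py's conventions (xy_in_poly holds (y, x) rows; geo_map channels 0-3 are the distances to the top/right/down/left edges of the rotated rectangle, channel 4 the angle); treat it as an illustration, not a drop-in patch:

import numpy as np

def dists_to_line(p1, p2, points):
    """Distance from each row of points (shape [N, 2]) to the line through p1, p2."""
    d = p2 - p1
    # |cross(p2 - p1, p - p1)| / |p2 - p1|, evaluated for all points at once
    cross = np.abs(d[0] * (points[:, 1] - p1[1]) - d[1] * (points[:, 0] - p1[0]))
    return cross / (np.linalg.norm(d) + 1e-10)

def fill_geo_map(geo_map, xy_in_poly, rect, angle):
    """Vectorized replacement for the per-pixel loop; rect is (p0, p1, p2, p3)."""
    ys, xs = xy_in_poly[:, 0], xy_in_poly[:, 1]
    points = np.stack([xs, ys], axis=1).astype(np.float32)
    p0, p1, p2, p3 = rect
    geo_map[ys, xs, 0] = dists_to_line(p0, p1, points)  # top
    geo_map[ys, xs, 1] = dists_to_line(p1, p2, points)  # right
    geo_map[ys, xs, 2] = dists_to_line(p2, p3, points)  # down
    geo_map[ys, xs, 3] = dists_to_line(p3, p0, points)  # left
    geo_map[ys, xs, 4] = angle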

Yuanhang8605 commented 6 years ago

I have created a repo for you: https://github.com/Yuanhang8605/geo_map_gen-for-argman-east.git. You can download the Cython code and check whether it is right; it covers just the "for y, x in xy_in_poly" part, and you can import the Cython lib to replace it. I am also trying to use some of your code to re-implement this paper: https://arxiv.org/abs/1801.01671. It achieves a good result on ICDAR15 and is very similar to the EAST framework. Thanks for sharing your code!

Yuanhang8605 commented 6 years ago

I have completely rewritten the data input part using the tf.data.Dataset API, with data augmentation (random 0-10 degree rotation, random crop, 90-degree rotation) done entirely with TensorFlow matrix operations; it is much faster than the numpy version (a sketch of such a pipeline follows the code below). I have also rewritten the post-processing part of your code with TensorFlow matrix ops, for example the function "def restore_rectangle_rbox(origin, geometry):". The TensorFlow style looks like this; you can use broadcasting to simplify your code.

  def _decode_geo_map_to_8pts(self, origin, geometry):
    """Decode the geometry to 8 vertices coordinates. 

    Args:
      origin: a tensor of shape [num_box, 2], represent the origin x,y coordinate. 
      geometry: a tensor of shape [num_box, 5]

    Returns:
      decode_bboxes: a tensor of shape [num_box, 4, 2], the 8 elements represent the 
                     4 vertices coords
    """
    def _geo_decode_fn(args, is_pos_angle=True):
      # use the lowest point as rectangle coord ori. 
      top_, right_, down_, left_, angle_, origin_ = args
      num_box_ = tf.size(top_)
      rect_h = top_ + down_
      rect_w = right_ + left_
      if is_pos_angle:
        # the lowest point is 3
        x0, y0 = tf.zeros([num_box_, ]), -rect_h
        x1, y1 = rect_w, -rect_h
        x2, y2 = rect_w, tf.zeros([num_box_, ])
        x3, y3 = tf.zeros([num_box_, ]), tf.zeros([num_box_, ])
        pt_array = tf.stack([x0, y0, x1, y1,
                             x2, y2, x3, y3,
                             left_, -down_], axis=1)
        rot_mat_x = tf.stack([tf.cos(angle_), tf.sin(angle_)], axis=1)
        rot_mat_y = tf.stack([-tf.sin(angle_), tf.cos(angle_)], axis=1)
      else:
        # the lowest point is 2
        x0, y0 = -rect_w, -rect_h
        x1, y1 = tf.zeros([num_box_, ]), -rect_h
        x2, y2 = tf.zeros([num_box_, ]), tf.zeros([num_box_, ])
        x3, y3 = -rect_w, tf.zeros([num_box_, ])        
        pt_array = tf.stack([x0, y0, x1, y1,
                             x2, y2, x3, y3,
                             -right_, -down_], axis=1)
        rot_mat_x = tf.stack([tf.cos(-angle_), -tf.sin(-angle_)], axis=1)
        rot_mat_y = tf.stack([tf.sin(-angle_), tf.cos(-angle_)], axis=1)

      pt_array = tf.reshape(pt_array, [-1, 5, 2])

      rot_mat_x = tf.expand_dims(rot_mat_x, axis=1)
      rot_mat_y = tf.expand_dims(rot_mat_y, axis=1)

      pt_rot_x = tf.reduce_sum(rot_mat_x * pt_array, axis=2)
      pt_rot_y = tf.reduce_sum(rot_mat_y * pt_array, axis=2)

      pt_rot = tf.stack([pt_rot_x, pt_rot_y], axis=2)

      ori_offset = tf.expand_dims(origin_ - pt_rot[:, 4, :], axis=1)

      pt_result = pt_rot + ori_offset

      return pt_result[:, :-1, :]

    def _filter_pos_neg_angle_bboxes(args, is_pos_angle=True):
      top, right, down, left, angle = args
      if is_pos_angle:
        angle_indx = tf.where(tf.greater_equal(angle, 0.0))
      else:
        angle_indx = tf.where(tf.less(angle, 0.0))

      angle_indx = tf.cast(tf.reshape(angle_indx, [-1]), tf.int32)    
      top_f = tf.gather(top, angle_indx)
      right_f = tf.gather(right, angle_indx)
      down_f = tf.gather(down, angle_indx)
      left_f = tf.gather(left, angle_indx)
      angle_f = tf.gather(angle, angle_indx)
      origin_f = tf.gather(origin, angle_indx)

      return (top_f, right_f, down_f, left_f, angle_f, origin_f), angle_indx     

    with tf.name_scope('DecodeGeometry'):
      geo_list = tf.unstack(geometry, axis=-1)
      pos_params, pos_angle_indices = _filter_pos_neg_angle_bboxes(geo_list, True)
      neg_params, neg_angle_indices = _filter_pos_neg_angle_bboxes(geo_list, False)
      has_pos = tf.greater(tf.size(pos_angle_indices), 0)
      has_neg = tf.greater(tf.size(neg_angle_indices), 0)
      false_bboxes = tf.zeros([0, 4, 2])

      pos_angle_bboxes = tf.cond(has_pos, lambda: _geo_decode_fn(pos_params, True),
                                          lambda: false_bboxes)
      neg_angle_bboxes = tf.cond(has_neg, lambda: _geo_decode_fn(neg_params, False),
                                          lambda: false_bboxes)

      decoded_bboxes = tf.concat([pos_angle_bboxes, neg_angle_bboxes], axis=0)
      selected_indices = tf.concat([pos_angle_indices, neg_angle_indices], axis=0)

      return decoded_bboxes, selected_indices
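
A minimal sketch of moving the input pipeline onto tf.data.Dataset, as described above. For brevity it keeps the augmentation and map generation in numpy behind tf.py_func (load_and_make_maps is a hypothetical helper, e.g. wrapping icdar.py's crop/rotate and rbox generation; a fully TF-op version additionally needs the geo channels remapped when rotating). tf.data still provides parallel map calls and prefetching, which removes most of the input stall:

import tensorflow as tf

INPUT_SIZE = 512  # assumed training crop size

def _load_example(img_path):
    # hypothetical numpy helper: reads the image and its ICDAR gt file,
    # applies random crop/rotation to image and polygons, and builds the
    # score/geo/mask targets (e.g. via icdar.py's generate_rbox)
    img, score, geo, mask = load_and_make_maps(img_path.decode(), INPUT_SIZE)
    return img, score, geo, mask

def make_dataset(image_files, batch_size=14):
    ds = tf.data.Dataset.from_tensor_slices(image_files)
    ds = ds.shuffle(buffer_size=1000).repeat()
    ds = ds.map(
        lambda p: tf.py_func(_load_example, [p],
                             [tf.float32, tf.float32, tf.float32, tf.float32]),
        num_parallel_calls=8)
    # call set_shape on the outputs if the model needs static shapes
    ds = ds.batch(batch_size).prefetch(4)
    return ds
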
Yuanhang8605 commented 6 years ago

I want to detect long text, but the receptive field is not big enough on the 1/4 feature maps, so I added this part before the predictor. Do you think it is useful?

  def _incept_module(self, inputs):
    """The incept module to capture text at different scales and aspect ratios."""
    # assumes: import tensorflow as tf; slim = tf.contrib.slim
    with tf.name_scope('InceptModule'):
      with tf.name_scope('Branch_1x1'):
        branch_1x1 = slim.conv2d(inputs, 64, [1, 1], scope='branch1x1_0')
        branch_1x1 = slim.conv2d(branch_1x1, 64, [1, 1], scope='branch1x1_1')
      with tf.name_scope('Branch_3x3'):
        branch_3x3 = slim.conv2d(inputs, 64, [1, 1], scope='branch3x3_0')
        branch_3x3 = slim.conv2d(branch_3x3, 64, [1, 3], scope='branch3x3_1')
        branch_3x3 = slim.conv2d(branch_3x3, 64, [3, 1], scope='branch3x3_2')
      with tf.name_scope('Branch_5x5'):
        branch_5x5 = slim.conv2d(inputs, 64, [1, 1], scope='branch5x5_0')
        branch_5x5 = slim.conv2d(branch_5x5, 64, [1, 5], scope='branch5x5_1')
        branch_5x5 = slim.conv2d(branch_5x5, 64, [5, 1], scope='branch5x5_2')
      with tf.name_scope('Branch_7x7'):
        branch_7x7 = slim.conv2d(inputs, 64, [1, 1], scope='branch7x7_0')
        branch_7x7 = slim.conv2d(branch_7x7, 64, [1, 7], scope='branch7x7_1')
        branch_7x7 = slim.conv2d(branch_7x7, 64, [7, 1], scope='branch7x7_2')
      # concat 
      net = tf.concat([branch_1x1, branch_3x3, 
                       branch_5x5, branch_7x7], axis=3)
      net = slim.conv2d(net, 64, [1, 1], scope='merge_conv')
      # resnet part
      shortcut = slim.conv2d(inputs, 64, [1, 1], scope='shortcut_conv')

      net = tf.nn.relu(net + shortcut)
      return net
Yuanhang8605 commented 6 years ago

Before the EAST framework, I had rewritten TextBoxes++ in TensorFlow. It achieves a 0.84 F-score on ICDAR15, and I improved it to detect long text by using long conv kernels. It can detect very long horizontal text, but it performs badly on very long text at large angles. A good case and a bad case are shown in the attached images. Do you think the EAST framework can detect long text?

Yuanhang8605 commented 6 years ago

Excited! I have just finished pretraining FOTS; here are the results on SynthText: (attached image)

KwangKa commented 6 years ago

@Yuanhang8605 The detection results of FOTS seem pretty good. Will you put the FOTS code on GitHub?

dajiangxiaoyan commented 6 years ago

@Yuanhang8605 Does EAST work well on long text after adding the inception module?

Yuanhang8605 commented 6 years ago

@dajiangxiaoyan The effect is not very good. I use SSD to detect very dense long text; the result looks like this: (attached image). I am still trying to modify the FOTS framework to detect long text. Maybe you can try ASPP to expand the receptive field of the conv kernels, as sketched below. (attached image)
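
A minimal ASPP sketch in the same slim style as the inception module above: parallel atrous (dilated) 3x3 convolutions enlarge the receptive field without shrinking the feature map. The dilation rates and channel counts are illustrative:

  def _aspp_module(self, inputs):
    """ASPP block: parallel dilated convs, then a 1x1 merge."""
    # assumes: import tensorflow as tf; slim = tf.contrib.slim
    with tf.name_scope('ASPPModule'):
      branch_0 = slim.conv2d(inputs, 64, [1, 1], scope='aspp_1x1')
      branch_1 = slim.conv2d(inputs, 64, [3, 3], rate=2, scope='aspp_rate2')
      branch_2 = slim.conv2d(inputs, 64, [3, 3], rate=4, scope='aspp_rate4')
      branch_3 = slim.conv2d(inputs, 64, [3, 3], rate=8, scope='aspp_rate8')
      net = tf.concat([branch_0, branch_1, branch_2, branch_3], axis=3)
      net = slim.conv2d(net, 64, [1, 1], scope='aspp_merge')
      return net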

Yuanhang8605 commented 6 years ago

@dajiangxiaoyan The inception module can indeed expand the receptive field of the net, but with the IoU loss, very long text still cannot be accurately localized. I have an idea for modifying the localization loss; when I finish, I will tell you.

Yuanhang8605 commented 6 years ago

@KwangKa I am still trying to modify the FOTS framework to detect long text. I want to publish a paper this year on a method that can detect oriented and very long text, so once the paper is published, I will put my code on GitHub.

dajiangxiaoyan commented 6 years ago

@Yuanhang8605 Thanks very much for replying. I have also tried adding inception to EAST, but the result is still bad. (attached image)

dajiangxiaoyan commented 6 years ago

@Yuanhang8605 What changes did you make to the SSD net in order to detect long text?

Yuanhang8605 commented 6 years ago

@dajiangxiaoyan I'm sorry, I can't share the details because they are my company's proprietary work. We completely rewrote the original SSD and added dense anchors, so it can now detect very dense long text. But it is not perfect yet: for long text at large angles it does not perform well. So I decided to modify EAST to detect long text; I have an idea to combine SSD and EAST, though I don't know whether it will work. Recently many papers have used semantic segmentation and instance segmentation techniques to detect text; maybe you can give those a try.

Have you found any open-source framework that can detect long Chinese text?

dajiangxiaoyan commented 6 years ago

@Yuanhang8605 CTPN works well on long horizontal text, but it cannot detect oriented text. I have also tried TextBoxes, TextBoxes++, and RRPN; they all work badly on long text. I think all of these methods are anchor-based and thus limited by the receptive field of the net. I will try pixel segmentation, such as PixelLink.

Yuanhang8605 commented 6 years ago

@dajiangxiaoyan Can PixelLink detect very dense long text?

Yuanhang8605 commented 6 years ago

@dajiangxiaoyan I read the PixelLink paper today. I think you can modify the algorithm to avoid merging multi-line text, like this: divide the text mask into two types, a boundary region and an inner region; then only link inner pixels to inner pixels and inner pixels to boundary pixels, and never link boundary pixels to boundary pixels (see the sketch below). I now have a very cool idea for a new framework for detecting oriented scene text, but I think PixelLink is worth a try. Maybe later we can work together to improve PixelLink.
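
A minimal sketch of that linking rule, assuming a per-pixel class map with 0 = background, 1 = boundary text, 2 = inner text (the two-class split is the proposal above, not part of the original PixelLink):

BACKGROUND, BOUNDARY, INNER = 0, 1, 2

def allow_link(cls_a, cls_b):
    """Return True if two neighboring pixels may be linked into one instance."""
    if cls_a == BACKGROUND or cls_b == BACKGROUND:
        return False
    # forbid boundary-to-boundary links; allow inner-inner and inner-boundary
    return not (cls_a == BOUNDARY and cls_b == BOUNDARY)

With this rule two adjacent text lines cannot merge, because the pixels where they touch are boundary pixels on both sides, so no link crosses the seam.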

dajiangxiaoyan commented 6 years ago

@Yuanhang8605 On PixelLink:

  1. I have modified EAST with a smooth L1 loss (see the sketch after this list), but the model is very bad.
  2. PixelLink works on dense long text, but it needs a lot of post-processing, which is easily affected by complicated backgrounds. (attached images)
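
For reference, a minimal TF sketch of a smooth-L1 (Huber) geometry loss of the kind mentioned in point 1, as an alternative to EAST's IoU loss. gt_geo and pred_geo are the [batch, h, w, 5] geometry maps, score_map masks the loss to text pixels, and the threshold is illustrative:

import tensorflow as tf

def smooth_l1_geo_loss(gt_geo, pred_geo, score_map):
    # distances are usually normalized before a smooth-L1 loss, since raw
    # pixel distances let large boxes dominate the gradient
    diff = gt_geo - pred_geo
    abs_diff = tf.abs(diff)
    per_channel = tf.where(tf.less(abs_diff, 1.0),
                           0.5 * tf.square(abs_diff),
                           abs_diff - 0.5)
    loss = tf.reduce_sum(per_channel, axis=-1, keepdims=True)
    return tf.reduce_mean(loss * score_map)
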
Yuanhang8605 commented 6 years ago

@dajiangxiaoyan I have finished my SSD + EAST models and got good pretraining results on SynthText. Have you made any progress with PixelLink?

IngleJaya95 commented 6 years ago

@Yuanhang8605 Can you please share your FOTS implementation, if you have finished it?

Fighting-JJ commented 5 years ago

> @dajiangxiaoyan I have finished my SSD + EAST models and got good pretraining results on SynthText. Have you made any progress with PixelLink?

@Yuanhang8605 You can search for the AdvancedEAST git repository; maybe it can give you some ideas.

sdzbft commented 5 years ago

(quoted image: framework)

Haha, hello expert! Have you successfully reimplemented FOTS? Thanks a lot in advance.