saifhassan opened this issue 6 years ago
Please definitely make use of GPUs.
On Sat, Mar 10, 2018 at 10:54 PM, Enigma wrote:
Training takes too much time. ICDAR 2015 contains 1000 images, and I tried the following:
Machine 1: Core i7 with 16 GB RAM, CPU only. Machine 2: Core i7 with 16 GB RAM, GTX 960 graphics card.
What should I do for quick training? Please guide.
Thanks in advance
— https://github.com/argman/EAST/issues/115
@zxytim I have tried using the GPU on an HP laptop with a GTX 960 graphics card, but it's still the same.
Apart from the GPU, is there anything else I can do? It takes almost an hour to complete 10-20 steps.
I think the rbox generation takes too much time: https://github.com/argman/EAST/blob/63bd302465cfeace305f02e39c5bc73ec7d0bbea/icdar.py#L568 It repeatedly executes this function: https://github.com/argman/EAST/blob/63bd302465cfeace305f02e39c5bc73ec7d0bbea/icdar.py#L246 It contains numpy methods that run only on the CPU, so you can also preprocess your data before training.
I used Cython to rewrite this part and got a 20× speedup on the "for y, x in xy_in_poly" loop. You can contact me to get the code.
@LouisRuan Yes, you can rewrite that part with numpy matrix operations. @Yuanhang8605 Good job, pull requests welcome!
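For anyone who wants the numpy route instead of Cython, a vectorized version of the per-pixel loop might look like the sketch below. The function and variable names here are my own, loosely mirroring icdar.py's rect/poly terminology; this is not the repo's actual code. The idea is to compute the distance from every in-polygon pixel to each rectangle edge in one broadcasted call instead of looping pixel by pixel:

```python
import numpy as np

def point_dist_to_line(p1, p2, points):
    """Distance from each point in `points` (N, 2) to the line through
    p1-p2, for all points at once via the 2D cross-product formula."""
    d = p2 - p1
    num = np.abs(d[0] * (points[:, 1] - p1[1]) - d[1] * (points[:, 0] - p1[0]))
    return num / np.linalg.norm(d)

def fill_geo_map(rect, xy_in_poly):
    """Fill the per-pixel edge distances in one shot.
    rect: (4, 2) rectangle vertices in (x, y), clockwise from top-left.
    xy_in_poly: (N, 2) pixel coordinates as (y, x), as in icdar.py.
    Returns an (N, 4) array of (top, right, bottom, left) distances."""
    points = xy_in_poly[:, ::-1].astype(np.float32)  # reorder to (x, y)
    p0, p1, p2, p3 = rect.astype(np.float32)
    top = point_dist_to_line(p0, p1, points)
    right = point_dist_to_line(p1, p2, points)
    bottom = point_dist_to_line(p2, p3, points)
    left = point_dist_to_line(p3, p0, points)
    return np.stack([top, right, bottom, left], axis=1)
```

This removes the Python-level loop entirely; for a few thousand in-polygon pixels per box, the broadcasted version typically runs orders of magnitude faster than iterating.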
I have created a repo for you: https://github.com/Yuanhang8605/geo_map_gen-for-argman-east.git. You can download the Cython code and check whether it's right or not. It covers just the "for y, x in xy_in_poly" part; you can import the Cython lib to replace it. I'm trying to use some of your code to re-implement this paper: https://arxiv.org/abs/1801.01671. It achieves a good result on ICDAR15 and is very similar to the EAST framework. Thanks for sharing your code!
I have completely rewritten the data input part using the tf.data.Dataset API and added data augmentation (rotation by 0-10 degrees, random crop, 90-degree rotation), all with TensorFlow matrix operations; it's much faster than the numpy version. I have also rewritten the postprocessing part of your code with TensorFlow matrix ops, e.g. the function "def restore_rectangle_rbox(origin, geometry):". The TensorFlow style looks like this; you can use broadcasting to simplify your code:
def _decode_geo_map_to_8pts(self, origin, geometry):
    """Decode the geometry map into the coordinates of the 4 vertices.

    Args:
        origin: a tensor of shape [num_box, 2], the origin (x, y) coordinates.
        geometry: a tensor of shape [num_box, 5].

    Returns:
        decoded_bboxes: a tensor of shape [num_box, 4, 2]; the 8 elements
            are the coordinates of the 4 vertices.
    """
    def _geo_decode_fn(args, is_pos_angle=True):
        # Use the lowest point as the rectangle's coordinate origin.
        top_, right_, down_, left_, angle_, origin_ = args
        num_box_ = tf.size(top_)
        rect_h = top_ + down_
        rect_w = right_ + left_
        if is_pos_angle:
            # The lowest point is vertex 3.
            x0, y0 = tf.zeros([num_box_, ]), -rect_h
            x1, y1 = rect_w, -rect_h
            x2, y2 = rect_w, tf.zeros([num_box_, ])
            x3, y3 = tf.zeros([num_box_, ]), tf.zeros([num_box_, ])
            pt_array = tf.stack([x0, y0, x1, y1,
                                 x2, y2, x3, y3,
                                 left_, -down_], axis=1)
            rot_mat_x = tf.stack([tf.cos(angle_), tf.sin(angle_)], axis=1)
            rot_mat_y = tf.stack([-tf.sin(angle_), tf.cos(angle_)], axis=1)
        else:
            # The lowest point is vertex 2.
            x0, y0 = -rect_w, -rect_h
            x1, y1 = tf.zeros([num_box_, ]), -rect_h
            x2, y2 = tf.zeros([num_box_, ]), tf.zeros([num_box_, ])
            x3, y3 = -rect_w, tf.zeros([num_box_, ])
            pt_array = tf.stack([x0, y0, x1, y1,
                                 x2, y2, x3, y3,
                                 -right_, -down_], axis=1)
            rot_mat_x = tf.stack([tf.cos(-angle_), -tf.sin(-angle_)], axis=1)
            rot_mat_y = tf.stack([tf.sin(-angle_), tf.cos(-angle_)], axis=1)
        pt_array = tf.reshape(pt_array, [-1, 5, 2])
        rot_mat_x = tf.expand_dims(rot_mat_x, axis=1)
        rot_mat_y = tf.expand_dims(rot_mat_y, axis=1)
        # Broadcast the rotation-matrix rows against all 5 points per box.
        pt_rot_x = tf.reduce_sum(rot_mat_x * pt_array, axis=2)
        pt_rot_y = tf.reduce_sum(rot_mat_y * pt_array, axis=2)
        pt_rot = tf.stack([pt_rot_x, pt_rot_y], axis=2)
        # The 5th point is the per-pixel origin; shift all vertices onto it.
        ori_offset = tf.expand_dims(origin_ - pt_rot[:, 4, :], axis=1)
        pt_result = pt_rot + ori_offset
        return pt_result[:, :-1, :]

    def _filter_pos_neg_angle_bboxes(args, is_pos_angle=True):
        top, right, down, left, angle = args
        if is_pos_angle:
            angle_indx = tf.where(tf.greater_equal(angle, 0.0))
        else:
            angle_indx = tf.where(tf.less(angle, 0.0))
        angle_indx = tf.cast(tf.reshape(angle_indx, [-1]), tf.int32)
        top_f = tf.gather(top, angle_indx)
        right_f = tf.gather(right, angle_indx)
        down_f = tf.gather(down, angle_indx)
        left_f = tf.gather(left, angle_indx)
        angle_f = tf.gather(angle, angle_indx)
        origin_f = tf.gather(origin, angle_indx)
        return (top_f, right_f, down_f, left_f, angle_f, origin_f), angle_indx

    with tf.name_scope('DecodeGeometry'):
        geo_list = tf.unstack(geometry, axis=-1)
        pos_params, pos_angle_indices = _filter_pos_neg_angle_bboxes(geo_list, True)
        neg_params, neg_angle_indices = _filter_pos_neg_angle_bboxes(geo_list, False)
        has_pos = tf.greater(tf.size(pos_angle_indices), 0)
        has_neg = tf.greater(tf.size(neg_angle_indices), 0)
        false_bboxes = tf.zeros([0, 4, 2])
        pos_angle_bboxes = tf.cond(has_pos, lambda: _geo_decode_fn(pos_params, True),
                                   lambda: false_bboxes)
        neg_angle_bboxes = tf.cond(has_neg, lambda: _geo_decode_fn(neg_params, False),
                                   lambda: false_bboxes)
        decoded_bboxes = tf.concat([pos_angle_bboxes, neg_angle_bboxes], axis=0)
        selected_indices = tf.concat([pos_angle_indices, neg_angle_indices], axis=0)
        return decoded_bboxes, selected_indices
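The broadcasting trick above (expanding the rotation-matrix rows so they multiply against every point of every box at once) is easier to see in a minimal numpy analogue. This sketch is mine, not part of the repo, and follows the same [cos, sin] / [-sin, cos] row convention as the TensorFlow code:

```python
import numpy as np

def rotate_boxes(pts, angles):
    """Rotate a batch of rectangle corners by per-box angles in one shot.
    pts: (N, 4, 2) corner coordinates; angles: (N,) radians.
    Returns (N, 4, 2) rotated points."""
    cos, sin = np.cos(angles), np.sin(angles)
    # Rotation-matrix rows shaped (N, 1, 2), broadcast over the 4 corners.
    rot_x = np.stack([cos, sin], axis=1)[:, None, :]
    rot_y = np.stack([-sin, cos], axis=1)[:, None, :]
    x = np.sum(rot_x * pts, axis=2)
    y = np.sum(rot_y * pts, axis=2)
    return np.stack([x, y], axis=2)
```

No per-box Python loop and no explicit matmul over a batch dimension: the (N, 1, 2) rows broadcast against (N, 4, 2) points, which is exactly what the tf.expand_dims / tf.reduce_sum pair does above.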
I want to detect long text, but the receptive field is not big enough on the 1/4 feature maps, so I add this part before the predictor. Do you think it's useful or not?
def _incept_module(self, inputs):
    """Inception-style module to capture text at different scales and aspect ratios."""
    with tf.name_scope('InceptModule'):
        with tf.name_scope('Branch_1x1'):
            branch_1x1 = slim.conv2d(inputs, 64, [1, 1], scope='branch1x1_0')
            branch_1x1 = slim.conv2d(branch_1x1, 64, [1, 1], scope='branch1x1_1')
        with tf.name_scope('Branch_3x3'):
            branch_3x3 = slim.conv2d(inputs, 64, [1, 1], scope='branch3x3_0')
            branch_3x3 = slim.conv2d(branch_3x3, 64, [1, 3], scope='branch3x3_1')
            branch_3x3 = slim.conv2d(branch_3x3, 64, [3, 1], scope='branch3x3_2')
        with tf.name_scope('Branch_5x5'):
            branch_5x5 = slim.conv2d(inputs, 64, [1, 1], scope='branch5x5_0')
            branch_5x5 = slim.conv2d(branch_5x5, 64, [1, 5], scope='branch5x5_1')
            branch_5x5 = slim.conv2d(branch_5x5, 64, [5, 1], scope='branch5x5_2')
        with tf.name_scope('Branch_7x7'):
            branch_7x7 = slim.conv2d(inputs, 64, [1, 1], scope='branch7x7_0')
            branch_7x7 = slim.conv2d(branch_7x7, 64, [1, 7], scope='branch7x7_1')
            branch_7x7 = slim.conv2d(branch_7x7, 64, [7, 1], scope='branch7x7_2')
        # Concatenate the branches and fuse them with a 1x1 conv.
        net = tf.concat([branch_1x1, branch_3x3,
                         branch_5x5, branch_7x7], axis=3)
        net = slim.conv2d(net, 64, [1, 1], scope='merge_conv')
        # Residual shortcut, as in ResNet.
        shortcut = slim.conv2d(inputs, 64, [1, 1], scope='shortcut_conv')
        net = tf.nn.relu(net + shortcut)
        return net
Before the EAST framework, I rewrote TextBoxes++ in TensorFlow. It achieves a 0.84 F-score on ICDAR15, and I improved it to detect long text by using long conv kernels; it can detect very long horizontal text, but it performs badly on very long, large-angle text. The good case looks like this: the bad one like this: Do you think the EAST framework can detect long text?
Excited! I have just finished pretraining FOTS; here are the results on SynthText:
@Yuanhang8605 The FOTS detection results look pretty good. Will you put the FOTS code on GitHub?
@Yuanhang8605 Does EAST work well on long text after adding the inception module?
@dajiangxiaoyan The effect is not very good. I use SSD to detect very dense long text; the result is like this: . I'm still trying to modify the FOTS framework to detect long text. Maybe you can try ASPP to expand the receptive field of the conv kernels.
@dajiangxiaoyan The inception module can indeed expand the receptive field of the net. But if you use the IoU loss, very long text cannot be accurately localized. I have an idea for modifying the localization loss; when I finish, I will tell you.
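The receptive-field argument here can be made concrete with a little arithmetic. The standard recurrence below computes the receptive field of a conv stack; the layer specs in the usage are illustrative, not EAST's exact architecture:

```python
def receptive_field(layers):
    """Receptive field of a stack of conv layers.
    Each layer is a (kernel, stride, dilation) tuple; a dilated kernel k
    with rate d behaves like an effective kernel of size d*(k-1)+1."""
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1
        rf += (k_eff - 1) * jump  # growth is scaled by accumulated stride
        jump *= s
    return rf
```

Three plain 3x3, stride-1 convs give a 7-pixel receptive field, while the same three layers with dilation rates 1, 2, 4 (as in an ASPP-style block) reach 15 pixels at the same cost, which is why dilation helps with long text on the 1/4 feature map.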
@KwangKa I'm still trying to modify the FOTS framework to detect long text. I want to publish a paper this year on a method that can detect oriented and very long text, so once the paper is published, I'll put my code on GitHub.
@Yuanhang8605 Thanks very much for replying. I have also tried adding inception to EAST, but the result is still bad.
@Yuanhang8605 What changes did you make to the SSD net to detect long text?
@dajiangxiaoyan I'm sorry, I can't share the details because they are my company's proprietary work. We completely rewrote the original SSD and added dense anchors; now it can detect very dense long text. But it's not perfect yet: it doesn't perform well on large-angle long text. So I decided to modify EAST to detect long text; I have an idea to combine SSD and EAST, though I don't know whether it will work. Recently, many papers use semantic segmentation and instance segmentation techniques to detect text; maybe you can give those a try.
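Since the details above are proprietary, this is only a guess at the general "dense anchors for long text" idea: add many wide aspect ratios per location so at least one prior roughly matches an elongated box. All names and defaults below are hypothetical, not the company's code:

```python
import numpy as np

def dense_anchors(center, scales=(32,), ratios=(1, 3, 5, 7)):
    """Generate [x1, y1, x2, y2] anchors at one feature-map location.
    Wide ratios (w/h = 3, 5, 7, ...) target long horizontal text while
    keeping the anchor area constant per scale."""
    cx, cy = center
    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s * np.sqrt(r), s / np.sqrt(r)  # area stays s*s
            boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)
```

Densifying further (more centers per cell, more ratios) raises recall on dense long text at the cost of more negatives to mine, which fits the trade-off described above.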
Have you found any open-source framework that can detect long Chinese text?
@Yuanhang8605 CTPN works well on long horizontal text, but it cannot detect oriented text. I have also tried TextBoxes, TextBoxes++, and RRPN; they all work badly on long text. I think all these methods are anchor-based and thus limited by the receptive field of the net. I will try pixel segmentation, such as pixel_link.
@dajiangxiaoyan Can pixel_link detect very dense long text?
@dajiangxiaoyan I read the PixelLink paper today. I think you can modify the algorithm to avoid detecting multiline text like this: divide the text mask into two types, boundary region and inner region; only link inner pixels to inner pixels and inner pixels to boundary pixels, and avoid linking boundary pixels to boundary pixels. You can give it a try. I now have a very cool idea for a new framework to detect oriented scene text, but I think pixel_link is worth a try. Maybe later we can work together to improve pixel_link.
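The linking rule sketched above can be prototyped in a few lines of numpy. This is my own illustration of the idea, not PixelLink code: split the mask by a hand-rolled 4-connected erosion, then forbid boundary-to-boundary links so touching text instances are not merged:

```python
import numpy as np

def split_boundary_inner(mask):
    """Split a binary text mask into inner and boundary regions.
    A pixel is 'inner' if it and its 4 neighbors are all foreground
    (a one-pixel-wide boundary, no dependencies beyond numpy)."""
    m = mask.astype(bool)
    padded = np.pad(m, 1, constant_values=False)
    inner = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
             & padded[1:-1, :-2] & padded[1:-1, 2:])
    boundary = m & ~inner
    return inner, boundary

def may_link(a_is_inner, b_is_inner):
    # Link two adjacent text pixels unless BOTH are boundary pixels,
    # so adjacent instances whose borders touch stay separate.
    return a_is_inner or b_is_inner
```

Two text lines stacked with touching borders then only meet boundary-to-boundary, so no link is formed between them, which is exactly the multiline failure case described above.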
@Yuanhang8605 Pixel link
@dajiangxiaoyan I have finished my SSD + EAST models and got a good pretrained result on SynthText. Have you made any progress with pixel_link?
@Yuanhang8605 Can you please share the FOTS implementation, if you have completed it?
@Yuanhang8605 You can search for the AdvancedEAST git repository; maybe it can give you some ideas.
Haha, hello! Have you successfully reimplemented FOTS? Thanks a lot in advance.