argman / EAST

A TensorFlow implementation of the EAST text detector
GNU General Public License v3.0

Data-set formatting and small area of polygon #79

(Open) engahmed1190 opened this issue 6 years ago

engahmed1190 commented 6 years ago

Hello @zxytim @argman

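I am using the dataset from the ICDAR 2017 Challenge on Text Extraction from Biomedical Literature Figures, and I found something strange during training: many polygons are rejected with an "invalid poly" message.

What is the reason for the invalid polys? This is the function that prints the message: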

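import numpy as np

# polygon_area(poly) is a helper defined elsewhere in the repository (not shown);
# it returns the signed area of a 4-point polygon.


def check_and_validate_polys(polys, tags, im_size):
    '''
    check so that the text poly is in the same direction,
    and also filter some invalid polygons
    :param polys: float array of shape (N, 4, 2), one (x, y) row per vertex
    :param tags: array of N tags, one per polygon
    :param im_size: (h, w) of the image
    :return: (validated polys, validated tags)
    '''
    (h, w) = im_size
    if polys.shape[0] == 0:
        # return both values so callers can always unpack a pair
        return polys, tags
    # clip every vertex into the image; a box lying outside it can collapse here
    polys[:, :, 0] = np.clip(polys[:, :, 0], 0, w-1)
    polys[:, :, 1] = np.clip(polys[:, :, 1], 0, h-1)

    validated_polys = []
    validated_tags = []
    for poly, tag in zip(polys, tags):
        p_area = polygon_area(poly)
        if abs(p_area) < 1:
            # degenerate polygon (area below 1 px^2): skip it
            # print(poly)
            print('invalid poly')
            continue
        if p_area > 0:
            # a positive signed area means the vertex order is reversed;
            # swapping points 1 and 3 restores the expected orientation
            print('poly in wrong direction')
            poly = poly[(0, 3, 2, 1), :]
        validated_polys.append(poly)
        validated_tags.append(tag)
    return np.array(validated_polys), np.array(validated_tags)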
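To see how a polygon can end up as an "invalid poly", here is a minimal sketch of a signed-area helper plus the same clipping step. It assumes polygon_area computes the edge sum (x2 - x1) * (y2 + y1) over the four edges, halved; the repository's own helper is not shown in this thread. The second example is a hypothetical scenario: a box annotated below the bottom of a 300 px tall image gets all its y values clipped to 299 and collapses to zero area.

```python
import numpy as np


def polygon_area(poly):
    """Signed area of a 4-point polygon (assumed convention, see lead-in).

    Each edge contributes (x_next - x) * (y_next + y); the sum is halved.
    In image coordinates (y grows downward), the order top-left, top-right,
    bottom-right, bottom-left gives a negative value.
    """
    poly = np.asarray(poly, dtype=np.float64)
    x, y = poly[:, 0], poly[:, 1]
    x_next, y_next = np.roll(x, -1), np.roll(y, -1)
    return float(np.sum((x_next - x) * (y_next + y)) / 2.0)


def clip_to_image(poly, h, w):
    """Clip vertices into [0, w-1] x [0, h-1], as check_and_validate_polys does."""
    poly = np.array(poly, dtype=np.float64)
    poly[:, 0] = np.clip(poly[:, 0], 0, w - 1)
    poly[:, 1] = np.clip(poly[:, 1], 0, h - 1)
    return poly


# First ground-truth box from the gt snippet in this issue
box = [[7, 23], [450, 23], [450, 37], [77, 37]]
print(polygon_area(box))  # -5712.0: large and in the expected direction, so kept

# Hypothetical: the box lies below a 300 px tall image, so every y becomes 299
outside = [[7, 320], [450, 320], [450, 337], [7, 337]]
print(polygon_area(clip_to_image(outside, h=300, w=500)))  # 0.0: "invalid poly"
```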

Here is the training log:

Step 000190, model loss 0.0215, total loss 0.0504, 1.01 seconds/step, 13.90 examples/second
invalid poly
invalid poly
invalid poly
invalid poly
invalid poly
invalid poly
invalid poly
invalid poly

Here is a snippet of the ground-truth (gt) text:

7,23,450,23,450,37,77,37,ROC Curves for meta-analysis on Simulated Data Sets
365,320,399,320,399,330,365,330,Fisher
365,335,463,335,463,345,365,345,POE with Bss/Wss
365,349,429,349,429,359,365,359,POE with IC
365,363,422,363,422,373,365,373,GeneMeta
365,378,396,378,396,388,365,388,Naive
365,392,419,392,419,402,365,402,RankProd
365,407,398,407,398,417,365,417,DEDS
209,473,319,473,319,483,209,483,False Positive Rates
4,190,14,190,14,296,4,296,True Positive Rates
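Every box in this snippet covers hundreds of square pixels (the smallest, "Naive", is 31 x 10), so the area threshold of 1 should not reject them as written. To track down which lines are actually flagged, it can help to load the file the same way and inspect the arrays. A minimal sketch, assuming the format shown above (eight coordinates, then the transcription); parse_gt_lines is a hypothetical helper, not the repository's loader. Splitting with maxsplit=8 keeps any commas inside the transcription.

```python
import numpy as np


def parse_gt_lines(lines):
    """Parse 'x1,y1,...,x4,y4,text' lines into an (N, 4, 2) array and a list of texts."""
    polys, texts = [], []
    for line in lines:
        line = line.strip().lstrip("\ufeff")  # drop surrounding whitespace and a UTF-8 BOM
        if not line:
            continue
        parts = line.split(",", 8)  # at most 9 fields: 8 coordinates + text
        coords = [float(v) for v in parts[:8]]
        polys.append(np.array(coords, dtype=np.float32).reshape(4, 2))
        texts.append(parts[8] if len(parts) > 8 else "")
    return np.array(polys, dtype=np.float32).reshape(-1, 4, 2), texts


sample = [
    "7,23,450,23,450,37,77,37,ROC Curves for meta-analysis on Simulated Data Sets",
    "365,335,463,335,463,345,365,345,POE with Bss/Wss",
    "4,190,14,190,14,296,4,296,True Positive Rates",
]
polys, texts = parse_gt_lines(sample)
print(polys.shape)  # (3, 4, 2)
print(texts[1])     # POE with Bss/Wss
```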

And here is an example image:

[Image: img_1]

argman commented 6 years ago

If the area of a polygon is too small, I treat it as invalid, but maybe there's a bug. I think you can compare your annotations with ICDAR's.
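One concrete check when comparing annotations with the ICDAR convention is whether every vertex lies inside the image; vertices outside it are moved by the clipping step in check_and_validate_polys, and a box lying mostly outside can collapse. A minimal sketch: out_of_bounds is a hypothetical helper, and the 400 x 500 image size in the example is made up; in practice h and w would come from the loaded image (e.g. im.shape[:2]).

```python
import numpy as np


def out_of_bounds(polys, h, w):
    """Indices of polygons with at least one vertex outside [0, w-1] x [0, h-1]."""
    polys = np.asarray(polys, dtype=np.float64).reshape(-1, 4, 2)
    xs, ys = polys[:, :, 0], polys[:, :, 1]
    bad = (xs < 0) | (xs > w - 1) | (ys < 0) | (ys > h - 1)
    return np.flatnonzero(bad.any(axis=1)).tolist()


polys = [
    [[365, 320], [399, 320], [399, 330], [365, 330]],  # "Fisher"
    [[209, 473], [319, 473], [319, 483], [209, 483]],  # "False Positive Rates"
]
print(out_of_bounds(polys, h=400, w=500))  # [1]: its y values exceed 399
```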

engahmed1190 commented 6 years ago

Hi @argman

I see, so my issue is about polygons with areas below 1.0. What would be the effect of lowering the threshold in if abs(p_area) < 1 to 0.5 or so?

argman commented 6 years ago

I think the model may not be able to fit areas that small, but you can give it a try.
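Before changing the threshold, it may help to see which polygons the change would actually affect. A minimal sketch, assuming the same signed-area convention as polygon_area (edge sum halved); signed_areas and rescued_by_lower_threshold are hypothetical helper names, and old_min=1.0 reproduces the abs(p_area) < 1 test.

```python
import numpy as np


def signed_areas(polys):
    """Vectorized signed areas for an (N, 4, 2) array of 4-point polygons."""
    polys = np.asarray(polys, dtype=np.float64).reshape(-1, 4, 2)
    x, y = polys[:, :, 0], polys[:, :, 1]
    x_next, y_next = np.roll(x, -1, axis=1), np.roll(y, -1, axis=1)
    return np.sum((x_next - x) * (y_next + y), axis=1) / 2.0


def rescued_by_lower_threshold(polys, old_min=1.0, new_min=0.5):
    """Indices rejected by abs(area) < old_min but kept by abs(area) < new_min."""
    a = np.abs(signed_areas(polys))
    return np.flatnonzero((a >= new_min) & (a < old_min)).tolist()


polys = [
    [[365, 320], [399, 320], [399, 330], [365, 330]],  # area 340
    [[0, 0], [3, 0], [3, 0.25], [0, 0.25]],            # area 0.75
    [[0, 0], [5, 0], [5, 0], [0, 0]],                  # area 0
]
print(rescued_by_lower_threshold(polys))  # [1]
```

If the returned list is empty for your data, lowering the threshold does not change the training set at all.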

engahmed1190 commented 6 years ago

Hi @argman

Some ICDAR datasets use a different representation of the polygon points, such as:

200 77 18 457 142 443 128 473 169 "T" 139 187 67 486 153 472 138 501 169 "o"

and also

64 200 363 243 "Colchester" 394 199 487 239 "and" 72 271 382 312 "Greenstead"

These formats fail with this implementation. What do you think about this case?
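One workaround is to convert such files into the eight-coordinate format used in the gt snippet earlier in this issue. This is a minimal sketch under an assumption read off the two examples, not a documented spec: in both, the last four numbers before each quoted string look like left, top, right, bottom of an axis-aligned box, and the extra leading numbers in the character-level example (possibly a colour and a centre point) are ignored. to_east_lines is a hypothetical helper. The corners are emitted clockwise from the top-left, which is the orientation check_and_validate_polys keeps without flipping.

```python
import re

# A run of whitespace-separated integers followed by a double-quoted transcription
TOKEN = re.compile(r'((?:-?\d+\s+)+)"([^"]*)"')


def to_east_lines(text):
    """Convert '... left top right bottom "text"' records to 'x1,y1,...,x4,y4,text' lines."""
    out = []
    for nums, label in TOKEN.findall(text):
        values = [int(v) for v in nums.split()]
        if len(values) < 4:
            continue  # not enough numbers to form a box
        left, top, right, bottom = values[-4:]  # assumption: last four numbers are the box
        # clockwise from top-left: (l, t), (r, t), (r, b), (l, b)
        pts = [left, top, right, top, right, bottom, left, bottom]
        out.append(",".join(map(str, pts)) + "," + label)
    return out


sample = '64 200 363 243 "Colchester" 394 199 487 239 "and" 72 271 382 312 "Greenstead"'
for line in to_east_lines(sample):
    print(line)
# 64,200,363,200,363,243,64,243,Colchester
# 394,199,487,199,487,239,394,239,and
# 72,271,382,271,382,312,72,312,Greenstead
```

Comma-separated variants of these files, or transcriptions containing escaped quotes, would need the regular expression adjusted.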

engahmed1190 commented 6 years ago

Hello @argman

Does it affect the model's convergence when the polyfit is poorly conditioned?

I have tried the COCO dataset, using the polygon coordinates provided there, but this warning appears:

EAST/icdar.py:256: RankWarning: Polyfit may be poorly conditioned
  [k, b] = np.polyfit(p1, p2, deg=1)
EAST/icdar.py:256: RankWarning: Polyfit may be poorly conditioned
  [k, b] = np.polyfit(p1, p2, deg=1)
EAST/icdar.py:256: RankWarning: Polyfit may be poorly conditioned
  [k, b] = np.polyfit(p1, p2, deg=1)
EAST/icdar.py:256: RankWarning: Polyfit may be poorly conditioned
  [k, b] = np.polyfit(p1, p2, deg=1)
EAST/icdar.py:256: RankWarning: Polyfit may be poorly conditioned

How can I solve this warning?
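np.polyfit raises this warning when its least-squares system is (nearly) rank-deficient. For a degree-1 fit through two points, that happens when the two x-coordinates are equal or almost equal, i.e. for a (nearly) vertical edge, which y = k*x + b cannot represent. If you would rather avoid the warning than ignore it, one option is to compute the line through the two points directly in the implicit form a*x + b*y + c = 0. A minimal sketch: line_through is a hypothetical helper, not the repository's code, and it assumes the same argument layout as the call in the log (p1 holds the two x-coordinates, p2 the two y-coordinates).

```python
def line_through(p1, p2):
    """Coefficients (a, b, c) of the line a*x + b*y + c = 0 through two points.

    p1 = (x1, x2) are the x-coordinates and p2 = (y1, y2) the y-coordinates,
    mirroring np.polyfit(p1, p2, deg=1). Vertical lines come out as b == 0.
    """
    (x1, x2), (y1, y2) = p1, p2
    a = y2 - y1
    b = x1 - x2
    c = x2 * y1 - x1 * y2
    if a == 0 and b == 0:
        raise ValueError("the two points coincide; the line is not unique")
    return a, b, c


print(line_through((0, 1), (1, 3)))  # (2, -1, 1): 2x - y + 1 = 0, i.e. y = 2x + 1
print(line_through((3, 3), (0, 5)))  # (5, 0, -15): the vertical line x = 3
```

The coefficients are only defined up to a common factor, so if surrounding code expects a particular scaling (for example k*x - y + b = 0, the rearranged form of y = k*x + b), divide through accordingly when b is nonzero.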

argman commented 6 years ago

@engahmed1190 I think it can be ignored.
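If you follow that advice and only want a readable training log, one option is to silence just this message around the call with Python's standard warnings module. Matching on the message text avoids depending on where NumPy exposes the RankWarning class, whose location has changed between releases. A minimal sketch; quiet_polyfit is a hypothetical wrapper, and the message string is the one shown in the log above (filterwarnings matches it as a regex against the start of the message).

```python
import warnings

import numpy as np


def quiet_polyfit(x, y, deg=1):
    """Call np.polyfit, ignoring only the 'Polyfit may be poorly conditioned' warning."""
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message="Polyfit may be poorly conditioned")
        return np.polyfit(x, y, deg)


k, b = quiet_polyfit([0.0, 1.0], [1.0, 3.0])
print(round(k, 6), round(b, 6))  # 2.0 1.0
```

Other warnings still pass through, so unrelated problems remain visible in the log.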