[Feat] Handling polygon boxes

boostcampaitech3 / level2-data-annotation_cv-level2-cv-09

level2-data-annotation_cv-level2-cv-09 created by GitHub Classroom


[Feat] Handling polygon boxes #28

Open · jeongjae96 opened this issue 2 years ago

jeongjae96 commented 2 years ago

What?

Training with the baseline code does not handle polygon boxes at all, so we need to set up our own way of handling them. (#21)

How?

There are two main approaches:

Convert polygons to rects
Exclude images/annotations that contain polygons

Each has its pros and cons. Converting polygons to rects lets us use more data, but the resulting boxes are no longer tight.

Excluding images or annotations that contain polygons means we only train on tight boxes, but we lose data diversity.
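A minimal sketch of both options, assuming the UFO-style annotation layout that dataset.py reads (`anno['images'][fname]['words'][word_id]['points']`). The helper names `polygon_to_rect`, `convert_polygons` and `drop_polygon_images` are hypothetical, not part of the baseline.

```python
import numpy as np


def polygon_to_rect(points):
    """Axis-aligned rectangle circumscribing a polygon, as 4 points
    ordered top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(points, dtype=np.float32)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    rect = np.array([[x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max]])
    return rect.tolist()


def convert_polygons(anno):
    """Option 1: replace every polygon (more than 4 points) with its bounding rect, in place."""
    for image in anno['images'].values():
        for word in image['words'].values():
            if len(word['points']) > 4:
                word['points'] = polygon_to_rect(word['points'])
    return anno


def drop_polygon_images(anno):
    """Option 2: keep only images whose words are all quadrilaterals."""
    anno['images'] = {
        fname: image
        for fname, image in anno['images'].items()
        if all(len(w['points']) == 4 for w in image['words'].values())
    }
    return anno
```

Both functions mutate the dict they receive, so pass a `copy.deepcopy` if the original annotation is still needed.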

Todo

[ ] Exclude images that contain polygons
[x] Convert each polygon to the minimum-size rectangle that circumscribes it
[ ] When converting polygons to rects, take the text direction into account
[ ] Set illegibility=True for polygons
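For the "take the text direction into account" item, one option is the minimum-area (possibly rotated) rectangle instead of the axis-aligned one, so slanted text keeps a tighter box. Below is a NumPy-only sketch using a convex hull and trying the orientation of every hull edge (the rotating-calipers idea); if OpenCV is available, `cv2.minAreaRect` followed by `cv2.boxPoints` computes the same kind of box. `convex_hull` and `min_area_rect` are hypothetical helper names.

```python
import numpy as np


def convex_hull(points):
    """Andrew's monotone chain convex hull (collinear points dropped)."""
    pts = sorted(set(map(tuple, np.asarray(points, dtype=np.float64).tolist())))
    if len(pts) <= 2:
        return np.array(pts)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])


def min_area_rect(points):
    """Minimum-area enclosing rectangle as a (4, 2) array of corners.

    An optimal rectangle has one side collinear with a hull edge, so it is
    enough to try the direction of each hull edge."""
    hull = convex_hull(points)
    best = None
    for i in range(len(hull)):
        edge = hull[(i + 1) % len(hull)] - hull[i]
        norm = np.hypot(*edge)
        if norm == 0:
            continue
        u = edge / norm                # unit vector along the edge
        v = np.array([-u[1], u[0]])    # unit normal to the edge
        pu, pv = hull @ u, hull @ v    # hull projected onto both axes
        area = (pu.max() - pu.min()) * (pv.max() - pv.min())
        if best is None or area < best[0]:
            best = (area, u, v, pu.min(), pu.max(), pv.min(), pv.max())
    if best is None:
        raise ValueError("degenerate polygon")
    _, u, v, u0, u1, v0, v1 = best
    return np.array([u0 * u + v0 * v, u1 * u + v0 * v,
                     u1 * u + v1 * v, u0 * u + v1 * v])
```

Note that this only recovers the box orientation; which side is the "top" of the text still has to come from the original point order or the reading direction.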

km9mn commented 2 years ago

OCR_EDA.ipynb has a function called rectify_poly:

import numpy as np

# get_box_size is defined elsewhere in OCR_EDA.ipynb (not shown here). Judging from its use
# below, it returns an (N, 2) array of (height, width), one row per quadrilateral.

def rectify_poly(poly, direction, img_w, img_h):
    """Compute the size of a word label given as a general polygon after rectifying it.
    Args:
        poly: np.ndarray of shape (2n+4, 2) with n >= 0, i.e. 4, 6, 8, ... points
        direction: whether the text is read and laid out horizontally ('Horizontal') or vertically ('Vertical')
        img_w, img_h: image width and height, used to normalize the size
    Return:
        rectified: np.ndarray of shape (2, ?), the (h, w) size of the rectified word bbox, normalized by the image size.
    """

    n_pts = poly.shape[0]
    assert n_pts % 2 == 0
    if n_pts == 4:
        size = get_box_size(poly[None])
        h = size[:, 0] / img_h
        w = size[:, 1] / img_w
        return np.stack((h,w))

    def unroll(indices):
        return list(zip(indices[:-1], indices[1:]))

    # Split one polygon into several adjacent quadrilaterals.
    indices = list(range(n_pts))
    if direction == 'Horizontal':
        upper_pts = unroll(indices[:n_pts // 2]) # for 10 points: (0, 1), (1, 2), ... (3, 4)
        lower_pts = unroll(indices[n_pts // 2:])[::-1] # for 10 points: (8, 9), (7, 8), ... (5, 6)

        quads = np.stack([poly[[i, j, k, l]] for (i, j), (k, l) in zip(upper_pts, lower_pts)])
    else:
        right_pts = unroll(indices[1:n_pts // 2 + 1]) # for 10 points: (1, 2), (2, 3), ... (4, 5)
        left_pts = unroll([0] + indices[:n_pts // 2:-1]) # for 10 points: (0, 9), (9, 8), ... (7, 6)

        quads = np.stack([poly[[i, j, k, l]] for (j, k), (i, l) in zip(right_pts, left_pts)])

    sizes = get_box_size(quads)
    if direction == 'Horizontal':
        # total width = sum of the piece widths, height = tallest piece
        h = sizes[:, 0].max() / img_h
        widths = sizes[:, 1]
        w = np.sum(widths) / img_w
        return np.stack((h,w)).reshape(2,-1)
    elif direction == 'Vertical':
        # total height = sum of the piece heights, width = widest piece
        heights = sizes[:, 0]
        w = sizes[:, 1].max() / img_w
        h = np.sum(heights) / img_h
        return np.stack((h,w)).reshape(2,-1)
    else:
        h = sizes[:, 0] / img_h
        w = sizes[:, 1] / img_w
        return np.stack((h,w),-1)

In the EDA it seems to have been used to compute the aspect ratio (width/height), but it looks worth trying for us as well.
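Since get_box_size is not shown here, this is a sketch of the same quad-splitting idea for the horizontal case, using a hypothetical stand-in for get_box_size (mean side-edge length as height, mean top/bottom-edge length as width). It assumes the point order rectify_poly relies on: the top edge left to right, then the bottom edge right to left. Unlike rectify_poly it returns a plain width/height ratio without normalizing by the image size.

```python
import numpy as np


def get_box_size(quads):
    """Hypothetical stand-in for get_box_size from OCR_EDA.ipynb (not the original).

    quads: (N, 4, 2) array, each quad ordered TL, TR, BR, BL.
    Returns an (N, 2) array of (height, width): the mean length of the two
    side edges and the mean length of the top/bottom edges.
    """
    quads = np.asarray(quads, dtype=np.float64)

    def edge_len(a, b):
        return np.linalg.norm(quads[:, a] - quads[:, b], axis=-1)

    width = (edge_len(0, 1) + edge_len(3, 2)) / 2
    height = (edge_len(0, 3) + edge_len(1, 2)) / 2
    return np.stack((height, width), axis=-1)


def horizontal_aspect_ratio(poly):
    """Width / height of a horizontally read polygon with 2k points."""
    poly = np.asarray(poly, dtype=np.float64)
    k = len(poly) // 2
    top = list(range(k))                          # top points, left to right
    bottom = list(range(2 * k - 1, k - 1, -1))    # bottom points, left to right
    quads = np.stack([poly[[top[i], top[i + 1], bottom[i + 1], bottom[i]]]
                      for i in range(k - 1)])
    sizes = get_box_size(quads)
    # Total width = sum of the piece widths; height = tallest piece (as in rectify_poly).
    return sizes[:, 1].sum() / sizes[:, 0].max()
```

For a curved word, summing the piece widths follows the text along its curve, which gives a larger (more faithful) aspect ratio than the circumscribing rect would.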

km9mn commented 2 years ago

Checking with the OCR_EDA.ipynb code, there are 897 bboxes with more than 4 points (more than 8 values when x and y are counted separately).
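A sketch of how such a count can be reproduced directly from the annotation dict, assuming the same `anno['images'][fname]['words'][word_id]['points']` layout that dataset.py reads. The function names are hypothetical.

```python
from collections import Counter


def point_count_histogram(anno):
    """Histogram of how many points each word box has (4 = quadrilateral)."""
    return Counter(
        len(word['points'])
        for image in anno['images'].values()
        for word in image['words'].values()
    )


def count_polygon_words(anno):
    """Number of word boxes with more than 4 points (more than 8 x/y values)."""
    return sum(n for n_pts, n in point_count_histogram(anno).items() if n_pts > 4)
```

Usage: load the training annotation with `json.load` and call `count_polygon_words(anno)`; the histogram also shows how the polygons are distributed between 6, 8, 10, ... points.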

km9mn commented 2 years ago

After changing the polygon bboxes to illegibility = True and running train.py, the same error occurred. The likely cause is this part of the SceneTextDataset class in dataset.py, which loads the dataset:

vertices, labels = [], []
for word_info in self.anno['images'][image_fname]['words'].values():
    vertices.append(np.array(word_info['points']).flatten())
    labels.append(int(not word_info['illegibility']))
vertices, labels = np.array(vertices, dtype=np.float32), np.array(labels, dtype=np.int64)

vertices, labels = filter_vertices(vertices, labels, ignore_under=10, drop_under=1)

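The illegibility information goes into labels, but the points go into vertices independently of it, so entries that do not have exactly 8 values still end up in vertices. With rows of different lengths, np.array(vertices, dtype=np.float32) cannot build a regular 2-D array, and that is what causes the error.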
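A minimal sketch of one fix: make every word contribute exactly 8 values before stacking, by replacing polygons with their axis-aligned bounding rect. `word_to_vertices` and `build_vertices` are hypothetical names; `build_vertices` mirrors the loop above but takes the `words` dict directly instead of reading `self.anno`.

```python
import numpy as np


def word_to_vertices(points):
    """Flatten one word's points into 8 values (x1, y1, ..., x4, y4).

    Quadrilaterals are kept as-is; any other point count is replaced by the
    axis-aligned bounding rectangle (TL, TR, BR, BL). Empty `points` still fail.
    """
    pts = np.asarray(points, dtype=np.float32)
    if len(pts) != 4:
        x_min, y_min = pts.min(axis=0)
        x_max, y_max = pts.max(axis=0)
        pts = np.array([[x_min, y_min], [x_max, y_min],
                        [x_max, y_max], [x_min, y_max]], dtype=np.float32)
    return pts.flatten()


def build_vertices(words):
    """Hypothetical replacement for the vertices/labels loop in SceneTextDataset."""
    vertices, labels = [], []
    for word_info in words.values():
        vertices.append(word_to_vertices(word_info['points']))
        labels.append(int(not word_info['illegibility']))
    return np.array(vertices, dtype=np.float32), np.array(labels, dtype=np.int64)
```

Marking polygons as illegible only changes `labels`; the shape problem lives in `vertices`, so the points themselves have to be normalized (or the words dropped) for the array to stack.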

km9mn commented 2 years ago

The docstring of the calc_deteval_metrics function in deteval says: """Currently only bounding boxes in rect (xmin, ymin, xmax, ymax) format are supported. Data in other formats (quadrilateral, polygon, etc.) is converted to the circumscribing rect before use.""" Starting from line 131:

    # If the bboxes are in a format other than rect, convert them to rect
    _pred_bboxes_dict, _gt_bboxes_dict= deepcopy(pred_bboxes_dict), deepcopy(gt_bboxes_dict)
    pred_bboxes_dict, gt_bboxes_dict = dict(), dict()
    for sample_name, bboxes in _pred_bboxes_dict.items():
        # If they are already in rect format, use them as-is without conversion
        if len(bboxes) > 0 and np.array(bboxes[0]).ndim == 1 and len(bboxes[0]) == 4:
            pred_bboxes_dict = _pred_bboxes_dict
            break

        pred_bboxes_dict[sample_name] = []
        for bbox in map(np.array, bboxes):
            rect = [bbox[:, 0].min(), bbox[:, 1].min(), bbox[:, 0].max(), bbox[:, 1].max()]
            pred_bboxes_dict[sample_name].append(rect)
    for sample_name, bboxes in _gt_bboxes_dict.items():
        # If they are already in rect format, use them as-is without conversion
        if len(bboxes) > 0 and np.array(bboxes[0]).ndim == 1 and len(bboxes[0]) == 4:
            gt_bboxes_dict = _gt_bboxes_dict
            break

        gt_bboxes_dict[sample_name] = []
        for bbox in map(np.array, bboxes):
            rect = [bbox[:, 0].min(), bbox[:, 1].min(), bbox[:, 0].max(), bbox[:, 1].max()]
            gt_bboxes_dict[sample_name].append(rect)

Anything not already in rect format is converted to a rect. This differs from OCR_EDA, which splits a single polygon into several adjacent quadrilaterals instead.
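The pred and gt loops quoted above are identical, so the same conversion can be written once and applied to both dicts. This is a sketch that follows the quoted logic (look at the first non-empty sample; if it is already a flat 4-value rect, return the dict unchanged), not the actual deteval code. `to_rect` and `to_rect_dict` are hypothetical names.

```python
import numpy as np


def to_rect(bbox):
    """Circumscribing rect (xmin, ymin, xmax, ymax) of a quad or polygon."""
    pts = np.asarray(bbox, dtype=np.float64)
    return [float(pts[:, 0].min()), float(pts[:, 1].min()),
            float(pts[:, 0].max()), float(pts[:, 1].max())]


def to_rect_dict(bboxes_dict):
    """Convert {sample_name: [bbox, ...]} to rect format unless it already is."""
    first = next((b[0] for b in bboxes_dict.values() if len(b) > 0), None)
    if first is not None and np.array(first).ndim == 1 and len(first) == 4:
        return bboxes_dict  # already rect format
    # A new dict is built, so the input does not need to be deep-copied.
    return {name: [to_rect(b) for b in bboxes] for name, bboxes in bboxes_dict.items()}
```

Usage: `pred_bboxes_dict, gt_bboxes_dict = to_rect_dict(pred_bboxes_dict), to_rect_dict(gt_bboxes_dict)`.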

jeongjae96 commented 2 years ago

So the conversion to rect is done by rect = [bbox[:, 0].min(), bbox[:, 1].min(), bbox[:, 0].max(), bbox[:, 1].max()]!

km9mn commented 2 years ago

Added code to dataset.py, based on the code above, that converts a bbox to a rectangle when it is not already one. However, the point order may cause an angle problem (by @jeongjae96).
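If the angle problem comes from the rect's point order (a min/max rect always starts at the top-left corner, whatever the original polygon's first point was), one option is to start the rect at the corner closest to the polygon's first point while keeping the clockwise order. This is a hypothetical helper, and it assumes that the first polygon point marks where the text starts, which should be checked against the data.

```python
import numpy as np


def rect_keeping_start(points):
    """Axis-aligned bounding rect whose first corner is the one closest to the
    polygon's first point, keeping clockwise order (image coords, y down)."""
    pts = np.asarray(points, dtype=np.float32)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    corners = np.array([[x_min, y_min], [x_max, y_min],   # TL, TR
                        [x_max, y_max], [x_min, y_max]],  # BR, BL
                       dtype=np.float32)
    # Rotate the corner list so it begins where the original polygon began.
    start = int(np.argmin(np.linalg.norm(corners - pts[0], axis=1)))
    return np.roll(corners, -start, axis=0)
```

For a vertical word whose polygon starts at the top-right, the rect then also starts at its top-right corner instead of being silently re-anchored at the top-left.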

km9mn commented 2 years ago

With the added code, the following error occurred:

IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/ml/code/east_dataset.py", line 136, in __getitem__
    image, word_bboxes, roi_mask = self.dataset[idx]
  File "/opt/ml/code/dataset.py", line 383, in __getitem__
    image, vertices = crop_img(image, vertices, labels, self.crop_size)
  File "/opt/ml/code/dataset.py", line 228, in crop_img
    flag = is_cross_text([start_w, start_h], length, new_vertices[labels==1,:])
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

The error above was caused by empty entries ([ ]) in vertices. After crudely deleting them one by one by hand, training ran without problems.
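Instead of deleting the empty entries by hand, they can be filtered out when the annotation is loaded. One plausible reading of the traceback: when an image ends up with no words, `np.array([])` has shape `(0,)`, so `new_vertices[labels==1, :]` indexes a 1-D array with two indices. A hedged sketch of two options, both with hypothetical names: drop unusable words (and images left with none), or keep the array shaped `(N, 8)` even when `N == 0`. Whether word-less images should be kept as negatives is a judgment call.

```python
import numpy as np


def drop_empty_words(anno, min_points=4):
    """Remove words with fewer than `min_points` points (e.g. empty `points`),
    then drop images that are left with no words at all."""
    for image in anno['images'].values():
        image['words'] = {
            wid: w for wid, w in image['words'].items()
            if len(w['points']) >= min_points
        }
    anno['images'] = {f: img for f, img in anno['images'].items() if img['words']}
    return anno


def as_vertex_array(vertices):
    """Keep an (N, 8) shape even when N == 0, so `arr[mask, :]` still works."""
    return np.asarray(vertices, dtype=np.float32).reshape(-1, 8)
```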