boostcampaitech7 / level2-cv-datacentric-cv-07

level2-cv-datacentric-cv-07 created by GitHub Classroom
Competition Overview

🏆 Multilingual Receipt OCR

[👀Model](#final-model) | [🤔Issues](https://github.com/boostcampaitech7/level2-objectdetection-cv-07/issues) | [🚀External Data](#external-data---cord)

Introduction

It is easy to focus on the architecture and algorithms of AI models, but in practice data quality matters as much as model performance. In this competition, we take a Data-Centric AI approach to an OCR task: detecting text in multilingual (Chinese, Japanese, Thai, Vietnamese) receipt images.

Goal: Develop a model that detects text in multilingual receipt images
Data: JPG images containing text, annotated in UFO format (400 training images, 120 test images)
Metric: DetEval (Final Precision, Final Recall, Final F1-Score)
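
DetEval reports a final precision, recall, and F1-score. As a minimal sketch of how the last of these relates to the other two, F1 is the harmonic mean of precision and recall. The function name `f1_score` is illustrative and not taken from this repository, and the box-matching step that DetEval uses to obtain precision and recall is not shown.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the Project Overview of this README.
print(round(f1_score(0.9427, 0.8801), 4))  # 0.9103
```

Because the harmonic mean is pulled toward the smaller value, the lower recall limits the F1-score more than the higher precision raises it.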

Project Overview

We began with EDA and an analysis of the baseline code to build a basic understanding of the data and the model. We then brought in external and synthetic data and ran a range of experiments with data cleansing and augmentation techniques to improve the model's generalization performance. Finally, we applied a 5-fold ensemble to obtain our best result.
As a result, we achieved precision 0.9427, recall 0.8801, and F1 0.9103, placing 4th on the leaderboard.
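
The 5-fold ensemble mentioned above needs five train/validation partitions of the training images. The following is a minimal sketch of such a split, assuming 400 training images with placeholder file names. The helper `k_fold_splits`, the seed, and the file names are hypothetical; how this repository actually builds its folds or merges the predictions of the five models is not specified here.

```python
import random


def k_fold_splits(items, k=5, seed=42):
    """Return k (train, val) pairs; each item is in exactly one val set."""
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for a given seed
    folds = [items[i::k] for i in range(k)]  # k nearly equal-sized folds
    splits = []
    for i in range(k):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, val))
    return splits


# Hypothetical file names standing in for the 400 training images.
image_ids = [f"img_{n:03d}.jpg" for n in range(400)]
splits = k_fold_splits(image_ids)
for fold_idx, (train, val) in enumerate(splits):
    print(fold_idx, len(train), len(val))
```

With 400 images and k=5, every fold trains on 320 images and validates on the remaining 80, so each image is used for validation exactly once.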

Final public leaderboard ranking

Model

The baseline model is EAST (An Efficient and Accurate Scene Text Detector; Zhou et al., 2017), with a VGG-16 backbone (Visual Geometry Group, 16 layers; Simonyan and Zisserman, 2015) pretrained on ImageNet.

Data

dataset
  ├── chinese_receipt
  │   ├── img # train and test images
  │   └── ufo # annotation files for the train and test images (UFO format)
  ├── japanese_receipt
  │   ├── img # train and test images
  │   └── ufo # annotation files for the train and test images (UFO format)
  ├── thai_receipt
  │   ├── img # train and test images
  │   └── ufo # annotation files for the train and test images (UFO format)
  └── vietnamese_receipt
      ├── img # train and test images
      └── ufo # annotation files for the train and test images (UFO format)

User Guide

cd code # move into the code folder
python train.py # train the model
python validate.py # load the trained weights and run validation
python test.py # load the weights with the highest validation score and run inference on the test dataset

File Tree

├── .github
├── external-data
│   ├── cord-data
│   └── synthetic-data
├── code
│   └── model code
└── README.md

External Data - CORD

License and Data Attribution

This project uses CORD (Consolidated Receipt Dataset for Post-OCR Parsing). The dataset is provided under the CORD license terms, and we adhere to these terms within this repository.

Attribution

For full details on the CORD license and permissions, please refer to the official CORD documentation.

Environment Setting

System Information

| Category | Details |
| --- | --- |
| Operating System | Linux 5.4.0 |
| Python | 3.10.13 |
| GPU | Tesla V100-SXM2-32GB |
| CUDA | 12.2 |

Tools and Libraries

| Category | Details |
| --- | --- |
| Git | 2.25.1 |
| Conda | 23.9.0 |
| Tmux | 3.0a |


© 2024 LuckyVicky Team.

Supported by Naver BoostCamp AI Tech.


👥 Team Members of LuckyVicky

| Member | Roles |
| --- | --- |
| 🍀이동진 | Server management, Failure Analysis, ensembling |
| 🍀정지환 | Data preprocessing, Augmentation |
| 🍀유정선 | EDA, data preprocessing, Augmentation |
| 🍀신승철 | Data preprocessing, Augmentation |
| 🍀김소정 | Data synthesis, scheduling, documentation |
| 🍀서정연 | Training on external datasets, Git management |