A Study of Training and Evaluation Methods for Text Detection

chullhwan-song commented 5 years ago

Training sets

[1] Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping # Multi-scale evaluation

chullhwan-song commented 5 years ago

Dataset	training set	test set	val set	Language	Type
IC13	229	233		En	horizontal
IC15 - Incidental Scene Text	1000	500		En	Google Glass, quadrilaterals
IC17	7,200	9,000	1,800	multi-lingual
MSRA-TD500	300	200		En, Ch	line-level
TotalText	1255	300			curved texts
CTW-1500	1000	500
COCO-Text	43,686	20,000

chullhwan-song commented 5 years ago

research	Pretrain	Training Data	augmentation
PixelLink	No	IC15-train
SegLink	SynthText	IC15-train
EAST	ImageNet	IC15-train, IC13-train (229 images)
Text-Block FCN	ImageNet	IC15-train	Y
FOTS	ImageNet, SynthText	MLT train/val sets, IC15-train + IC13-train	Y: i) longer sides of images are resized from 640 pixels to 2560 pixels, ii) rotated in range [−10°, 10°] randomly, iii) rescaled with ratio from 0.8 to 1.2, iv) 640×640 random samples are cropped from the transformed images.

FOTS: ambiguous, because it is an end-to-end model
- we first train our model using 9000 images from ICDAR 2017 MLT training and validation datasets, then we use 1000 ICDAR 2015 training images and 229 ICDAR 2013 training images to fine-tune our model.
- First trained on the ICDAR 2017 MLT training + validation sets, then fine-tuned on the ICDAR 2015 + ICDAR 2013 training images.
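
The four augmentation steps listed in the FOTS row above can be sketched as a parameter sampler. This is only an illustration, not the authors' code: the function and class names are hypothetical, the uniform sampling of the long side in [640, 2560] is my reading of "resized from 640 pixels to 2560 pixels", applying the 0.8 to 1.2 ratio to the height only is an assumption, and so is padding small images up to the crop size. It samples the parameters only; applying them to pixels and boxes is left to an image library.

```python
import random
from dataclasses import dataclass


@dataclass
class AugmentParams:
    resized_w: int    # width after the long-side resize (padded to crop size)
    resized_h: int    # height after resize and height rescale (padded likewise)
    angle_deg: float  # rotation angle in degrees
    crop_x: int       # left edge of the crop window
    crop_y: int       # top edge of the crop window
    crop_size: int    # side length of the square crop


def sample_augment_params(width, height, rng, crop_size=640):
    """Sample one set of augmentation parameters (hypothetical helper)."""
    # i) pick a target long side in [640, 2560] (assumed uniform) and scale to it
    long_side = rng.uniform(640, 2560)
    scale = long_side / max(width, height)
    new_w = width * scale
    new_h = height * scale

    # ii) random rotation angle in [-10, 10] degrees
    angle = rng.uniform(-10.0, 10.0)

    # iii) rescale ratio in [0.8, 1.2]; assumed to apply to the height only
    new_h *= rng.uniform(0.8, 1.2)

    # Assumption: images smaller than the crop are padded, so a crop always fits.
    new_w = max(int(round(new_w)), crop_size)
    new_h = max(int(round(new_h)), crop_size)

    # iv) random crop_size x crop_size window inside the transformed image
    crop_x = rng.randint(0, new_w - crop_size)
    crop_y = rng.randint(0, new_h - crop_size)
    return AugmentParams(new_w, new_h, angle, crop_x, crop_y, crop_size)


if __name__ == "__main__":
    print(sample_augment_params(1280, 720, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which helps when comparing training runs across papers that report different augmentation settings.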

chullhwan-song commented 5 years ago

research	Pretrained	Training Data	augmentation
PixelLink	IC15-train	TD500-train + HUST-TR400
EAST	ImageNet	TD500-train, HUST-TR400
Text-Block FCN	ImageNet	TD500-train	Y
[1]	ImageNet	TD500-train + HUST-TR400	Y

chullhwan-song commented 5 years ago

research	Pretrain	Training Data	augmentation
PixelLink	IC15-train	IC13-train, TD500-train, and HUST-TR400	Y
FOTS	SynthText, ImageNet	MLT train + val sets, IC15-train + IC13-train	Y (same as for IC15)
[1]	ImageNet	IC15-train+IC13-train	Y

FOTS
- We use model trained on ImageNet dataset [29] as our pre-trained model. The training process includes two steps: first we use Synth800k dataset [10] to train the network for 10 epochs, and then real data is adopted to fine-tune the model until convergence. Different training datasets are adopted for different tasks, which will be discussed in Sec. 4. Some blurred text regions in ICDAR 2015 and ICDAR 2017 MLT datasets are labeled as “DO NOT CARE”, and we ignore them in training.
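
The "DO NOT CARE" handling in the quote above can be illustrated with a small ground-truth filter. This is a minimal sketch, not the FOTS implementation: it assumes the ICDAR 2015 ground-truth line format `x1,y1,x2,y2,x3,y3,x4,y4,transcription`, where a transcription of `###` marks a do-not-care region. The function names are hypothetical, and other datasets (such as ICDAR 2017 MLT) use a different line layout.

```python
def parse_gt_line(line):
    """Parse one ground-truth line into (quad, text). Hypothetical helper."""
    # Some ground-truth files start with a UTF-8 BOM; drop it if present.
    parts = line.strip().lstrip("\ufeff").split(",")
    coords = [int(v) for v in parts[:8]]
    # The transcription may itself contain commas, so re-join the remainder.
    text = ",".join(parts[8:])
    quad = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    return quad, text


def split_regions(lines, ignore_tag="###"):
    """Separate regions used for training from do-not-care regions."""
    keep, ignore = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        quad, text = parse_gt_line(line)
        (ignore if text == ignore_tag else keep).append((quad, text))
    return keep, ignore


if __name__ == "__main__":
    sample = [
        "377,117,463,117,465,130,378,130,Genaxis Theatre",
        "493,115,519,115,519,131,493,131,###",
    ]
    keep, ignore = split_regions(sample)
    print(len(keep), "kept,", len(ignore), "ignored")
```

Returning the ignored regions separately, rather than discarding them, lets a training loop mask those areas out of the loss instead of treating them as background.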

chullhwan-song commented 5 years ago

research	Pretrain	Training Data	augmentation
FOTS	SynthText, ImageNet	MLT train + val sets	Y (same as for IC15)
[1]	ImageNet	MLT train + val sets	Y

FOTS
- We use model trained on ImageNet dataset [29] as our pre-trained model. The training process includes two steps: first we use Synth800k dataset [10] to train the network for 10 epochs, and then real data is adopted to fine-tune the model until convergence. Different training datasets are adopted for different tasks, which will be discussed in Sec. 4. Some blurred text regions in ICDAR 2015 and ICDAR 2017 MLT datasets are labeled as “DO NOT CARE”, and we ignore them in training.
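[1]: The paper only describes how the dataset is composed; it does not appear to say that the val set was merged into the training data. This is my guess.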

chullhwan-song commented 5 years ago

research	Pretrain	Training Data	augmentation
[1]	ImageNet	RCTW	Y