aim-uofa / AdelaiDet

AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
https://git.io/AdelaiDet

Synth data set for ABCNet. #117

Closed Lincoln93 closed 4 years ago

Lincoln93 commented 4 years ago

https://github.com/aim-uofa/AdelaiDet/blob/master/datasets/README.md#text-recognition

Is Syntext-150k (Part1: 94,723 and Part2: 54,327) the set of samples that you mentioned in the paper? You wrote that you created curved synthetic images. Were these samples created by you?

And when working with a custom dataset, is it OK to use it, or do I need to create my own problem-specific synthetic dataset?

Hope you understand my concern.

Yuliang-Liu commented 4 years ago

@Lincoln93 Hi,

Q1: That's right. Syntext-150k was created by us.

Q2: Of course. Empirically speaking, you don't need to create your own synthetic dataset.

Hope the answers address your concerns.

innat commented 4 years ago

@Yuliang-Liu san, sorry for bothering you. :) Are you saying we don't need to create a synthetic dataset even if the recognition domain is very different? I mean, this synthetic dataset contains (I haven't looked yet :p) text-in-the-wild images. But what if the custom dataset looks very different? In that case, don't we need to create new synthetic text images? And have you open-sourced the code behind it?

Yuliang-Liu commented 4 years ago

@innat Sorry for causing the confusion.

If you want to deal with a multilingual task, you may need to create your own synthetic data. For an English-only problem, I think it is not necessary. The current synthetic data only contains text in the wild, and synthesizing text for your specific scenes should always be beneficial. This is the code we built on for generating synthetic data.

innat commented 4 years ago

@Yuliang-Liu I tried to download your synthetic images from Box, but it hit a download size limit and I was unable to download them. Is there a trick to it?

https://universityofadelaide.app.box.com/s/alta996w4fym6arh977h3k3xv55clhg3

innat commented 4 years ago

@Yuliang-Liu san, sorry for bothering you, it's working now. It turns out that downloading both (images and JSON) together may not work; they need to be downloaded separately.
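
For anyone who downloads the images and the annotation file separately, a quick consistency check along these lines may help confirm that nothing was lost. This is only a minimal sketch: it assumes the annotation file is a COCO-style JSON with an `images` list whose entries have a `file_name` field, which is not confirmed in this thread. The function name `check_download` and the paths in the usage note are hypothetical.

```python
import json
from pathlib import Path


def check_download(image_dir, annotation_file):
    """Compare image files on disk against those listed in the annotation file.

    Assumption (not confirmed in the thread): the annotation file is a
    COCO-style JSON with an "images" list, each entry having "file_name".
    """
    with open(annotation_file, "r", encoding="utf-8") as f:
        data = json.load(f)

    # File names referenced by the annotations (directory parts dropped).
    listed = {Path(entry["file_name"]).name for entry in data["images"]}
    # File names actually present in the downloaded image folder.
    on_disk = {p.name for p in Path(image_dir).iterdir() if p.is_file()}

    return {
        "listed": len(listed),
        "on_disk": len(on_disk),
        "missing_images": sorted(listed - on_disk),
        "unlisted_images": sorted(on_disk - listed),
    }
```

Calling, for example, `check_download("syntext1/images", "syntext1/annotations/train.json")` (hypothetical paths) returns the two counts plus the names of any images that are listed but missing, or present but unlisted.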

innat commented 4 years ago

@Yuliang-Liu Using this code, it seems that generating vertical and curved synthetic text is not straightforward and needs some care. In your paper, you mentioned that you created the curved synthetic text yourselves because such data is scarce. Have you open-sourced your code for curved synthetic text generation?

Yuliang-Liu commented 4 years ago

@innat Yes. The image resources are all from public sources. The generation scripts are the same as the ones we provided. Sorry, we may not provide further instructions on this part.