Closed Lincoln93 closed 4 years ago
@Lincoln93 Hi,
Q1: That's right. Syntext-150k was created by us.
Q2: Of course. Empirically speaking, you don't need to create your own synthetic dataset.
Hope the answers address your concerns.
@Yuliang-Liu san, sorry for bothering. :) Are you saying we don't need to create a synthetic dataset even if the recognition domain is very different? I mean, this synthetic dataset contains (I haven't looked yet :p) wild text images. But what if the custom dataset looks much different? In that case, don't we need to create new synthetic text images? And have you open-sourced the code behind it?
@innat Sorry for causing the confusion.
If you want to deal with a multilingual task, you may need to create your own synthetic data. For an English-only problem, I think it is not necessary. The current synthetic data only contains text in the wild, and synthesizing text for your specific scenes should always be beneficial. This is the code we built on for generating synthetic data.
@Yuliang-Liu I've tried to download your synth images from the Box link, but it hit a restriction on the download size limit and I was unable to download. Any catch?
https://universityofadelaide.app.box.com/s/alta996w4fym6arh977h3k3xv55clhg3
@Yuliang-Liu san, sorry for bothering, it's working now. It turns out downloading both together (image and json) may not work; they need to be downloaded separately.
@innat Yes. The image resources are all public. The generation scripts are the same as the ones we provided. Sorry, we may not provide further instructions on this part.
https://github.com/aim-uofa/AdelaiDet/blob/master/datasets/README.md#text-recognition
Here, are Syntext-150k (Part 1: 94,723 and Part 2: 54,327) the samples you mentioned in the paper, where you created curved synthetic images? Are these samples your own creation?
And is it OK to use them for a custom dataset, or do I need to create my own problem-specific synthetic dataset?
Hope you understand my concern.