MichalBusta / E2E-MLT

E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text

How did you happen to use the data from SynthText #72

Open mohammedayub44 opened 3 years ago

mohammedayub44 commented 3 years ago

Hi,

Sorry for the naïve question. I have downloaded the synthetic images (~38 GB), depth maps (15 GB), segmentation maps (7 GB), and raw images (~9 GB) from the SynthText repo. I'm wondering how you converted all of these into a format accepted by your network (from the readme, it seems to be YOLO or ICDAR format).

Aiming to run some trials for just English using your E2E network.

Thanks in advance !

MichalBusta commented 3 years ago

Hi Mohammed, you may find some scripts here:
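For anyone landing here with the same question, a minimal sketch of what such a conversion can look like, assuming SynthText's documented gt.mat fields (imnames, wordBB, txt); the output columns below are the 8-coordinate ICDAR style and should be checked against the gt files under sample_train_data/:

```python
# Minimal sketch, assuming SynthText's documented gt.mat layout
# (imnames, wordBB, txt). Writes one gt_<image>.txt per image in the
# 8-coordinate ICDAR style; verify the exact column layout against the
# gt files shipped under sample_train_data/.
import os
import numpy as np
import scipy.io

gt = scipy.io.loadmat('gt.mat')   # ships with the SynthText download
out_dir = 'train'                 # hypothetical output folder

for i in range(gt['imnames'].shape[1]):
    img_name = str(gt['imnames'][0, i][0])     # e.g. '8/ballet_106_0.jpg'
    boxes = gt['wordBB'][0, i]                 # 2 x 4 x M, or 2 x 4 when M == 1
    if boxes.ndim == 2:
        boxes = boxes[:, :, np.newaxis]
    # each txt entry is a string holding one or more whitespace-separated words
    words = [w for s in gt['txt'][0, i] for w in str(s).split()]

    stem = os.path.splitext(os.path.basename(img_name))[0]
    with open(os.path.join(out_dir, 'gt_%s.txt' % stem), 'w', encoding='utf-8') as f:
        for j in range(boxes.shape[2]):
            pts = boxes[:, :, j].T.reshape(-1)  # x1,y1,x2,y2,x3,y3,x4,y4
            f.write('%s,%s\n' % (','.join('%.1f' % p for p in pts), words[j]))
```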

mohammedayub44 commented 3 years ago

Thanks, I'll check it out and let you know.

mohammedayub44 commented 3 years ago

Using some inspiration from your conversion scripts and SynthText, I managed to create a train folder (~35 GB) that contains gt_image_name.txt and image_name.jpg files, as suggested by your train readme for the ICDAR format. I'm also planning to create a crop folder that contains all the crops (JPGs) and a txt file linking them to the words.
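A minimal sketch of that crop step, assuming axis-aligned crops and a "crop path, transcription" line format; the repo's own scripts may rectify rotated boxes instead, and the exact delimiter should be checked against the crops gt file shipped with the repo:

```python
# Minimal sketch: axis-aligned crops plus one "crop_path, transcription"
# line per word. The crop_dir and gt_file defaults are hypothetical.
import os
import cv2
import numpy as np

def write_crops(img_path, boxes, words, crop_dir='crops', gt_file='crops/gt.txt'):
    """boxes: list of 4x2 corner arrays; words: matching transcriptions."""
    img = cv2.imread(img_path)
    stem = os.path.splitext(os.path.basename(img_path))[0]
    with open(gt_file, 'a', encoding='utf-8') as f:
        for j, (pts, word) in enumerate(zip(boxes, words)):
            x, y, w, h = cv2.boundingRect(np.asarray(pts, np.int32))
            crop = img[max(y, 0):y + h, max(x, 0):x + w]
            if crop.size == 0:   # box fell outside the image
                continue
            crop_path = os.path.join(crop_dir, '%s_%d.jpg' % (stem, j))
            cv2.imwrite(crop_path, crop)
            f.write('%s, %s\n' % (crop_path, word))
```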

I'm slightly confused about how to initiate training with the correct folder locations:

1) For -train_list:
   a) Should I also create another file like sample_train_data/MLT/trainMLT.txt that lists all image locations?
   b) I'm guessing the done folder gets populated during training, so I don't have to create and populate it prior to training?
   c) Can I skip a) and b) and just give my train folder location?

2) For -ocr_feed_list - this one is simple: I can directly give the gt.txt that I create from my crop folder (no doubts here).

3) For -model - if I skip this parameter, my guess is it trains a model from scratch rather than fine-tuning an already trained one?

Sorry for the basic questions.

Thanks in advance !

MichalBusta commented 3 years ago

> Using some inspiration from your conversion scripts and SynthText, I managed to create a train folder (~35 GB) that contains gt_image_name.txt and image_name.jpg files, as suggested by your train readme for the ICDAR format. I'm also planning to create a crop folder that contains all the crops (JPGs) and a txt file linking them to the words.
>
> I'm slightly confused about how to initiate training with the correct folder locations:
>
> 1) For -train_list: a) Should I also create another file like sample_train_data/MLT/trainMLT.txt that lists all image locations?

Yes - I would recommend at least reading the data feeding script - it is simple Python, and all errors are usually caused by wrong data feeding.

> b) I'm guessing the done folder gets populated during training, so I don't have to create and populate it prior to training?

The done folder is just a folder - you can ignore it.

> c) Can I skip a) and b) and just give my train folder location?

No, you have to provide a list - if your data are clean, you can dump it with one command, something like: ls -R *.png >> list.txt
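The same idea in Python, additionally skipping images that lack a matching gt file; the list path is just the readme's example location:

```python
# Build the -train_list file by walking the train folder and keeping
# only images that have a gt_<name>.txt sitting next to them.
import glob
import os

with open('sample_train_data/MLT/trainMLT.txt', 'w') as f:
    for img in sorted(glob.glob('train/**/*.jpg', recursive=True)):
        stem = os.path.splitext(os.path.basename(img))[0]
        gt = os.path.join(os.path.dirname(img), 'gt_%s.txt' % stem)
        if os.path.exists(gt):
            f.write(img + '\n')
```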

> 2) For -ocr_feed_list - this one is simple: I can directly give the gt.txt that I create from my crop folder (no doubts here).
>
> 3) For -model - if I skip this parameter, my guess is it trains a model from scratch rather than fine-tuning an already trained one?

Yes.
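Putting the pieces together, hypothetical invocations assembled from the flags discussed in this thread; the flag spelling, the checkpoint name, and the crops gt path are assumptions to be checked against train.py's argument parser and the readme:

```bash
# fine-tune from a downloaded checkpoint (see the readme for the link)
python train.py -train_list=sample_train_data/MLT/trainMLT.txt \
                -ocr_feed_list=crops/gt.txt \
                -model=e2e-mlt.h5

# omit -model to train from randomly initialized weights instead
python train.py -train_list=sample_train_data/MLT/trainMLT.txt \
                -ocr_feed_list=crops/gt.txt
```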

> Sorry for the basic questions. Thanks in advance!

You are welcome.

mohammedayub44 commented 3 years ago

Thanks. I'll check and let you know.

mohammedayub44 commented 3 years ago

@MichalBusta Got the training to work correctly as per your suggestion. The model seems to be doing okay but not great. I'm wondering if this is because of the input_size parameter? I see that in data_gen.py the input images are cropped and rescaled to input_size, which by default is 512.

```python
...
resize_h = input_size
resize_w = input_size
...
scaled = cut_image(im, (resize_w, resize_w), text_polys)
```

However, all my input images are 450x600. Do I have to resize all my synthetic images to one particular height and width before starting to train? I was hoping not.
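A toy sketch of a crop-and-rescale step like the one quoted above, assuming the loader behaves roughly this way (the real data_gen.py adds scale jitter and keeps text polygons inside the crop, so this is only illustrative). If so, any input resolution, 450x600 included, ends up as an input_size x input_size sample, and pre-resizing the dataset should not be necessary:

```python
# Illustrative only, not the repo's actual logic: take a random square
# crop bounded by the image size, then rescale it to input_size.
import cv2
import numpy as np

def random_square_sample(im, input_size=512):
    h, w = im.shape[:2]
    side = min(h, w, input_size)              # crop side bounded by the image
    y = np.random.randint(0, h - side + 1)
    x = np.random.randint(0, w - side + 1)
    crop = im[y:y + side, x:x + side]
    # rescale the square crop up/down to the network's fixed input size
    return cv2.resize(crop, (input_size, input_size))

sample = random_square_sample(np.zeros((450, 600, 3), np.uint8))
assert sample.shape[:2] == (512, 512)
```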

I can share the metrics and results to be more specific.

Thanks !