Regarding SynthText dataset

Good day,

I've been reading the dup_boxes_synth_text.py script from your data conversion scripts repository. If I'm not mistaken, this is the script one has to use to convert the SynthText dataset for training.

Lines 59 through 62 of that script refer to three files: imnames.np.npy, wordBB.np.npy and gt_txt.npz.

My question is: how should I generate these files?

Do I have to modify the gen.py script from the SynthText GitHub repository to generate them, or are they created from the gt.mat file that comes with the pregenerated SynthText dataset (800,000 images) linked from the SynthText GitHub repository?

Either way, could you tell me the format of the data within these files, or point me to (or provide) a script that creates them?
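For reference, this is roughly what I would try if the files do come from gt.mat. It is only a sketch based on my own assumptions: that gt.mat exposes the `imnames`, `wordBB` and `txt` entries described in the SynthText README, that the script simply expects those arrays dumped with NumPy, and that the array inside gt_txt.npz is stored under a key named `txt` (that key name and the helper `save_synthtext_arrays` are my guesses, not something I found in dup_boxes_synth_text.py).

```python
import os

import numpy as np


def save_synthtext_arrays(gt, out_dir):
    """Dump the 'imnames', 'wordBB' and 'txt' entries of a loaded gt.mat.

    `gt` is a dict-like object, e.g. the result of scipy.io.loadmat("gt.mat").
    The output layout is an assumption, not confirmed by the conversion script.
    """
    os.makedirs(out_dir, exist_ok=True)
    # np.save appends ".npy", giving imnames.np.npy and wordBB.np.npy
    np.save(os.path.join(out_dir, "imnames.np"), gt["imnames"])
    np.save(os.path.join(out_dir, "wordBB.np"), gt["wordBB"])
    # The key name "txt" inside the archive is a guess
    np.savez(os.path.join(out_dir, "gt_txt.npz"), txt=gt["txt"])


# Intended usage (needs SciPy and the downloaded gt.mat):
#   from scipy.io import loadmat
#   save_synthtext_arrays(loadmat("gt.mat"), ".")
```

If the script expects a different layout (for example flattened arrays or another key name), I would be glad to know what to change.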

Your help is greatly appreciated.

MichalBusta / DeepTextSpotter