Belval / TextRecognitionDataGenerator

A synthetic data generator for text recognition
MIT License

Write the labels to separate files #326

Open DesBw opened 8 months ago

DesBw commented 8 months ago

I was trying to use the labels as ground-truth texts in Tesseract. Currently, trdg writes all the labels into a single labels.txt file. As far as I understand, the part of the script that writes them to that single file is the following.

    if args.name_format == 2:
        # Create file with filename-to-label connections
        with open(
            os.path.join(args.output_dir, "labels.txt"), "w", encoding="utf8"
        ) as f:
            for i in range(string_count):
                file_name = str(i) + "." + args.extension
                label = strings[i]
                if args.space_width == 0:
                    label = label.replace(" ", "")
                f.write("{} {}\n".format(file_name, label))

Could someone with sufficient knowledge of Python help me modify it so that each label is written to its own file?

What I want is:

0.gt.txt
1.gt.txt

Each of those files would contain its respective label.
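
For illustration, here is a minimal sketch of what such a change might look like. It is not part of trdg: the helper name `write_label_files` is hypothetical, and it assumes the images are named by index (`0.jpg`, `1.jpg`, ...), as in the `name_format == 2` branch quoted above. It keeps the same `space_width == 0` handling and writes each label followed by a newline.

```python
import os


def write_label_files(strings, output_dir, space_width=1):
    """Write one <index>.gt.txt file per label and return the paths written.

    Hypothetical helper, not part of trdg. `strings` is the list of labels,
    in the same order as the generated images.
    """
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for i, label in enumerate(strings):
        # Mirror the original snippet: drop spaces when space_width is 0.
        if space_width == 0:
            label = label.replace(" ", "")
        # 0.gt.txt, 1.gt.txt, ... next to 0.<ext>, 1.<ext>, ...
        path = os.path.join(output_dir, "{}.gt.txt".format(i))
        with open(path, "w", encoding="utf8") as f:
            f.write(label + "\n")
        paths.append(path)
    return paths
```

Inside the script, the loop that writes labels.txt could then presumably be replaced by a call such as `write_label_files(strings, args.output_dir, args.space_width)`, using the variables that already appear in the quoted snippet.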