Belval / TextRecognitionDataGenerator

A synthetic data generator for text recognition
MIT License

fix duplicates in random/wikipedia generator and adding timeout to wi… #193

Closed AghilesAzzoug closed 3 years ago

AghilesAzzoug commented 3 years ago

Fixing duplicates in random/wikipedia generator.

The code below reproduces the bug on the current version: every label is generated 3 times. It can also be run after the fix to validate it.

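from trdg.generators import GeneratorFromRandom
import numpy as np

# Request 3000 samples made of random strings.
random_generator = GeneratorFromRandom(count=3000, allow_variable=True, length=10)

# Each item yielded by the generator is an (image, label) pair; keep only the labels.
labels = [label for _, label in random_generator]

# Count how many times each distinct label was generated.
values, counts = np.unique(labels, return_counts=True)

print(values)
print(counts)  # with the bug, every count is 3

# Total number of labels vs. number of distinct labels.
# The two numbers only match when there are no duplicates.
print(len(labels))
print(len(set(labels)))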
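
The uniqueness check at the end of the script can also be sketched without numpy, which may be handy for turning it into an assertion. This is only a minimal sketch: `duplicate_report` is a hypothetical helper, not part of trdg, and the sample labels are stand-in data that mimics the reported behaviour (every label repeated 3 times) rather than real generator output.

```python
from collections import Counter


def duplicate_report(labels):
    """Return (total, unique, duplicates) for an iterable of labels.

    Hypothetical helper for illustration; not part of trdg.
    `duplicates` maps each label seen more than once to its count.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    duplicates = {label: n for label, n in counts.items() if n > 1}
    return total, len(counts), duplicates


if __name__ == "__main__":
    # Stand-in data: three labels, each repeated 3 times, as in the bug report.
    sample = ["abc", "def", "ghi"] * 3
    total, unique, duplicates = duplicate_report(sample)
    print(total, unique, duplicates)
```

With real output, the labels collected from the generator would be passed in place of `sample`, and an empty `duplicates` dict would indicate that no label was repeated.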