JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://www.jaided.ai
Apache License 2.0

complete custom easy ocr recognition model training tutorial #947

Open muhammadanas060 opened 1 year ago

muhammadanas060 commented 1 year ago
Complete custom EasyOCR recognition model training tutorial:

https://www.youtube.com/watch?v=-j3TbyceShY

Originally posted by @Manishsinghrajput98 in https://github.com/JaidedAI/EasyOCR/issues/495#issuecomment-915772984

muhammadanas060 commented 1 year ago

I followed your video carefully, but when I train on my data I get the following error:

Filtering the images containing characters which are not in opt.character
Filtering the images whose label is longer than opt.batch_max_length

dataset_root: all_data
opt.select_data: ['en_train_filtered']
opt.batch_ratio: ['1']

dataset_root: all_data dataset: en_train_filtered
all_data/en_train_filtered
sub-directory: /en_train_filtered num samples: 0
num total samples of en_train_filtered: 0 x 1.0 (total_data_usage_ratio) = 0
num samples of en_train_filtered per batch: 32 x 1.0 (batch_ratio) = 32

ValueError Traceback (most recent call last)
Cell In [6], line 1
----> 1 train(opt, amp=False)

File ~/Anasarshad/EasyOCR-master/trainer/train.py:40, in train(opt, show_number, amp)
     38 opt.select_data = opt.select_data.split('-')
     39 opt.batch_ratio = opt.batch_ratio.split('-')
---> 40 train_dataset = Batch_Balanced_Dataset(opt)
     42 log = open(f'./saved_models/{opt.experiment_name}/log_dataset.txt', 'a', encoding="utf8")
     43 AlignCollate_valid = AlignCollate(imgH=opt.imgH, imgW=opt.imgW, keep_ratio_with_pad=opt.PAD, contrast_adjust=opt.contrast_adjust)

File ~/Anasarshad/EasyOCR-master/trainer/dataset.py:77, in Batch_Balanced_Dataset.__init__(self, opt)
     74 batch_size_list.append(str(_batch_size))
     75 Total_batch_size += _batch_size
---> 77 _data_loader = torch.utils.data.DataLoader(
     78     _dataset, batch_size=_batch_size,
     79     shuffle=True,
     80     num_workers=int(opt.workers), #prefetch_factor=2,persistent_workers=True,
     81     collate_fn=_AlignCollate, pin_memory=True)
     82 self.data_loader_list.append(_data_loader)
     83 self.dataloader_iter_list.append(iter(_data_loader))

File ~/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py:344, in DataLoader.__init__(self, dataset, batch_size, shuffle, sampler, batch_sampler, num_workers, collate_fn, pin_memory, drop_last, timeout, worker_init_fn, multiprocessing_context, generator, prefetch_factor, persistent_workers, pin_memory_device)
    342 else: # map-style
    343     if shuffle:
--> 344         sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
    345     else:
    346         sampler = SequentialSampler(dataset) # type: ignore[arg-type]

File ~/.local/lib/python3.10/site-packages/torch/utils/data/sampler.py:107, in RandomSampler.__init__(self, data_source, replacement, num_samples, generator)
    103 raise TypeError("replacement should be a boolean value, but got "
    104                 "replacement={}".format(self.replacement))
    106 if not isinstance(self.num_samples, int) or self.num_samples <= 0:
--> 107     raise ValueError("num_samples should be a positive integer "
    108                      "value, but got num_samples={}".format(self.num_samples))

ValueError: num_samples should be a positive integer value, but got num_samples=0
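The log reports "num samples: 0" right after the two filtering messages, so one possible explanation is that every label was dropped by the filters (another is that no labels were found at all). Below is a minimal sketch of that kind of filtering, useful for checking which labels would survive. It is not the trainer's own code: the function name `split_labels`, the sample labels, the allowed character set, and the length limit are all hypothetical, and the real trainer may differ in details such as case handling.

```python
def split_labels(labels, allowed_chars, max_length):
    """Split labels into kept, rejected-for-characters, rejected-for-length.

    Hypothetical helper mirroring the two rules named in the log:
    drop labels longer than max_length, and drop labels that contain
    any character outside allowed_chars.
    """
    allowed = set(allowed_chars)
    kept, bad_chars, too_long = [], [], []
    for label in labels:
        if len(label) > max_length:
            too_long.append(label)
        elif any(ch not in allowed for ch in label):
            bad_chars.append(label)
        else:
            kept.append(label)
    return kept, bad_chars, too_long


# Hypothetical sample data, not taken from the issue.
labels = ["hello!", "world 42", "a-very-long-label-that-exceeds-the-limit"]
allowed_chars = "0123456789abcdefghijklmnopqrstuvwxyz"

kept, bad_chars, too_long = split_labels(labels, allowed_chars, max_length=25)
print("kept:", kept)
print("rejected for characters:", bad_chars)
print("rejected for length:", too_long)
```

In this sample nothing is kept: "!" and the space are outside the allowed set, and the third label is longer than the limit. Running a check like this over the real labels, with the character set and maximum length from the training configuration, would show whether the filters are the reason the sampler receives an empty dataset.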