JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://www.jaided.ai
Apache License 2.0

Keep getting error '_MultiProcessingDataLoaderIter' object has no attribute 'next' #1194

Open · JayDbb opened this issue 8 months ago

JayDbb commented 8 months ago

I am trying to fine-tune the model for a specific use case and have been having a plethora of issues that I have ultimately "bypassed", but this one has been a headache for a while.

I have been working in a CPU runtime on Google Colab.

This is the config YAML file:

```yaml
number: '0123456789'
symbol: "!\"#$%&'()*+,-./:;<=>?@[\]^_{|}~ €"
lang_char: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
experiment_name: 'en_filtered'
train_data: 'all_data'
valid_data: 'all_data/valid'
manualSeed: 1111
workers: 6
batch_size: 32 #32
num_iter: 300000
valInterval: 20000
saved_model: '' #'saved_models/en_filtered/iter_300000.pth'
FT: False
optim: False # default is Adadelta
lr: 1.
beta1: 0.9
rho: 0.95
eps: 0.00000001
grad_clip: 5

# Data processing
select_data: 'train' # this is dataset folder in train_data
batch_ratio: '1'
total_data_usage_ratio: 1.0
batch_max_length: 34
imgH: 64
imgW: 600
rgb: False
contrast_adjust: False
sensitive: True
PAD: True
contrast_adjust: 0.0
data_filtering_off: False

# Model Architecture
Transformation: 'None'
FeatureExtraction: 'VGG'
SequenceModeling: 'BiLSTM'
Prediction: 'CTC'
num_fiducial: 20
input_channel: 1
output_channel: 256
hidden_size: 256
decode: 'greedy'
new_prediction: False
freeze_FeatureFxtraction: False
freeze_SequenceModeling: False
```
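
For context, training is launched from a notebook cell that loads this YAML and calls the trainer, along the lines of the sketch below. This is a simplified, hypothetical reconstruction grounded only in the traceback further down; the real `get_config` helper from the EasyOCR trainer notebook does a bit more.

```python
# Hypothetical sketch of the Colab cell that kicks off training.
# Assumes the working directory is the EasyOCR trainer folder, so that the
# trainer's train.py is importable as `train`.
import yaml
from types import SimpleNamespace

import train  # EasyOCR trainer module (trainer/train.py)


def get_config(file_path):
    """Load the training YAML into an attribute-style options object (simplified)."""
    with open(file_path, "r", encoding="utf-8") as f:
        cfg = yaml.safe_load(f)
    opt = SimpleNamespace(**cfg)
    # The log below shows these as lists and a combined character set,
    # so the real helper presumably derives something like this:
    opt.select_data = str(cfg["select_data"]).split("-")
    opt.batch_ratio = str(cfg["batch_ratio"]).split("-")
    opt.character = cfg["number"] + cfg["symbol"] + cfg["lang_char"]
    return opt


opt = get_config("config_files/en_filtered_config.yaml")
train.train(opt, amp=False)  # amp=False since this runs on a CPU-only runtime
```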

This is the output that leads to the error:

```
Filtering the images containing characters which are not in opt.character
Filtering the images whose label is longer than opt.batch_max_length

dataset_root: all_data
opt.select_data: ['train']
opt.batch_ratio: ['1']

dataset_root:    all_data   dataset: train
all_data/train
sub-directory:  /train   num samples: 53
num total samples of train: 53 x 1.0 (total_data_usage_ratio) = 53
num samples of train per batch: 32 x 1.0 (batch_ratio) = 32
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 6 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(

Total_batch_size: 32 = 32

dataset_root:    all_data/valid   dataset: /
all_data/valid/
sub-directory:  /.   num samples: 54

No Transformation module specified
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 6 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
model input parameters 64 600 20 1 256 256 97 34 None VGG BiLSTM CTC
Model:
DataParallel(
  (module): Model(
    (FeatureExtraction): VGG_FeatureExtractor(
      (ConvNet): Sequential(
        (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
        (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): ReLU(inplace=True)
        (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
        (6): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (7): ReLU(inplace=True)
        (8): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (9): ReLU(inplace=True)
        (10): MaxPool2d(kernel_size=(2, 1), stride=(2, 1), padding=0, dilation=1, ceil_mode=False)
        (11): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (12): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (13): ReLU(inplace=True)
        (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (15): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (16): ReLU(inplace=True)
        (17): MaxPool2d(kernel_size=(2, 1), stride=(2, 1), padding=0, dilation=1, ceil_mode=False)
        (18): Conv2d(256, 256, kernel_size=(2, 2), stride=(1, 1))
        (19): ReLU(inplace=True)
      )
    )
    (AdaptiveAvgPool): AdaptiveAvgPool2d(output_size=(None, 1))
    (SequenceModeling): Sequential(
      (0): BidirectionalLSTM(
        (rnn): LSTM(256, 256, batch_first=True, bidirectional=True)
        (linear): Linear(in_features=512, out_features=256, bias=True)
      )
      (1): BidirectionalLSTM(
        (rnn): LSTM(256, 256, batch_first=True, bidirectional=True)
        (linear): Linear(in_features=512, out_features=256, bias=True)
      )
    )
    (Prediction): Linear(in_features=256, out_features=97, bias=True)
  )
)
Modules, Parameters
module.FeatureExtraction.ConvNet.0.weight 288
module.FeatureExtraction.ConvNet.0.bias 32
module.FeatureExtraction.ConvNet.3.weight 18432
module.FeatureExtraction.ConvNet.3.bias 64
module.FeatureExtraction.ConvNet.6.weight 73728
module.FeatureExtraction.ConvNet.6.bias 128
module.FeatureExtraction.ConvNet.8.weight 147456
module.FeatureExtraction.ConvNet.8.bias 128
module.FeatureExtraction.ConvNet.11.weight 294912
module.FeatureExtraction.ConvNet.12.weight 256
module.FeatureExtraction.ConvNet.12.bias 256
module.FeatureExtraction.ConvNet.14.weight 589824
module.FeatureExtraction.ConvNet.15.weight 256
module.FeatureExtraction.ConvNet.15.bias 256
module.FeatureExtraction.ConvNet.18.weight 262144
module.FeatureExtraction.ConvNet.18.bias 256
module.SequenceModeling.0.rnn.weight_ih_l0 262144
module.SequenceModeling.0.rnn.weight_hh_l0 262144
module.SequenceModeling.0.rnn.bias_ih_l0 1024
module.SequenceModeling.0.rnn.bias_hh_l0 1024
module.SequenceModeling.0.rnn.weight_ih_l0_reverse 262144
module.SequenceModeling.0.rnn.weight_hh_l0_reverse 262144
module.SequenceModeling.0.rnn.bias_ih_l0_reverse 1024
module.SequenceModeling.0.rnn.bias_hh_l0_reverse 1024
module.SequenceModeling.0.linear.weight 131072
module.SequenceModeling.0.linear.bias 256
module.SequenceModeling.1.rnn.weight_ih_l0 262144
module.SequenceModeling.1.rnn.weight_hh_l0 262144
module.SequenceModeling.1.rnn.bias_ih_l0 1024
module.SequenceModeling.1.rnn.bias_hh_l0 1024
module.SequenceModeling.1.rnn.weight_ih_l0_reverse 262144
module.SequenceModeling.1.rnn.weight_hh_l0_reverse 262144
module.SequenceModeling.1.rnn.bias_ih_l0_reverse 1024
module.SequenceModeling.1.rnn.bias_hh_l0_reverse 1024
module.SequenceModeling.1.linear.weight 131072
module.SequenceModeling.1.linear.bias 256
module.Prediction.weight 24832
module.Prediction.bias 97
Total Trainable Params: 3781345
Trainable params num : 3781345
Optimizer:
Adadelta (
Parameter Group 0
    differentiable: False
    eps: 1e-08
    foreach: None
    lr: 1.0
    maximize: False
    rho: 0.95
    weight_decay: 0
)
------------ Options -------------
number: 0123456789
symbol: !"#$%&'()*+,-./:;<=>?@[]^{|}~ €
lang_char: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
experiment_name: en_filtered
train_data: all_data
valid_data: all_data/valid
manualSeed: 1111
workers: 6
batch_size: 32
num_iter: 300000
valInterval: 20000
saved_model: 
FT: False
optim: False
lr: 1.0
beta1: 0.9
rho: 0.95
eps: 1e-08
grad_clip: 5
select_data: ['train']
batch_ratio: ['1']
total_data_usage_ratio: 1.0
batch_max_length: 34
imgH: 64
imgW: 600
rgb: False
contrast_adjust: 0.0
sensitive: True
PAD: True
data_filtering_off: False
Transformation: None
FeatureExtraction: VGG
SequenceModeling: BiLSTM
Prediction: CTC
num_fiducial: 20
input_channel: 1
output_channel: 256
hidden_size: 256
decode: greedy
new_prediction: False
freeze_FeatureFxtraction: False
freeze_SequenceModeling: False
character: 0123456789!"#$%&'()*+,-./:;<=>?@[\]^_{|}~ €ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
num_class: 97
```

```
/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py:125: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
  warnings.warn(

AttributeError                            Traceback (most recent call last)
in <cell line: 30>()
     28
     29 opt = get_config("config_files/en_filtered_config.yaml")
---> 30 train.train(opt, amp=False)

1 frames
/content/EasyOCR/trainer/dataset.py in get_batch(self)
     99         for i, data_loader_iter in enumerate(self.dataloader_iter_list):
    100             try:
--> 101                 image, text = next(data_loader_iter)
    102                 balanced_batch_images.append(image)
    103                 balanced_batch_texts += text

AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute 'next'
```
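
From what I can tell, the attribute in question is the old Python-2-style `.next()` method, which newer PyTorch releases no longer provide on DataLoader iterators (only `__next__` remains), so batches have to be pulled with the built-in `next()`. A minimal sketch outside of EasyOCR reproduces the same error:

```python
# Standalone sketch (not EasyOCR code): on current PyTorch, a DataLoader
# iterator only implements __next__, so calling .next() raises the
# AttributeError shown above.
import torch
from torch.utils.data import DataLoader, TensorDataset

loader = DataLoader(TensorDataset(torch.arange(8.0)), batch_size=2, num_workers=2)
it = iter(loader)      # _MultiProcessingDataLoaderIter because num_workers > 0

batch = next(it)       # supported: the built-in next()
# batch = it.next()    # AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute 'next'
print(batch)
```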

JayDbb commented 8 months ago

I dropped the number of workers to 2 as it was saying my dataset was too small

riuzz commented 6 months ago

I solved this problem in another way, hoping to help people who encounter it in the future: just change `image, text = data_loader_iter.next()` to `image, text = next(data_loader_iter)`. It is in the `get_batch` function of the `Batch_Balanced_Dataset` class in `dataset.py`.
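
For anyone applying this, the change is just the single line the traceback points at. A before/after sketch (only the quoted lines come from the traceback above; the rest of `get_batch` stays as is):

```python
# trainer/dataset.py, inside Batch_Balanced_Dataset.get_batch():

# before -- Python-2-era iterator API, removed from newer PyTorch:
image, text = data_loader_iter.next()

# after -- use the built-in next() on the iterator instead:
image, text = next(data_loader_iter)
```

If you edit `dataset.py` in a running Colab session, restart the runtime (or reload the module) so the patched file is the one that actually gets imported. Note that the same `.next()` call may also appear a few lines below in the `StopIteration` handling of `get_batch` and would need the same change.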