recognizer.training_model.fit raises Unicode decode error

LFBJC commented 2 years ago

Hello everybody, I'm trying to train keras-ocr (actually just the recognizer; the detector works fine for my application). Initially I tried the following alphabet: "ABCDEFGHIJKLMNOPQRSTUVWXYZÇÃÕÉÍÚÁÓÊÔÂabcdefghijklmnopqrstuvwxyzçãõéíúáóêôâ0123456789-/,.". Then I tried many variations, including lowercase ASCII with digits, which is just the default alphabet, but every time it raises the following error after one epoch:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 130: invalid continuation byte
2021-11-29 20:20:03.052442: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
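For readers unfamiliar with this class of error: byte 0xf3 is how the letter "ó" is stored in single-byte encodings such as Latin-1, and decoding such bytes as UTF-8 fails in the way the message describes. The sketch below only reproduces that general mechanism with a sample word chosen for illustration. Which string the failing code was actually decoding is not established in this thread.

```python
# Illustrative only: the word "código" is an assumed sample, not data from the issue.
raw = "código".encode("latin-1")  # "ó" becomes the single byte 0xf3

try:
    raw.decode("utf-8")  # 0xf3 starts a multi-byte sequence in UTF-8, but "d" follows
except UnicodeDecodeError as exc:
    error = exc  # keep a reference so it can be inspected after the block

print(hex(raw[error.start]))  # the offending byte
print(raw.decode("latin-1"))  # decoding with the encoding that produced the bytes works
```

If the bytes really come from a Latin-1 or cp1252 source, decoding with that encoding (or re-saving the source as UTF-8) is the usual remedy.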

LFBJC commented 2 years ago

My implementation is based on this tutorial: https://www.geeksforgeeks.org/implement-your-own-word2vecskip-gram-model-in-python/ I just removed the detector part and the visualizations, and I needed to downgrade OpenCV for it to work due to a bug.

LFBJC commented 2 years ago

Here is the stack trace

LFBJC commented 2 years ago

I looked into the files and noticed that keras_ocr filters the stack trace, so I commented out the line that does the filtering to see the full trace. The error happened when calling the callback tf.keras.callbacks.CSVLogger(f'{recognizer_basepath}.csv', append=True). Then another error happened in another callback, tf.keras.callbacks.ModelCheckpoint(filepath=f'{recognizer_basepath}.h5'). I removed both callbacks and now my code works. I'm leaving the issue open so that one of the contributors can analyze it: the workaround was fine for my code, but the error may be caused by a bug in either keras_ocr or tensorflow.keras.callbacks.CSVLogger.
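One thing that might be worth ruling out before re-enabling the callbacks, assuming an earlier run left a log file at the CSV path: whether that file contains bytes that are not valid UTF-8. This is a minimal diagnostic sketch, not a confirmed cause. The helper name first_non_utf8_byte and the demo file name are hypothetical and are not part of keras_ocr or TensorFlow.

```python
import tempfile
from pathlib import Path


def first_non_utf8_byte(path):
    """Return (offset, byte) of the first byte that is not valid UTF-8, or None."""
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


# Demo on a throwaway file; in practice, pass the real f'{recognizer_basepath}.csv'.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "recognizer_demo.csv"  # hypothetical file name

    # A log written as UTF-8 decodes cleanly.
    sample.write_bytes("epoch,loss\n0,0.5\n".encode("utf-8"))
    clean_result = first_non_utf8_byte(sample)

    # A log containing Latin-1 text does not.
    sample.write_bytes(b"epoch,loss\n" + "código".encode("latin-1"))
    bad_result = first_non_utf8_byte(sample)

print(clean_result, bad_result)
```

If the check returns None for the real file, the existing log is not the source of the stray byte and the search has to continue elsewhere, for example in the path string itself.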

faustomorales / keras-ocr

recognizer.training_model.fit raises Unicode decode error #185