facebookresearch / MultiplexedOCR

Code for CVPR21 paper A Multiplexed Network for End-to-End, Multilingual OCR

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256) #16

Closed (leezx337 closed this issue 1 year ago)

leezx337 commented 1 year ago

Hello all, when trying to perform OCR on this image ![Chinese_2](https://github.com/facebookresearch/MultiplexedOCR/assets/134577768/837f8053-f96c-4ebe-942c-cf3fcb1d6694), I encountered the UnicodeEncodeError below. Printing the results from text_inference.py at three positions in the render_box_multi_text function (the word_result_list itself, inside the loop 'for word_result in word_result_list', and after 'word = f"{word_result.seq_word} [{int(round(score_det*100))}%,{word_result.language}"'), I obtained the following output and error:

[WordResult(word='CHINESE', lang=la (5, 1.000), head=la (5, 1.000), det=0.999, seq=0.999), WordResult(word='NEW', lang=la (5, 1.000), head=la (5, 1.000), det=1.000, seq=0.997), WordResult(word='photos', lang=la (5, 1.000), head=la (5, 1.000), det=0.725, seq=0.925), WordResult(word='HAPPY', lang=la (5, 1.000), head=la (5, 1.000), det=1.000, seq=0.995), WordResult(word='YEAR', lang=la (5, 1.000), head=la (5, 1.000), det=1.000, seq=0.999), WordResult(word='iphetos', lang=la (5, 1.000), head=la (5, 1.000), det=0.966, seq=0.760), WordResult(word='泰意发财', lang=zh (6, 0.982), head=zh (6, 0.982), det=1.000, seq=0.601), WordResult(word='205', lang=la (5, 1.000), head=la (5, 1.000), det=0.064, seq=0.842), WordResult(word='OS', lang=la (5, 0.999), head=la (5, 0.999), det=0.025, seq=0.614), WordResult(word='新年快务', lang=zh (6, 1.000), head=zh (6, 1.000), det=1.000, seq=0.705), WordResult(word='SSSS', lang=la (5, 1.000), head=la (5, 1.000), det=0.029, seq=0.764)]
[Info] fonts initiated.
wr: WordResult(word='CHINESE', lang=la (5, 1.000), head=la (5, 1.000), det=0.999, seq=0.999)
WORD: CHINESE [100%,la
la | la
CHINESE [100%,la]
wr: WordResult(word='NEW', lang=la (5, 1.000), head=la (5, 1.000), det=1.000, seq=0.997)
WORD: NEW [100%,la
la | la
NEW [100%,la]
wr: WordResult(word='photos', lang=la (5, 1.000), head=la (5, 1.000), det=0.725, seq=0.925)
WORD: photos [73%,la
la | la
photos [73%,la]
wr: WordResult(word='HAPPY', lang=la (5, 1.000), head=la (5, 1.000), det=1.000, seq=0.995)
WORD: HAPPY [100%,la
la | la
HAPPY [100%,la]
wr: WordResult(word='YEAR', lang=la (5, 1.000), head=la (5, 1.000), det=1.000, seq=0.999)
WORD: YEAR [100%,la
la | la
YEAR [100%,la]
wr: WordResult(word='iphetos', lang=la (5, 1.000), head=la (5, 1.000), det=0.966, seq=0.760)
WORD: iphetos [97%,la
la | la
iphetos [97%,la]
wr: WordResult(word='泰意发财', lang=zh (6, 0.982), head=zh (6, 0.982), det=1.000, seq=0.601)
WORD: 泰意发财 [100%,zh
zh | zh
泰意发财 [100%,zh]
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
Cell In[11], line 1
----> 1 predict_on_image(img_dir)

Cell In[10], line 43, in predict_on_image(image_path)
     40 result_logs = result_logs_dict["result_logs"]
     42 img_vis = img.copy()
---> 43 render_box_multi_text(
     44     cfg=cfg,
     45     image=img_vis,
     46     result_logs_dict=result_logs_dict,
     47     resize_ratio=resize_ratio,
     48 )
     50 return img_vis

File ~/MultiplexedOCR/multiplexer/engine/text_inference.py:1165, in render_box_multi_text(cfg, image, result_logs_dict, resize_ratio, det_thresh)
   1163 draw_loc = (min_x, min_y - font_size)
   1164 print(word)
-> 1165 text_size = draw.textsize(word, font=font)
   1166 rec_pad = 0
   1167 rec_min = (draw_loc[0] - rec_pad, draw_loc[1] - rec_pad)

File ~/.conda/envs/multiplexer/lib/python3.8/site-packages/PIL/ImageDraw.py:677, in ImageDraw.textsize(self, text, font, spacing, direction, features, language, stroke_width)
    675 with warnings.catch_warnings():
    676     warnings.filterwarnings("ignore", category=DeprecationWarning)
--> 677     return font.getsize(
    678         text,
    679         direction,
    680         features,
    681         language,
    682         stroke_width,
    683     )

File ~/.conda/envs/multiplexer/lib/python3.8/site-packages/PIL/ImageFont.py:152, in ImageFont.getsize(self, text, *args, **kwargs)
    138 """
    139 .. deprecated:: 9.2.0
    140 
   (...)
    149 :return: (width, height)
    150 """
    151 deprecate("getsize", 10, "getbbox or getlength")
--> 152 return self.font.getsize(text)

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)

May I know why I'm unable to perform OCR on this image, and why some words are not picked up? The latter is evident from the fact that the word_result_list output and the output printed inside the 'for word_result in word_result_list' loop differ. Thank you!!
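For reference, the failing frame in the traceback is PIL's bitmap `ImageFont.getsize`, which only handles latin-1 text, so rendering stops at the first Chinese word even though recognition itself succeeded. Below is a minimal sketch of that PIL behaviour outside the repo (not the repo's actual code path; the .ttf path is a placeholder, and this assumes the Pillow 9.x shown in the traceback):

```python
from PIL import Image, ImageDraw, ImageFont

img = Image.new("RGB", (400, 100), "white")
draw = ImageDraw.Draw(img)

# PIL's built-in bitmap font only covers latin-1, so measuring a CJK string
# with it raises the same UnicodeEncodeError as in the traceback above.
bitmap_font = ImageFont.load_default()
try:
    draw.textsize("泰意发财", font=bitmap_font)  # same (deprecated) call as text_inference.py
except UnicodeEncodeError as err:
    print("bitmap font fails:", err)

# A Unicode-capable TrueType font measures the same string without error.
# Replace the placeholder path with your local Arial-Unicode-Regular.ttf.
ttf = ImageFont.truetype("/path/to/Arial-Unicode-Regular.ttf", 24)
print("ttf bbox:", ttf.getbbox("泰意发财"))
```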

strmojo commented 1 year ago

@leezx337 I am running the same model on English text, but it only shows the detections; the recognized text is empty. I am using the provided weights and demo.yaml as the config. Can you please help me? I am not sure what I am missing here.

jefflink commented 1 year ago

@strmojo You need to change the charmap path in the following lines of the demo.yaml file to the /path/to/charmap/public/v3/ location on your machine: https://github.com/facebookresearch/MultiplexedOCR/blob/1b4d931ffd566bc7b400769abbafa35bfb600b94/configs/demo.yaml#L1-L2
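As a quick sanity check (not part of the repo, just a hedged sketch), you can confirm that the directory you put in those lines of demo.yaml actually exists and contains the charmap files before re-running the demo:

```python
import os

# Placeholder path -- use exactly what you wrote in configs/demo.yaml (lines 1-2).
charmap_dir = "/path/to/charmap/public/v3/"

assert os.path.isdir(charmap_dir), f"charmap dir not found: {charmap_dir}"
print(f"{len(os.listdir(charmap_dir))} entries found in {charmap_dir}")
```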

jefflink commented 1 year ago

@leezx337 you will need to replace the hardcoded path of Arial-Unicode-Regular.ttf in the line referenced below with your own location: https://github.com/facebookresearch/MultiplexedOCR/blob/1b4d931ffd566bc7b400769abbafa35bfb600b94/multiplexer/engine/text_inference.py#L1103

If you don't have it, you can get it from https://github.com/stamen/toner-carto/blob/master/fonts/Arial-Unicode-Regular.ttf

Hope it works!
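In case it helps, here is a hedged sketch of the kind of change (the env-var name and helper function are my own for illustration, not part of the repo, and the exact code around line 1103 of text_inference.py may look different), plus a quick check that the replacement font can measure CJK text:

```python
import os
from PIL import ImageFont

def load_vis_font(size):
    # Hypothetical helper: read the font location from an environment variable
    # instead of hardcoding it, falling back to a placeholder default path.
    font_path = os.environ.get("OCR_VIS_FONT", "/path/to/Arial-Unicode-Regular.ttf")
    try:
        return ImageFont.truetype(font_path, size)
    except OSError:
        # Falling back to PIL's bitmap default font would bring the latin-1
        # UnicodeEncodeError right back on CJK words, so fail loudly instead.
        raise RuntimeError(f"Unicode-capable font not found at {font_path}")

font = load_vis_font(24)
print(font.getbbox("新年快务"))  # one of the CJK words from the log above
```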

leezx337 commented 1 year ago

Thank you so much for your replies!! I'll try them out :)) @jefflink

leezx337 commented 1 year ago

> @leezx337 I am running the same model on English text, but it only shows the detections; the recognized text is empty. I am using the provided weights and demo.yaml as the config. Can you please help me? I am not sure what I am missing here.

Hello!! Hmm, to be honest I'm not very experienced with this haha... could it be that you didn't change the charmap path in the demo.yaml file as mentioned by jefflink above, and as a result the module is unable to encode the words?

strmojo commented 1 year ago

Thank you! Stupid me, I had changed the charmap path to "/checkpoint/jinghuang/multiplexer/charmap/" because /public/v3/ was hidden in my VS Code menu.

leezx337 commented 1 year ago

> Thank you! Stupid me, I had changed the charmap path to "/checkpoint/jinghuang/multiplexer/charmap/" because /public/v3/ was hidden in my VS Code menu.

No problem hahah glad it works now!!

leezx337 commented 1 year ago

Thank you sooo much for your assistance @jefflink!! The error has been resolved.