dlenski / python-zxing

python wrapper for the ZXing barcode library
GNU Lesser General Public License v3.0
126 stars 37 forks source link

utf-8 encoding problem, ???????? instead of Thai letters #8

Open skytizens opened 4 years ago

skytizens commented 4 years ago

Hi, I have a problem detecting qrcode which contains Thai characters. Example below:

qr code decoded by zxing wrapper: "ABCC051 ?????????????????????????" qr code decoded by zbar wrapper: "ABCC051 เอกสารประกอบการขอสินเชื่อ"

Could you please recommend me a way to fix this issue?

Best regards and thank you

dlenski commented 4 years ago

Give more information about how you're decoding and encoding.

It's quite likely that the problem is on the encoding side; the encoder may not be correctly signalling the appropriate character set used for the Thai text using ECI.

Because many QR encoders (mis)behave by failing to mark the character set appropriately, ZXing includes logic to try to guess it, but it only knows how to distinguish UTF-8, ShiftJIS, and ISO-8859-1.

skytizens commented 4 years ago

Hello Daniel, Sorry for late answer. It's hard for me to answer your question. I'm just using Python 3, calling your method to get TEXT output from zxing and using zbar at the same time. From what I see on google someone else had a similar problem: https://github.com/zxing/zxing/issues/708

Is it possible to force/set encoding via API?

Best regards, David

dlenski commented 4 years ago

It's hard for me to answer your question. I'm just using Python 3, calling your method to get TEXT output from zxing and using zbar at the same time.

Please show a complete working example of the code including how the barcode is encoded.

From what I see on google someone else had a similar problem: zxing/zxing#708

Yes. And as described in https://github.com/zxing/zxing/issues/708#issuecomment-261771151, the problem is almost certainly with the encoding of the barcode.

In that case, the QR code was encoded in a Chinese character set, but without including an ECI code to inform the decoder of this, which contravenes the standard for how QR codes are supposed to be encoded.

My educated guess is that your case is similar: perhaps your barcode's contents are encoded in ISO-8859-11 (Thai charset) but also without including a corresponding ECI code.

Is it possible to force/set encoding via API?

In the encoder? I have no idea… it depends on what encoder you are using. :man_shrugging: (If the encoder is also ZXing, then yes it is certainly possible to set the correct ECI code.)

If you are asking whether there is a way to band-aid the ZXing decoder so that it uses the intended character set of the barcode, despite the absence of the required ECI code, then yes, you can do this in the Java API by passing a DecodeHintType.CHARACTER_SET to the reader.

There is no way to do this via the Python module currently. Patching the ZXing CLI to make it possible to specify a charset hint would make that possible.

skytizens commented 4 years ago

Hello Daniel, Please find example of the source code and qrcode example below:

import os
import pyzbar.pyzbar as pyzbar
import zxing
from PIL import Image
reader = zxing.BarCodeReader()

def readBarcodeZ(im) : 
    decodedObjects = pyzbar.decode(im)
    allbarcodes = []
    for obj in decodedObjects:
        allbarcodes.append (dict(type=obj.type, data=obj.data.decode('utf-8'), rect=obj.rect))
    return allbarcodes

def readBarcodeX(im) : 
    decodedObject = reader.decode(im)
    return decodedObject

imageFile = "qrcode-example-thai.png"
im = Image.open(imageFile)
im = im.convert('LA')
imBarcodeZ = readBarcodeZ(im)
imBarcodeX = readBarcodeX(imageFile)

print ("Zbar:", imBarcodeZ)
print ("Zxing:", imBarcodeX)

qrcode-example-thai

We use UTF-8 as the code page for encoding Thai characters. It looks like the easiest way is to just add an ECI code when creating qr code.

skytizens commented 4 years ago

Daniel, Just update to my previous comment, I just ran a test on this site: https://zxing.org/w/decode.jspx

and the decoded result looks good. Please find print screen below. 2020-11-16_08h44_08

dlenski commented 4 years ago

Hello Daniel, Please find example of the source code and qrcode example below:

When I run this code, with this example image, it works as intended… that's with 5bc2e3f86c4391575e042011d8fdbbc93164e819, Python 3.6.9 on Linux with locale/charset of en_US.UTF-8, and ZXing java libraries v3.4.1.

Zxing: BarCode(raw='TEST-THAI2-วันนี้คุณเป็นอย่างไรบ้าง?', parsed='TEST-THAI2-วันนี้คุณเป็นอย่างไรบ้าง?', format='QR_CODE', type='TEXT', points=[(231.5, 522.0), (231.5, 308.5), (445.0, 308.5), (423.5, 500.5)])

I'm not sure what you're expecting me to be able to tell you. You haven't given me enough information.

  1. You didn't include the output of your example code. It works for me. Does it not work for you? I have no idea. :man_shrugging:
  2. The code and barcode image you showed are clearly not the same as the ones from for the initial post. I can't read barcodes or interpret source code which I've never seen. :man_shrugging: