NaturalHistoryMuseum / pyzbar

Read one-dimensional barcodes and QR codes from Python 2 and 3.
MIT License

Error while decoding QR codes with UTF-8 characters #95

Open DrMint opened 3 years ago

DrMint commented 3 years ago

Hi!

I'm currently working on a project that uses QR codes. To make sure the generated QR codes are correct, I thought of using your module to decode them and compare the expected content with the decoded content.

Everything was working fine until I tried using Unicode characters. The decoded content no longer matches the expected content. Here is a simple program that showcases this problem:

import pyqrcode
import pyzbar.pyzbar
from PIL import Image

def encodeDecode(content):
    # Generate a QR code image from the content
    url = pyqrcode.create(content, encoding='utf-8')
    url.png('qrcode.png', scale=8)

    # Decode the QR code and retrieve the content
    decodedContent = pyzbar.pyzbar.decode(Image.open('qrcode.png'))[0].data

    # Compare with the original content
    if decodedContent.decode('utf-8') == content:
        print("TEST OK with", content)
    else:
        print("TEST FAILED with", content)

encodeDecode('\u0100')          # TEST OK
encodeDecode('\u0101')          # TEST OK
encodeDecode('\u2133')          # TEST FAILED
encodeDecode('\u0100\u2133')    # TEST OK
encodeDecode('\u0101\u2133')    # TEST FAILED

And here is the result when executed:

TEST OK with Ā
TEST OK with ā
TEST FAILED with ℳ
TEST OK with Āℳ
TEST FAILED with āℳ

We can see that the character ℳ is not decoded correctly on its own, but when it is preceded by Ā, decoding works. Even stranger, when it is preceded by ā (a character from the same Unicode block as Ā), the content is again decoded incorrectly.

The problem doesn't originate from pyqrcode, as all generated QR codes can be decoded correctly by other decoders such as ZXing for Java, phones, or websites.
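
To see what the decoder actually returned, it can help to compare at the byte level rather than after calling .decode('utf-8'), which may itself raise on invalid bytes. The following is only a minimal sketch; the helper name describe_mismatch is hypothetical and not part of pyzbar or pyqrcode, and it assumes Python 3.8 or later for bytes.hex with a separator.

```python
def describe_mismatch(expected: str, decoded: bytes) -> str:
    """Compare decoder output with the UTF-8 bytes of the expected text."""
    expected_bytes = expected.encode("utf-8")
    if decoded == expected_bytes:
        return "match"
    # Show both byte sequences in hex so the difference is visible
    return f"expected {expected_bytes.hex(' ')} but got {decoded.hex(' ')}"


# Example with hand-written bytes (not real decoder output)
print(describe_mismatch("\u0101", b"\xc4\x80"))
```

In the test program above, the second argument would be the .data value returned by pyzbar.pyzbar.decode.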

guyskk commented 2 years ago

It seems the QR code value is decoded as ISO-8859-1 somewhere along the way and then re-encoded as UTF-8. It can be converted back to UTF-8:

>>> d=b'\xc3\xa4\xc2\xbd\xc2\xa0\xc3\xa5\xc2\xa5\xc2\xbd'
>>> d.decode('utf-8').encode('ISO-8859-1').decode('utf-8')
'你好'
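
The round trip above can be wrapped in a small helper that falls back to the plain UTF-8 text when the repair does not apply. This is a sketch under the assumption that the only failure mode is the ISO-8859-1 misreading described here; repair_double_encoded is a hypothetical name, not a pyzbar function.

```python
def repair_double_encoded(data: bytes) -> str:
    """Undo a UTF-8 -> ISO-8859-1 -> UTF-8 double encoding when possible."""
    text = data.decode("utf-8")
    try:
        # Map each character back to one byte, then read those bytes as UTF-8
        return text.encode("iso-8859-1").decode("utf-8")
    except UnicodeError:
        # Characters outside ISO-8859-1, or bytes that are not valid UTF-8:
        # assume the data was decoded correctly in the first place
        return text


print(repair_double_encoded(b"\xc3\xa4\xc2\xbd\xc2\xa0\xc3\xa5\xc2\xa5\xc2\xbd"))
```

The heuristic is ambiguous for text that legitimately consists of ISO-8859-1 characters forming a valid UTF-8 sequence, and the initial decode still raises if the input is not valid UTF-8 at all, so treat it as a best-effort workaround.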

PythonJustForFun commented 1 year ago

A very ugly but simple "workaround" in Python 3 is to call the command-line tool "zbarimg" (version 0.23.90):

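import os
import cv2

# `image` is the OpenCV image (NumPy array) that contains the QR code
cv2.imwrite('/tmp/image.png', image)
# Decode with the zbarimg command-line tool and write the raw result to a file
os.system('zbarimg -q --raw --nodbus -Sqr.binary /tmp/image.png >/tmp/result.txt')
# Read the raw bytes back and decode them as UTF-8
with open('/tmp/result.txt', 'rb') as barcodefile:
    barcodeData = barcodefile.read().decode('UTF-8')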
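
The same idea can be sketched with subprocess instead of os.system, which avoids the shell redirection and the temporary result file. The function names below are hypothetical; the flags are copied unchanged from the command above, and the sketch assumes zbarimg is installed and on the PATH.

```python
import subprocess


def build_zbarimg_command(image_path: str) -> list:
    """Return the zbarimg invocation with the same flags as the os.system call."""
    return ["zbarimg", "-q", "--raw", "--nodbus", "-Sqr.binary", image_path]


def decode_with_zbarimg(image_path: str) -> str:
    """Run zbarimg on an image file and decode its raw output as UTF-8."""
    result = subprocess.run(
        build_zbarimg_command(image_path),
        capture_output=True,  # collect stdout as bytes instead of using a file
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.decode("utf-8")
```

A call such as decode_with_zbarimg('/tmp/image.png') would then return the decoded text. Whether the output ends with a trailing newline may depend on the zbarimg version, so this sketch does not strip anything.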

Greets