codebude / QRCoder

A pure C# Open Source QR Code implementation
MIT License

vCard to QRCoder - diacritics problem #554

Closed VladdyH closed 5 months ago

VladdyH commented 5 months ago

Discussed in https://github.com/codebude/QRCoder/discussions/487

Originally posted by **VladdyH** March 25, 2024

Hi, I successfully implemented generating a QR code that provides users with vCard information, but it only works when there are no diacritics. I use QRCoder NuGet v1.4.3 in my .NET Framework 4.8 web application, whose pages are set to UTF-8 by the proper meta tag: <meta charset="utf-8" />.

The case: when I use Czech characters, e.g.:

BEGIN:VCARD
VERSION:3.0
N:Novák;Janěščřžýáíé;
FN:Mgr. & Mgr. Janěščřžýáíé Novák, DiS.
ORG:MyCompany
TITLE:Mgr. & Mgr.
TEL;TYPE=,VOICE:+420123456789
ADR;TYPE=:;;
EMAIL:testovaci@adresa.tst
END:VCARD

then the information read by the QR scanner does not show the ěščř... characters correctly; it only shows ??? and other garbage. But when I add the "úů" characters to the N and FN lines:

BEGIN:VCARD
VERSION:3.0
N:Novák;Janěščřžýáíéúů;
FN:Mgr. & Mgr. Janěščřžýáíéúů Novák, DiS.
ORG:MyCompany
TITLE:Mgr. & Mgr.
TEL;TYPE=,VOICE:+420123456789
ADR;TYPE=:;;
EMAIL:testovaci@adresa.tst
END:VCARD

then the **QR scanner shows every character correctly.** Trying another vCard version (e.g. 2.1) and specifying "N;CHARSET=UTF-8:Novák;Janěščřžýáíéúů;" had no effect.

My implementation of the code generation is pretty straightforward and works, except for the case described above:

public byte[] GetQRCode(string vstup, Color barvaKostek)
{
    using (QRCodeGenerator qrGenerator = new QRCodeGenerator())
    // ECCLevel = error correction level, i.e. resistance to read errors (L=7%, M=15%, Q=25%, H=30%)
    using (QRCodeData qrCodeData = qrGenerator.CreateQrCode(vstup, QRCodeGenerator.ECCLevel.L))
    using (QRCode qrCode = new QRCode(qrCodeData))
    {
        // The numeric parameter of GetGraphic is the number of pixels used for one module (one black dot) in the output image
        Bitmap qrCodeImage = qrCode.GetGraphic(2, barvaKostek, Color.White, true);
        using (var ms = new MemoryStream())
        {
            qrCodeImage.Save(ms, System.Drawing.Imaging.ImageFormat.Png);
            // ToArray() returns only the bytes written; GetBuffer() may include unused trailing capacity
            return ms.ToArray();
        }
    }
}

Am I missing anything that I should set for QRCoder to work properly even without the "úů" characters?

After I put text with diacritics into the "ORG:" value, all my readers read all diacritics from the QR code correctly.

Shane32 commented 5 months ago

@codebude v1.5.1 is encoding the text as UTF8, since it contains characters which are outside of ISO-8859-1, but not applying the UTF8 ECI encoding mode. My iPhone is able to read the string regardless, however. I would guess that some QR code readers are able to detect UTF8 encoding even though the data should technically be read as ISO-8859-1. Most likely the reader used by @VladdyH was able to detect the UTF8 encoding when the string contained the "úů" characters, but not with only the other characters.
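To make the mismatch described above concrete, here is a minimal sketch in plain .NET (no QRCoder calls). It checks whether a string fits into ISO-8859-1 and shows what a reader produces when it decodes UTF-8 bytes as ISO-8859-1. The class and method names are hypothetical and exist only for this illustration; this is not how QRCoder or any particular scanner is implemented.

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // ISO-8859-1 covers exactly the code points U+0000..U+00FF.
    public static bool FitsIso88591(string text)
    {
        foreach (char c in text)
        {
            if (c > 0xFF) return false;
        }
        return true;
    }

    // Simulates a reader that receives UTF-8 bytes but decodes them as ISO-8859-1.
    public static string ReadUtf8AsIso88591(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(FitsIso88591("Novák"));         // True: á is U+00E1
        Console.WriteLine(FitsIso88591("Janěščřžýáíé"));  // False: ě is U+011B
        Console.WriteLine(ReadUtf8AsIso88591("á"));       // Ã¡ (bytes C3 A1 read as two characters)
    }
}
```

A reader that guesses the encoding heuristically may get this right or wrong depending on the exact byte sequence, which would be consistent with the behaviour reported in this issue.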

@VladdyH please try this code and see if it fixes your issue, being sure to test it with the same QR code reader that you were previously having the issue with:

using (QRCodeData qrCodeData = qrGenerator.CreateQrCode(vstup, QRCodeGenerator.ECCLevel.L, true, false, EciMode.Utf8))

@codebude Assuming this fixes the problem, I suggest that for either v1.6 or v2 we have the ECI mode set properly when encoding text as UTF8, ensuring compliance with the QR Code specification. Note that this may be why some payload generators require a specific ECI mode -- not because the specification requires it, but rather that without specifying the ECI mode, QRCoder fails to include the ECI mode while encoding text as UTF8.
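For readers unfamiliar with ECI, the following sketch shows what "setting the ECI mode" means at the bit level, as I understand the QR Code specification: an ECI mode indicator (0111) and the assignment number (26 for UTF-8) precede the usual byte-mode segment header. The class and method names are hypothetical; this is an illustration, not QRCoder's internal code.

```csharp
using System;

public static class EciHeaderSketch
{
    private const string EciModeIndicator = "0111";
    private const string ByteModeIndicator = "0100";
    private const int Utf8Assignment = 26; // ECI 000026 = UTF-8

    // Assignment numbers use 1 byte (prefix 0) for 0..127,
    // 2 bytes (prefix 10) for 128..16383, and 3 bytes (prefix 110) up to 999999.
    public static string EciDesignator(int assignmentNumber)
    {
        if (assignmentNumber < 0 || assignmentNumber > 999999)
            throw new ArgumentOutOfRangeException(nameof(assignmentNumber));
        if (assignmentNumber <= 127) return "0" + ToBits(assignmentNumber, 7);
        if (assignmentNumber <= 16383) return "10" + ToBits(assignmentNumber, 14);
        return "110" + ToBits(assignmentNumber, 21);
    }

    // Header bits that precede the data bytes of a UTF-8 byte-mode segment.
    public static string Utf8ByteSegmentHeader(int byteCount, int version)
    {
        if (version < 1 || version > 40)
            throw new ArgumentOutOfRangeException(nameof(version));
        // The byte-mode character count uses 8 bits for versions 1-9 and 16 bits for 10-40.
        int countBits = version <= 9 ? 8 : 16;
        if (byteCount < 0 || byteCount >= (1 << countBits))
            throw new ArgumentOutOfRangeException(nameof(byteCount));
        return EciModeIndicator + EciDesignator(Utf8Assignment)
             + ByteModeIndicator + ToBits(byteCount, countBits);
    }

    private static string ToBits(int value, int width)
    {
        return Convert.ToString(value, 2).PadLeft(width, '0');
    }

    public static void Main()
    {
        // 3 data bytes in a version-1 symbol
        Console.WriteLine(Utf8ByteSegmentHeader(3, 1)); // 011100011010010000000011
    }
}
```

Without the first twelve bits, a reader that follows the standard has no explicit signal that the bytes are UTF-8 and has to fall back on a default or a guess.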

codebude commented 5 months ago

> Assuming this fixes the problem, I suggest that for either v1.6 or v2 we have the ECI mode set properly when encoding text as UTF8, ensuring compliance with the QR Code specification.

Sure, we can take another look at the topic, preferably for v1.6. However, I believe it is not quite so simple. In my opinion the original code from @VladdyH could theoretically also work, because GetEncodingFromPlaintext() should also arrive at EncodingMode.Byte without forceUtf8 = true. Yes, according to the standard, the UTF8 EciMode should then be set. In practice, however, many QR readers in the past unfortunately did not support ECI, but were able to read UTF-8 in byte mode (probably because they tried to determine the encoding of the data heuristically). So we should at least consider a fallback parameter (as well as UTF-8 with ECI.Default instead of ECI.Utf8) in future implementations. (I don't know what the market looks like today, i.e. how many QR readers actually support ECI nowadays.)

> Note that this may be why some payload generators require a specific ECI mode -- not because the specification requires it, but rather that without specifying the ECI mode, QRCoder fails to include the ECI mode while encoding text as UTF8.

I think you are wrong here. The two payload generators that explicitly set the EciMode (SlovenianUpnQr, SwissQrCode) have either explicitly listed the mandatory EciMode or explicitly stated that data must be encoded as UTF-8 in their specifications. This actually has nothing to do with the QRCoder implementation.

VladdyH commented 5 months ago

@Shane32 I tried the code you suggested, but it did not help. Changing vCard version along with your suggestion didn't change the result either.

So far only the workaround with diacritics in "ORG:" can do the job for my situation. The actual values making the difference in ORG are: "MyCompany" vs. "MyCompany České republiky". I double-checked that the vCard string for "vstup" is UTF-8.

Further elaboration and confusion (without the ORG workaround): the result also depends on the other, non-diacritic characters. When I use "Janěščřžýáíé" as firstName, the diacritics are wrong. But when I use only "ěščřžý" (omitting "Jan" and "áíé"), the diacritics are OK.

My QR reader is an android app: com.xiaomi.scanner, version 13.2204.5 (built-in to Xiaomi Redmi Note 11 Pro+ 5G).

Shane32 commented 5 months ago

How about this:

using (QRCodeData qrCodeData = qrGenerator.CreateQrCode(vstup, QRCodeGenerator.ECCLevel.L, true, true, EciMode.Utf8))

codebude commented 5 months ago

@VladdyH @Shane32 This somewhat confirms what I said about QR code readers: each reader seems to have its own interpretation of the encoding and often does not follow the standard/specification. (Sometimes heuristic, sometimes with ECI, sometimes without, etc.)

> My QR reader is an android app: com.xiaomi.scanner, version 13.2204.5 (built-in to Xiaomi Redmi Note 11 Pro+ 5G).

I've heard that the native Xiaomi barcode scanner is notorious for not being able to handle UTF-8 correctly.

> As an example of a scanner that cannot display a proper UTF-8 string, take Xiaomi phones with MIUI Global v11.0.3 (with their native scanner application). These phones cannot correctly show a string of Cyrillic characters encoded in UTF-8 even if this charset is specified in the ECI. The Cyrillic characters are shown as question marks. But if you add a Chinese/Japanese character (e.g. 日) to the Cyrillic text, the whole text will be displayed correctly by Xiaomi. This is regardless of BOM. Source: https://stackoverflow.com/a/61035741

Can you please try another QR code scanner app? E.g. https://play.google.com/store/apps/details?id=com.google.zxing.client.android

I expect it to be able to read the QR codes without any problems. (Both with your original code and with Shane's code.)

codebude commented 5 months ago

Did some quick tests. I generated three variants (UTF-8 without ECI, forced UTF-8 + ECI, forced UTF-8 + BOM + ECI):

using System.Drawing;
using QRCoder;
using static QRCoder.QRCodeGenerator; // brings ECCLevel and EciMode into scope

var vstup = @"BEGIN:VCARD
VERSION:3.0
N:Novák;Janěščřžýáíé;
FN:Mgr. & Mgr. Janěščřžýáíé Novák, DiS.
ORG:MyCompany
TITLE:Mgr. & Mgr.
TEL;TYPE=,VOICE:+420123456789
ADR;TYPE=:;;
EMAIL:testovaci@adresa.tst
END:VCARD";

// Variant 1: default settings (UTF-8 bytes, no ECI header)
using (QRCodeGenerator qrGenerator = new QRCodeGenerator())
using (QRCodeData qrCodeData = qrGenerator.CreateQrCode(vstup, ECCLevel.L))
using (QRCode qrCode = new QRCode(qrCodeData))
{
    Bitmap qrCodeImage = qrCode.GetGraphic(5);
    qrCodeImage.Save(@"C:\Users\netbl\Downloads\original.png", System.Drawing.Imaging.ImageFormat.Png);
}

// Variant 2: forceUtf8 = true, utf8BOM = false, ECI mode set to UTF-8
using (QRCodeGenerator qrGenerator = new QRCodeGenerator())
using (QRCodeData qrCodeData = qrGenerator.CreateQrCode(vstup, ECCLevel.L, true, false, EciMode.Utf8))
using (QRCode qrCode = new QRCode(qrCodeData))
{
    Bitmap qrCodeImage = qrCode.GetGraphic(5);
    qrCodeImage.Save(@"C:\Users\netbl\Downloads\original_forceutf8_eci.png", System.Drawing.Imaging.ImageFormat.Png);
}

// Variant 3: forceUtf8 = true, utf8BOM = true, ECI mode set to UTF-8
using (QRCodeGenerator qrGenerator = new QRCodeGenerator())
using (QRCodeData qrCodeData = qrGenerator.CreateQrCode(vstup, ECCLevel.L, true, true, EciMode.Utf8))
using (QRCode qrCode = new QRCode(qrCodeData))
{
    Bitmap qrCodeImage = qrCode.GetGraphic(5);
    qrCodeImage.Save(@"C:\Users\netbl\Downloads\original_forceutf8_bom_eci.png", System.Drawing.Imaging.ImageFormat.Png);
}

The output looks as follows:

[Image: original (utf8 without eci)]

[Image: original_forceutf8_eci (force-utf-8 + eci)]

[Image: original_forceutf8_bom_eci (force utf-8 + bom + eci)]


I then tried to scan all three codes with different scanners:

| Variant | Xiaomi barcode scanner | Barcode Scanner (ZXing) |
| --- | --- | --- |
| utf-8 without eci | diacritics rendered incorrectly | scans correctly |
| force-utf-8 + eci | diacritics rendered incorrectly | scans correctly |
| force utf-8 + bom + eci | code not recognized as vCard at all | scans correctly |
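As a side note on the BOM variant, this small sketch shows what a UTF-8 byte order mark does to the payload at the byte level: three extra bytes (EF BB BF) are placed in front of the text. It uses only standard .NET calls; the helper name is hypothetical. The thread does not establish why the Xiaomi scanner stops recognizing the vCard in that variant, so treat this only as background on what the BOM changes, assuming QRCoder prepends the standard UTF-8 preamble when the BOM option is enabled.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Returns the UTF-8 bytes of the text, preceded by the UTF-8 byte order mark.
    public static byte[] WithUtf8Bom(string text)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF
        byte[] data = Encoding.UTF8.GetBytes(text);
        byte[] result = new byte[bom.Length + data.Length];
        Buffer.BlockCopy(bom, 0, result, 0, bom.Length);
        Buffer.BlockCopy(data, 0, result, bom.Length, data.Length);
        return result;
    }

    public static void Main()
    {
        byte[] payload = WithUtf8Bom("BEGIN:VCARD");
        // The payload no longer starts with the bytes of "BE" (42 45)
        Console.WriteLine(BitConverter.ToString(payload, 0, 5)); // EF-BB-BF-42-45
    }
}
```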

Conclusion

I would say the problem at this point is definitely the QR scanner app. Please switch to another app. You will not be able to solve the problem on the source code/QRCoder side.

p.s.: Do you generate the vCard string by hand? (It looks so, because I think it is missing some semicolons.) Have you seen that QRCoder has a vCard payload generator? https://github.com/codebude/QRCoder/wiki/Advanced-usage---Payload-generators#35-contactdata-mecardvcard
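On the remark about missing semicolons: in vCard 3.0 the N property has five components (family name, given name, additional names, prefixes, suffixes), and semicolons, commas, and backslashes inside text values are escaped with a backslash. The following is a minimal hand-rolled sketch of that structure, with hypothetical helper names; it is not the QRCoder payload generator and covers only a few properties.

```csharp
using System;
using System.Text;

public static class VCardSketch
{
    // Escapes backslash, semicolon, comma, and line breaks in a vCard 3.0 text value.
    public static string Escape(string value)
    {
        if (string.IsNullOrEmpty(value)) return string.Empty;
        return value.Replace("\\", "\\\\")
                    .Replace(";", "\\;")
                    .Replace(",", "\\,")
                    .Replace("\r\n", "\\n")
                    .Replace("\n", "\\n");
    }

    // N always carries five components, so four semicolons are always present.
    public static string NameLine(string family, string given,
        string additional = "", string prefix = "", string suffix = "")
    {
        return "N:" + string.Join(";", Escape(family), Escape(given),
            Escape(additional), Escape(prefix), Escape(suffix));
    }

    public static string Build(string family, string given, string prefix,
        string suffix, string formattedName, string org, string email)
    {
        var sb = new StringBuilder();
        sb.Append("BEGIN:VCARD\r\n");
        sb.Append("VERSION:3.0\r\n");
        sb.Append(NameLine(family, given, "", prefix, suffix)).Append("\r\n");
        sb.Append("FN:").Append(Escape(formattedName)).Append("\r\n");
        sb.Append("ORG:").Append(Escape(org)).Append("\r\n");
        sb.Append("EMAIL:").Append(Escape(email)).Append("\r\n");
        sb.Append("END:VCARD");
        return sb.ToString();
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Build("Novák", "Jan", "Mgr. & Mgr.", "DiS.",
            "Mgr. & Mgr. Jan Novák, DiS.", "MyCompany", "testovaci@adresa.tst"));
    }
}
```

Well-formed fields do not change how a scanner decodes the bytes, but they remove one variable when comparing readers.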

VladdyH commented 5 months ago

@codebude I am sorry, I can only give a quick response since I am rather indisposed these days: yes, I am generating the vCard by hand, and I tried different vCard versions with the same effect. I also tried the payload generator, which made no difference to readability (same results). I also use another QR reader app, "QR and Barcode Scanner PRO", which was always able to read the codes correctly.

As soon as I am able to test your suggestions, I will reply accordingly. Ha, now that I have re-read your posts (English is not my native language), I guess you already tested what you suggested. Great job :)

codebude commented 5 months ago

Ok, let's summarize. The faulty display was/is due to the fact that the Xiaomi scanner cannot handle UTF8 correctly. I would then close the issue here. If there are still questions, please open the issue again.

@Shane32 We can discuss the topic of ECIMode in relation to encoding in #513 or, if desired, in another separate issue.