Quoted-printable values ignore charset (always UTF-8)

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Set a Note-value to something with newlines and special chars
2. Write vCard with VCardVersion.V2_1

What is the expected output?
The complete result is in ISO 8859-1, including the quoted-printable parts 
after they are decoded.

What is the actual output?
The result is in ISO 8859-1, except for the quoted-printable parts, which 
turn out to be in UTF-8 after decoding.

What version of ez-vcard are you using?
0.9.0

What version of Java are you using?
1.6

Please provide any additional information below.
This also happens if I explicitly set the charset of the properties to ISO 
8859-1

Original issue reported on code.google.com by tom_vo...@gmx.de on 27 Nov 2013 at 3:03

GoogleCodeExporter commented 9 years ago

Hello,

Thanks for your input.  I'm not sure if this is a bug though.  It makes sense 
that you could get a UTF-8 string after decoding a quoted-printable string.  
The purpose of quoted-printable is to encode characters which cannot be encoded 
in the current character set.

Original comment by mike.angstadt on 4 Dec 2013 at 3:20

GoogleCodeExporter commented 9 years ago

Hi, 

thanks for the reply. 
But if I set the charset on the type to iso 8859-1 then I would expect the 
resulting string after decoding to be of charset iso 8859-1 and not utf-8.

Original comment by tom_vo...@gmx.de on 4 Dec 2013 at 3:46

GoogleCodeExporter commented 9 years ago

What is the exact string you are using in the CHARSET parameter value?  It 
looks like there must be a hyphen between "ISO" and "8859-1", instead of a 
space.  If there is a space, Java will not recognize the charset, which causes 
ez-vcard to decode it using UTF-8.

Original comment by mike.angstadt on 4 Dec 2013 at 4:07
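
The charset-name behaviour described in the comment above can be reproduced with the JDK alone. This is a minimal sketch, not ez-vcard's code: the class CharsetLookup and the method resolveOrUtf8 are hypothetical names, and the UTF-8 fallback only mirrors what the comment says ez-vcard does.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper (not part of ez-vcard): looks up a charset by name and
// falls back to UTF-8 when the JDK rejects the name.
final class CharsetLookup {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    static Charset resolveOrUtf8(String name) {
        if (name == null) {
            return UTF8;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalCharsetNameException e) {
            // The name contains an illegal character, such as the space in "ISO 8859-1".
            return UTF8;
        } catch (UnsupportedCharsetException e) {
            // The name is well-formed but unknown to this JVM.
            return UTF8;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveOrUtf8("ISO-8859-1")); // hyphenated name is recognized
        System.out.println(resolveOrUtf8("ISO 8859-1")); // space is rejected, so the fallback applies
    }
}
```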

GoogleCodeExporter commented 9 years ago

"ISO-8859-1"

Original comment by tom_vo...@gmx.de on 4 Dec 2013 at 4:24

GoogleCodeExporter commented 9 years ago

Can you check to see if there are any parser warnings?  ez-vcard will add a 
parser warning if there is a problem decoding a quoted-printable value.

To do that with the VCardReader class, call the getWarnings() method.  To do 
that with the Ezvcard class, pass an empty list into the "warnings()" method, 
then print the list after parsing the vCard.

Original comment by mike.angstadt on 4 Dec 2013 at 4:30

GoogleCodeExporter commented 9 years ago

No warnings concerning quoted-printable.
Here's what I do:

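  // Build the NOTE property and set its CHARSET parameter explicitly.
  Note noteType = new Note(person.getComment());
  noteType.getParameters().setCharset(charset);
  vcard.addNote(noteType);
  ...
  StringWriter writer = new StringWriter();
  // vCardWriter is created but not used in the lines shown here;
  // the vCard is written through Ezvcard.write() below.
  VCardWriter vCardWriter = new VCardWriter(writer, VCardVersion.V2_1, null, "\r\n");
  log.debug(vcard.validate(VCardVersion.V2_1));
  Ezvcard.write(vcard).version(VCardVersion.V2_1).go(writer);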

Note contains:
"test
äöüß
test"

Result is:
NOTE;CHARSET=ISO-8859-1;ENCODING=quoted-printable:test=0A=C3=A4=C3=B6=C3=BC=
 =C3=9F=0Atest

Decoded:
testÃ¤Ã¶Ã¼ Ãtest

:(

Original comment by tom_vo...@gmx.de on 9 Dec 2013 at 5:24
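
The mismatch reported above can be reproduced without ez-vcard. The sketch below uses a deliberately simplified quoted-printable codec (no soft line breaks, every non-printable or non-ASCII byte escaped); QuotedPrintableDemo and its methods are hypothetical names, not library API. It shows that the =C3=A4... bytes are the UTF-8 form of the note, and what the same note would look like if the declared ISO-8859-1 charset were honoured.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical, simplified quoted-printable codec used only to illustrate the report.
final class QuotedPrintableDemo {
    private static final String HEX = "0123456789ABCDEF";

    // Encodes the text's bytes in the given charset; escapes '=' and anything
    // outside the printable ASCII range 33..126. No line folding is done.
    static String encode(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            int v = b & 0xFF;
            if (v >= 33 && v <= 126 && v != '=') {
                sb.append((char) v);
            } else {
                sb.append('=').append(HEX.charAt(v >> 4)).append(HEX.charAt(v & 0x0F));
            }
        }
        return sb.toString();
    }

    // Reverses encode(): turns =XX escapes back into bytes, then interprets
    // the bytes in the given charset.
    static String decode(String qp, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < qp.length(); i++) {
            char c = qp.charAt(i);
            if (c == '=' && i + 2 < qp.length()) {
                out.write(Integer.parseInt(qp.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c);
            }
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        Charset latin1 = Charset.forName("ISO-8859-1");
        String note = "test\n\u00E4\u00F6\u00FC\u00DF\ntest"; // "test", "äöüß", "test"

        System.out.println(encode(note, utf8));   // test=0A=C3=A4=C3=B6=C3=BC=C3=9F=0Atest
        System.out.println(encode(note, latin1)); // test=0A=E4=F6=FC=DF=0Atest

        // Decoding the UTF-8 bytes as ISO-8859-1 does not give the note back.
        System.out.println(decode(encode(note, utf8), latin1).equals(note));   // false
        System.out.println(decode(encode(note, latin1), latin1).equals(note)); // true
    }
}
```

A reader that trusts CHARSET=ISO-8859-1 decodes each of the two UTF-8 bytes per umlaut as a separate Latin-1 character, which is where the "Ã¤Ã¶Ã¼" sequence in the decoded result comes from.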

GoogleCodeExporter commented 9 years ago

Ah, I see.  Ok, fixed it.  Thanks :D

Original comment by mike.angstadt on 12 Dec 2013 at 3:49

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

I thought about this some more.  My first solution didn't solve the root of the 
problem, which is that the character encoding of the ***Writer*** object should 
be used by default when encoding a quoted-printable value.  You shouldn't need 
to manually set the CHARSET parameter.

The fix I've just committed will use the Writer object's character encoding if 
no CHARSET parameter is provided.  If it can't determine the Writer's character 
encoding, it will use your system's default character encoding.  If a CHARSET 
parameter is set, then it will use that character encoding instead of the 
Writer's.

Attached is the patched JAR.

Original comment by mike.angstadt on 13 Dec 2013 at 5:29

Attachments:

ez-vcard-0.9.1-SNAPSHOT.jar
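
The fallback order described in the comment above (CHARSET parameter first, then the Writer's encoding, then the system default) can be sketched with the JDK alone. This is an illustration under assumptions, not the committed fix: QpCharsetChooser and choose are hypothetical names, and the sketch assumes the Writer's encoding can only be determined when it is an OutputStreamWriter.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper (not ez-vcard code) that picks the charset for
// quoted-printable encoding in the order described in the thread.
final class QpCharsetChooser {
    static Charset choose(String charsetParam, Writer writer) {
        // 1. An explicit CHARSET parameter takes precedence over everything else.
        if (charsetParam != null) {
            return Charset.forName(charsetParam);
        }
        // 2. Otherwise use the Writer's own encoding, if it exposes one.
        if (writer instanceof OutputStreamWriter) {
            String encoding = ((OutputStreamWriter) writer).getEncoding();
            if (encoding != null) { // getEncoding() returns null once the stream is closed
                return Charset.forName(encoding);
            }
        }
        // 3. The Writer's encoding is unknown (for example a StringWriter).
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        Writer latin1Writer = new OutputStreamWriter(new ByteArrayOutputStream(), latin1);

        System.out.println(choose("UTF-8", latin1Writer));    // parameter wins
        System.out.println(choose(null, latin1Writer));       // Writer's encoding
        System.out.println(choose(null, new StringWriter())); // system default
    }
}
```

A StringWriter, as used in the code earlier in this thread, carries no byte encoding at all, so in this sketch it always falls through to the system default unless a CHARSET parameter is set.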

GoogleCodeExporter commented 9 years ago

Hi,

great, thanks!

Original comment by tom_vo...@gmx.de on 16 Dec 2013 at 9:06
