ASCII Conversion

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?
1. Create a file with accented characters like à é è ù ö, for example with 
ExifTool by Phil Harvey
2. Open the file
3. The "special" characters are not displayed correctly

The problem is caused by the way strings are encoded. You are using the ASCII encoder 
(Encoding.ASCII) instead of a more flexible one, for example: 
Encoding.GetEncoding("Windows-1252")
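A minimal Python sketch of the difference described above. Python's `ascii` and `cp1252` codecs stand in for .NET's `Encoding.ASCII` and `Encoding.GetEncoding("Windows-1252")`. This is an illustration, not ExifLibrary code. As with the .NET ASCII encoder's default behaviour, characters outside 7-bit ASCII are replaced with `?`.

```python
# Sketch only: Python's "ascii" and "cp1252" codecs stand in for .NET's
# Encoding.ASCII and Encoding.GetEncoding("Windows-1252").
text = "Colonne ùàçè end"

# errors="replace" substitutes "?" for characters outside 7-bit ASCII,
# so the accented letters are lost before the bytes are ever written.
ascii_bytes = text.encode("ascii", errors="replace")

# Windows-1252 has a single-byte code for each of these accented letters,
# so the text survives a round trip.
cp1252_bytes = text.encode("cp1252")

print(ascii_bytes.decode("ascii"))    # Colonne ???? end
print(cp1252_bytes.decode("cp1252"))  # Colonne ùàçè end
```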

Thank you for your work.

Original issue reported on code.google.com by Services...@gmail.com on 26 Feb 2013 at 10:40

GoogleCodeExporter commented 8 years ago

Can you please:

1. Explain how exactly you create the string with ExifTool (the command line)
2. Attach the resulting image file (after zipping)

Thanks,
Ozgur

Original comment by oozcitak on 26 Feb 2013 at 10:56

GoogleCodeExporter commented 8 years ago

I do it with a program written in VB.NET.

I pass it this command line:

 ExecExifTool.StartInfo.Arguments = "-m" & _
                                " -Artist=""" & Auteur & """ -ImageDescription=""" & Sujet & """" & _
                                " -ImageUniqueID=" & IdImage & " -SequenceNumber=" & NumeroSequence.ToString & _
                                CommandeMoment & " " & CommandeGPS & " " & _
                                " """ & NomFichTemp & """"

As you can see, I put the strings on the command line without any modification.

If you try to set or read the image information from the ExifTool command line, 
you get errors, because the command line's character set is not the same as the 
one used in the .NET program.

But if you view the file's details in, for example, Windows 7 Explorer, you can 
see the correct string ("Colonne ùàçè end" in my attached file).

I tried modifying your code to change Encoding.ASCII to 
Encoding.GetEncoding("Windows-1252"), and it works fine.

Thank you

Claude

Original comment by Services...@gmail.com on 26 Feb 2013 at 12:34

Attachments:

[2012 07 21 15 31 34 - Colonne ùàçè end.zip](https://storage.googleapis.com/google-code-attachments/exiflibrary/issue-41/comment-2/2012 07 21 15 31 34 - Colonne ùàçè end.zip)

GoogleCodeExporter commented 8 years ago

The thing is, according to the Exif spec, ImageDescription should contain only 
7-bit ASCII characters; the spec references ITU-T T.50 IA5 (ITU-T International 
Alphabet No. 5). There is a PDF of that spec here: http://www.itu.int/rec/T-REC-T.50-198811-S/en
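One strict reading of the spec is to reject non-ASCII values outright instead of silently replacing them. The Exif ASCII type is a sequence of 7-bit codes terminated by a NUL byte. The sketch below assumes that layout. `encode_exif_ascii` is a hypothetical helper, not part of ExifLibrary.

```python
def encode_exif_ascii(value: str) -> bytes:
    """Hypothetical helper: encode a string as an Exif ASCII value
    (7-bit characters followed by a NUL terminator).

    Raises UnicodeEncodeError for anything outside 7-bit ASCII instead of
    silently replacing it.
    """
    return value.encode("ascii") + b"\x00"


print(encode_exif_ascii("Colonne end"))  # b'Colonne end\x00'

try:
    encode_exif_ascii("Colonne ùàçè end")
except UnicodeEncodeError as err:
    print("not valid Exif ASCII:", err.reason)
```

Failing loudly makes the spec violation visible. It is still up to the library to decide whether to fall back to another encoding.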

I am not sure how non-ASCII characters should be handled. I'll check what 
ExifTool does.

I will look into it.

Thanks,
Ozgur

Original comment by oozcitak on 26 Feb 2013 at 1:37

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

I understand, but I think many programs deviate from the specification. I 
tried viewing the file with ACDSee 5.0, Lightroom V4 and Corel PHOTO-PAINT X6, and 
all of them support the extended character set. I also tried Picasa, which does 
not support it.

But the question is: WHAT charset should be used? I chose Windows-1252 because my 
computer runs a French Windows, where the default charset is Windows-1252. 
But if Windows is Chinese, Russian or Arabic, what happens? 
What is the right charset?

Because there is no specification for this, the charset used is likely to be the 
computer's default one. If that does not match the charset the text was encoded 
with, the text is not displayed correctly on that computer.
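The mismatch can be reproduced in a few lines of Python. The sketch assumes the writer uses Windows-1252 (cp1252) and the reader uses Windows-1251 (cp1251, the Russian ANSI code page). Both choices are only examples.

```python
# Sketch: text written with one code page and read back with another.
text = "Colonne ùàçè end"
written = text.encode("cp1252")  # e.g. a French Windows machine

same_locale = written.decode("cp1252")   # reader also uses Windows-1252
other_locale = written.decode("cp1251")  # e.g. a Russian Windows machine

print(same_locale)   # Colonne ùàçè end
print(other_locale)  # ASCII part intact, accented letters become Cyrillic
```

Only the bytes above 127 are misread. The plain ASCII part of the string is unaffected.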

Thanks,

Claude

Original comment by Services...@gmail.com on 26 Feb 2013 at 2:07

GoogleCodeExporter commented 8 years ago

I believe the sanest approach would be to pass the encoding to the constructor 
as an optional parameter. It would default to System.Text.Encoding.Default, 
which would be code page 1252 on your computer and 1254 on mine, solving the 
issue you mentioned.
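A minimal Python sketch of the "optional encoding parameter" idea. `ExifTextCodec` is a hypothetical class, not ExifLibrary's API. `locale.getpreferredencoding(False)` stands in for `System.Text.Encoding.Default`. The analogy only holds for .NET Framework, because on .NET Core and later `Encoding.Default` is UTF-8 rather than the ANSI code page.

```python
import locale
from typing import Optional


class ExifTextCodec:
    """Hypothetical helper (not ExifLibrary's API): encodes and decodes
    Exif text fields with a caller-chosen encoding."""

    def __init__(self, encoding: Optional[str] = None) -> None:
        # With no argument, fall back to the system's preferred encoding,
        # much as System.Text.Encoding.Default does on .NET Framework.
        self.encoding = encoding or locale.getpreferredencoding(False)

    def encode(self, value: str) -> bytes:
        return value.encode(self.encoding)

    def decode(self, data: bytes) -> str:
        return data.decode(self.encoding)


french = ExifTextCodec("cp1252")   # Windows-1252, as on a French Windows
turkish = ExifTextCodec("cp1254")  # Windows-1254, as on a Turkish Windows
system = ExifTextCodec()           # whatever this machine prefers

print(french.decode(french.encode("Colonne ùàçè end")))
print(turkish.decode(turkish.encode("ğüşıöç")))
print(system.encoding)
```

Making the encoding explicit lets a caller match whatever charset the files were written with, instead of depending on the locale of the machine that reads them.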

ExifTool does that with the -charset flag. 
(http://www.sno.phy.queensu.ca/~phil/exiftool/faq.html#Q18)

I could also add some code to sniff text fields (e.g. if all character codes are 
below 128, use 7-bit ASCII; otherwise switch to the user-supplied encoding), 
although I am not sure this is necessary, since it appears that the first 128 
characters are the same 7-bit ASCII characters in all code pages, according to this: 
http://msdn.microsoft.com/en-us/library/windows/desktop/dd317752(v=vs.85).aspx
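A rough Python sketch of both the sniffing rule and the "first 128 characters" check. `choose_encoding` and `ascii_compatible` are hypothetical helpers. The check suggests the claim holds for the Windows ANSI code pages 1250 to 1254. It does not hold for every code page: EBCDIC code page 037 (`cp037`) maps bytes 0 to 127 to different characters.

```python
ASCII_RANGE = bytes(range(128))


def choose_encoding(value: str, user_encoding: str) -> str:
    """Hypothetical sniffing rule: 7-bit ASCII if every character code is
    below 128, otherwise the user-supplied encoding."""
    return "ascii" if all(ord(ch) < 128 for ch in value) else user_encoding


def ascii_compatible(codepage: str) -> bool:
    """True if bytes 0-127 decode to exactly the same characters as ASCII."""
    return ASCII_RANGE.decode(codepage) == ASCII_RANGE.decode("ascii")


print(choose_encoding("Colonne end", "cp1252"))       # ascii
print(choose_encoding("Colonne ùàçè end", "cp1252"))  # cp1252

for codepage in ("cp1250", "cp1251", "cp1252", "cp1253", "cp1254", "cp037"):
    print(codepage, ascii_compatible(codepage))
```

For ASCII-compatible code pages, the sniffing step produces the same bytes either way, so it only matters if non-ASCII-compatible encodings are allowed.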

Thanks,
Ozgur

Original comment by oozcitak on 26 Feb 2013 at 3:27

GoogleCodeExporter commented 8 years ago

Windows overcomes this by providing its own set of Unicode Exif Tags. See 
XPTitle, XPComment, etc. here: 
http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/EXIF.html
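The sketch below shows how Unicode text could be packed into one of these XP tags. It assumes the values are UTF-16 little-endian text with a two-byte NUL terminator, stored as a byte array, as described in ExifTool's tag documentation. `encode_xp_string` and `decode_xp_string` are hypothetical helpers. Check against real files written by Windows Explorer before relying on this layout.

```python
XP_TITLE = 0x9C9B  # XPTitle tag ID, as listed in ExifTool's EXIF tag table


def encode_xp_string(value: str) -> bytes:
    # Assumption: XP* tags store UTF-16 little-endian text followed by a
    # two-byte NUL terminator, written as a BYTE array.
    return value.encode("utf-16-le") + b"\x00\x00"


def decode_xp_string(data: bytes) -> str:
    # Decode UTF-16LE and drop the trailing NUL terminator(s).
    return data.decode("utf-16-le").rstrip("\x00")


raw = encode_xp_string("Colonne ùàçè end")
print(hex(XP_TITLE), raw[:4])  # 0x9c9b b'C\x00o\x00'
print(decode_xp_string(raw))   # Colonne ùàçè end
```

Because the text is UTF-16, no code-page choice is involved, which is why these tags display the same way regardless of the reader's locale.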

Original comment by oozcitak on 26 Feb 2013 at 3:45

GoogleCodeExporter commented 8 years ago

As for the ExifTool charset option, I tried it many times, but it did not solve all 
my problems, so I left the charset at its default, and that works fine with my 
software.

As for the XP-specific tags, you're right: using them seems to be the best practice. 
I did some tests under Linux, and the strings written by Windows Explorer into the 
XP... tags are displayed correctly.
I don't know when they were introduced, but I don't remember them existing when 
I began my project, a very long time ago...

Passing the encoder to the constructor as an optional parameter seems to be a 
good way to choose how the strings are encoded. But I'm not sure it is necessary 
to test for characters above 127, because the first 128 characters are always 
the same.

In any case, thank you very much for your attention.

Claude

Original comment by Services...@gmail.com on 27 Feb 2013 at 9:10

GoogleCodeExporter commented 8 years ago

This issue was closed by revision r99.

Original comment by oozcitak on 1 Mar 2013 at 10:36

Changed state: Fixed

GoogleCodeExporter commented 8 years ago

Just a small thing: I can see that your code checks the ExifTag.WindowsTitle tag 
twice in the Add method of ExifPropertyCollection (line 58).

Original comment by Services...@gmail.com on 17 Mar 2013 at 6:24

goldengel / exiflibrary

ASCII Conversion #41