JanX2 / cCSVParse

Small Cocoa CSV file parser (see link for the official repository on GitHub).
https://github.com/JanX2/cCSVParse

Parsing UTF-8? #1

Closed clayheaton closed 11 years ago

clayheaton commented 11 years ago

I'm trying to parse a CSV file that contains some UTF-8 characters. Here are a few example strings:

http://commons.wikimedia.org/wiki/File:Şahlûr-33.jpg

Dûrzan cîrano / CC BY-SA 3.0

Christian Mehlführer / CC-BY 2.5

In the case of the last string, it is parsed into the array as: Christian Mehlf\U00c3\U00bchrer / CC-BY 2.5

The UTF-8 encoding of ü is the byte pair C3 BC, and both bytes are present in the string as separate characters. How do you convert from the imported string to an NSString? I'm having trouble with this. It looks like the Unicode escaping is incorrect?

clayheaton commented 11 years ago

Never mind... I just realized that I needed to set _encoding to NSUTF8StringEncoding in the -init method.
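
For anyone who lands here with the same symptom, here is a minimal sketch of what was going on, written in Python rather than Objective-C so that it only needs the standard library. It is an illustration of the byte-level behavior, not cCSVParse code: the variable names are made up for this example. The file holds UTF-8 bytes, and decoding them as Latin-1 turns every byte into its own character, which is where the two escaped characters come from.

```python
# The name as stored in the CSV file: UTF-8 bytes, where "ü" is C3 BC.
raw = "Christian Mehlführer / CC-BY 2.5".encode("utf-8")

# Decoding with the wrong encoding (Latin-1) maps each byte to one
# character, so C3 BC becomes the two characters U+00C3 and U+00BC.
wrong = raw.decode("latin-1")

# Decoding with the encoding the file was actually written in
# recovers the original text.
right = raw.decode("utf-8")

print(ascii(wrong))  # shows the two stray characters as escapes
print(right)

# The damage is reversible as long as no bytes were dropped:
# re-encode as Latin-1 to get the original bytes back, then decode as UTF-8.
repaired = wrong.encode("latin-1").decode("utf-8")
```

So the escapes in the parsed array were not a broken Unicode conversion; they were a faithful rendering of UTF-8 bytes read with the wrong encoding.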

JanX2 commented 11 years ago

Maybe I should change the default to UTF-8.

clayheaton commented 11 years ago

Probably a good idea.

JanX2 commented 11 years ago

Done.

BTW: The reason that Latin 1 was the default is that this codebase has quite a few years of history. I strongly recommend using UniversalDetector along with this code!
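
If the encoding of incoming files is not known in advance, one crude stand-in for real detection is to try strict UTF-8 first and fall back to Latin-1 only when the bytes are not valid UTF-8. The sketch below is in Python with a hypothetical helper name (`decode_with_fallback`); it is not how UniversalDetector works and is far less thorough, but it shows the idea of choosing the encoding from the data instead of hard-coding it.

```python
from typing import Tuple


def decode_with_fallback(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used) for the given bytes.

    Hypothetical helper for illustration: tries strict UTF-8 first,
    then falls back to Latin-1, which accepts any byte sequence.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 never fails, so this is a last resort, not a detection.
        return data.decode("latin-1"), "latin-1"


# A UTF-8 file decodes as UTF-8.
utf8_text, utf8_used = decode_with_fallback("Dûrzan cîrano".encode("utf-8"))

# A Latin-1 file contains bytes that are invalid as UTF-8, so the fallback applies.
latin1_text, latin1_used = decode_with_fallback("Dûrzan cîrano".encode("latin-1"))
```

This only distinguishes two encodings, and a Latin-1 file that happens to be valid UTF-8 would be misread, so a proper detector remains the better choice for files of unknown origin.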