dhoerl / DHlibxls

Framework to read Excel xls spreadsheets
Does it work with accented chars? #2

Closed dfreniche closed 12 years ago

dfreniche commented 12 years ago

Hi. I'm struggling with this and trying to read an Excel file with accented chars (ISO-8859-1 chars), for an iOS App.

The chars are like these: á, é, ñ, etc.

I've tried almost everything, changing and tweaking the code. As far as I understand, Excel (and libxls) uses UTF-8 internally, but Cocoa's NSStrings are UTF-16. In the conversion I always get nil. So when you try to read the contents of a cell and the string has one of these chars, boom! nil

I've been using the latest version of your code, pulling libxls correctly from svn, etc. In the log window I can see: CellType: cellString row=2 col=B/2 string: Espana

But if I change the content of the cell to "España", I get nil.

The test file I've been using is here: http://dl.dropbox.com/u/1012348/test.xls

Can you please help me?

And, BTW: nice wrapper!

dhoerl commented 12 years ago

I will get to this tomorrow or Sunday. For sure the framework and library handle non-ascii characters - I've tested it before. It could be a bug or perhaps you or the framework is not setting some option properly.

Excel will return non-ascii as UTF-16, the same as UIKit and Cocoa use natively in NSString. That was great that you provided a test file too!

dfreniche commented 12 years ago

Great! Thanks a lot for your time and knowledge!

BTW I made a second class method to specify the encoding of the file, but no luck:

+ (DHxlsReader *)xlsReaderFromFile:(NSString *)filePath withEncoding:(NSString *)encoding
{
    DHxlsReader         *reader = nil;
    xlsWorkBook         *workBook;

    // NSLog(@"sizeof FORMULA=%zd LABELSST=%zd", sizeof(FORMULA), sizeof(LABELSST) );
    const char *file = [filePath cStringUsingEncoding:NSUTF8StringEncoding];
    // xls_open() takes C strings, so the encoding name is converted as well
    if((workBook = xls_open(file, [encoding UTF8String]))) {
        reader = [DHxlsReader new];
        [reader setWorkBook:workBook];
    }
    return reader;
}

The original method is now:

+ (DHxlsReader *)xlsReaderFromFile:(NSString *)filePath
{
    return [DHxlsReader xlsReaderFromFile:filePath withEncoding:@"UTF-8"];
}

dhoerl commented 12 years ago

Well, this is all very interesting. Supposedly, in BIFF8 (newer Excel), any non-ASCII characters get written in a UTF-16 format. However, when I look at your file, the data is there in "clear" text with UTF (i.e. non-ASCII) characters. This is going to take some more investigation. Can you tell me just how you made this .xls file? That is, with what program (Excel ???)

The Excel format doc swears that once the file is BIFF8 (i.e. relatively new), non-ASCII strings are stored as UTF-16, but in your file there are non-ASCII chars (for those names) and they are stored as plain strings (i.e. ASCII).

That said, of course Excel itself knows how to read it properly!
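
For anyone following along: as far as the BIFF8 format goes, each string carries a one-byte option-flags field after its character count, and the lowest bit of that byte selects between one byte per character ("compressed") and two bytes per character (UTF-16LE). A minimal sketch in C of inspecting that flag is below. The struct and function names are hypothetical and are not taken from libxls, and the sketch assumes a plain string with no rich-text or phonetic extension data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper types/names for illustration; not part of libxls. */
typedef struct {
    uint16_t char_count;  /* number of characters, not bytes */
    int      is_16bit;    /* 1: UTF-16LE data, 0: one byte per character */
    size_t   byte_length; /* bytes occupied by the character data */
} biff8_str_info;

/* buf points at a simple BIFF8 string:
 *   2-byte little-endian character count, 1 flags byte, character data.
 * Assumption: flag bits for rich-text/phonetic extensions are not set.
 * Returns 0 on success, -1 if the buffer is too short. */
int biff8_str_inspect(const uint8_t *buf, size_t len, biff8_str_info *out)
{
    if (len < 3) return -1;
    out->char_count  = (uint16_t)(buf[0] | (buf[1] << 8));
    out->is_16bit    = buf[2] & 0x01;  /* bit 0: high bytes present */
    out->byte_length = (size_t)out->char_count * (out->is_16bit ? 2 : 1);
    if (len - 3 < out->byte_length) return -1;
    return 0;
}
```

A string such as "España" can legitimately come back with is_16bit == 0 even though it contains a non-ASCII character, which would match what the file in this issue looks like in a hex dump.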

dfreniche commented 12 years ago

Sorry! I forgot to mention how I made the XLS file. It's an xlsx file (created with Excel 2010 in Windows) and saved in XLS format using "Save as" in LibreOffice. I've also tried "Save as" using Word 2004 for Mac, with no luck.

Maybe the problem is with the conversion?

Options, then?

Thanks a lot!

dhoerl commented 12 years ago

OK - I found the problem. It's a really obscure issue with Excel's UTF encoding that the library just didn't understand. I'm working on a fix for it. I'm sure others have complained about this in the past and we just assumed it was "Operator Error".

dhoerl commented 12 years ago

libxls has been updated to fix the UTF problem. Likewise, the Framework was slightly updated to comment out logs etc.

It should work perfectly now. If not I'm sure I'll hear from you!

dfreniche commented 12 years ago

Thanks a lot! Can't express my gratitude for working on that on a weekend!

Will check it tomorrow, close the issue and give you a lot of accented XLS files to play with, so you have a good testing bed

Thanks again!

dhoerl commented 12 years ago

On 3/24/12 2:38 PM, Diego Freniche wrote:

Thanks a lot! Can't express my gratitude for working on that on a weekend!

Will check it tomorrow, close the issue and give you a lot of accented XLS files to play with, so you have a good testing bed

Thanks again!

Well, thanks for the thanks! I cannot work on this at work, so weekends are as good a time as any! Don't worry about more files - you test it. The issue is that your strings only used Unicode code points that were < 256, so Excel, to save space, saves them as 8-bit character strings.

The code just assumed any string that was 8-bit was ASCII. It took me a long time to get to this conclusion. Once I figured that out it was all smooth sailing from then on. I even did the UTF-8 conversion in code and didn't use the iconv library.
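
To illustrate the kind of conversion described above, here is a minimal sketch in C that turns a buffer of one-byte code points (U+0000 to U+00FF) into UTF-8 without iconv. It is not the actual libxls code; the function name and buffer contract are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the libxls implementation.
 * Treats each source byte as a Unicode code point below 256 and writes
 * its UTF-8 encoding. dst must have room for 2 * n + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL. */
size_t latin1_to_utf8(const uint8_t *src, size_t n, char *dst)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t c = src[i];
        if (c < 0x80) {
            /* ASCII range: copied through unchanged */
            dst[o++] = (char)c;
        } else {
            /* 0x80..0xFF: two-byte sequence 110000xx 10xxxxxx */
            dst[o++] = (char)(0xC0 | (c >> 6));
            dst[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    dst[o] = '\0';
    return o;
}
```

With this, the six bytes of "España" as stored in the file (the ñ being the single byte 0xF1) become seven bytes of UTF-8, with the ñ encoded as 0xC3 0xB1, which NSString can then decode as UTF-8.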

dfreniche commented 12 years ago

Everything worked OK! I'm stoked, my import from XLS into Core Data (sqlite) is working like a charm!

Thanks again for the good work

Closing issue

iappsasiaphoebe commented 10 years ago

I'm trying to parse an xlsx file instead of the test file in the library, but it's not working. It logged "Not an excel file".

JanX2 commented 10 years ago

This library supports xls only! xlsx is a completely different format.
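
If you need to tell the two apart before handing a file to the library, one option is to look at the first few bytes: a binary .xls is an OLE2 compound file, while an .xlsx is a ZIP archive. A rough sketch in C is below; the enum and function names are made up for this example, and a match only says which container the file uses, not that it really holds a spreadsheet.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names for illustration only. */
typedef enum { FMT_UNKNOWN, FMT_OLE2, FMT_ZIP } container_format;

/* Classify a file by its leading bytes.
 * OLE2 compound files (used by .xls) start with D0 CF 11 E0 A1 B1 1A E1.
 * ZIP archives (used by .xlsx) start with "PK" 03 04. */
container_format sniff_bytes(const unsigned char *head, size_t len)
{
    static const unsigned char ole2[8] =
        {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    static const unsigned char zip[4] = {'P', 'K', 0x03, 0x04};

    if (len >= sizeof ole2 && memcmp(head, ole2, sizeof ole2) == 0)
        return FMT_OLE2;
    if (len >= sizeof zip && memcmp(head, zip, sizeof zip) == 0)
        return FMT_ZIP;
    return FMT_UNKNOWN;
}

/* Read the first 8 bytes of a file and classify them.
 * Returns FMT_UNKNOWN if the file cannot be opened. */
container_format sniff_file(const char *path)
{
    unsigned char head[8];
    FILE *f = fopen(path, "rb");
    if (!f) return FMT_UNKNOWN;
    size_t n = fread(head, 1, sizeof head, f);
    fclose(f);
    return sniff_bytes(head, n);
}
```

A file that comes back as FMT_ZIP would need to be re-saved as .xls (or read with an xlsx-capable library) before this framework can open it.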

BTW: Please open a new issue on GitHub unless your problem is directly related to an existing one.