cfsimplicity / spreadsheet-cfml

Standalone library for working with spreadsheets and CSV in CFML
MIT License
127 stars 36 forks source link

Special characters in the spreadsheet #132

Closed sebgmc closed 7 years ago

sebgmc commented 7 years ago

Hi Julian,

Thnx 4 making spreadsheet.cfc available for the CFML-public, wonderful set of functions for my spreadsheet needs!

How can I get special characters to play nice with my spreadsheet columns and data?

If I add a ë or ë (the HTML and Unicode version of ë) this gets all botched up in my spreadsheet columns and rows (cells). I assume there is a way to get special characters into a spreadsheet, I just haven't found a right way just yet. I've tried using HTMLEditFormat or HTMLCodeFormat or XMLFormat on any string with a special character in it, as is or properly escaped. But this only renders it in the spreadsheet as the escaped version of the special character.

If I do not escape the special characters they are displayed as weird characters in the spreadsheet and if I escape them in the code and do not use any CF-functions they are rendered as escaped characters in the spreadsheet?

I've also tried the format-functions from the spreadsheet.cfc, but they do me no good either. I generate the data as a query with new Query() and QuerySetCell().

Any idea on or experience with this? In the end I would like special characters to be displayed as ë ur ç or à in the cells.

Thnx up front 4 taking the time 2 read this and hopefully you have a solution or a direction to wander in 4 me ;-)

sebgmc commented 7 years ago

OK, so a quick update on this. I've found a quick and dirty solution (in my mind) where I get the result that I want:

I simply replace all special characters with the Unicode equivalent via the Chr() function of CFML. So if I want an ë (e umlaut) to be shown I say

spreadsheet.setCellValue(workbook,"Cli#Chr(235)#nt",1,1,"string");

or do

data = QueryNew("Cli#Chr(235)#ntnaam","VarChar");

and then

QuerySetCell(data,"Cli#Chr(235)#ntnaam","#somequeryname.somevalue#");

Are there other ways to do this so as not to have to escape the entire Unicode set of characters or me having to write a function that escapes everything per QuerySetCell?

cfsimplicity commented 7 years ago

Hi Sebastiaan

HTML entities are meant for display in a browser. If you put them into a spreadsheet it will normally just treat them as any other text rather than rendering them as a browser would.

Accented and other non standard characters should be passed in "as-is". To illustrate:

s = New Spreadsheet();
wb = s.new();
s.addRow( wb, "ë ur ç or à, ë &##235;" );

produces this: specialchars

Using Chr(N) is another way of passing in the characters as you've found, but ideally your source text should just contain them in their original unescaped form.

cfsimplicity commented 7 years ago

Sorry Sebastiaan, I missed your point that you were getting weird characters without using Chr(). But as you can see from my example that isn't normal and you should be able to insert unicode characters directly.

There must be something about your environment that is preventing this. Are you using Lucee or ColdFusion? What spreadsheet software are you using to open the file with?

cfsimplicity commented 7 years ago

...and are you absolutely sure the query data you are passing in is in unicode/UTF-8?

sebgmc commented 7 years ago

I'm using Lucee 5.x on Windows and opening with LibreOffice. Querydata is from db or inserted directly as text into the CFML-functions. It only gets me desired characters when using Chr(). How can I force UTF-8?

Op 22 sep. 2017 22:45 schreef "Julian Halliwell" notifications@github.com:

...and are you absolutely sure the query data you are passing in is in unicode/UTF-8?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/cfsimplicity/lucee-spreadsheet/issues/132#issuecomment-331556196, or mute the thread https://github.com/notifications/unsubscribe-auth/AD94Ml0NKnsyoUXmx54MjRvaJdgD2Yujks5slBxdgaJpZM4Pg9Cg .

cfsimplicity commented 7 years ago

There's no way of "forcing" UTF-8 within the spreadsheet library. You just need to ensure that the values you pass in are in UTF-8 so check where they are being generated.

To create my example script above I used Sublime Text which uses UTF-8 by default. If the text comes from a database then make sure that it is also using UTF-8 and that the datasource connection is too.

Check your Lucee encoding settings. In application.cfc make sure you have

this.charset.web = "UTF-8";
this.charset.resource = "UTF-8";

In the admin check the "Template charset".

Have you tried my example code in your environment (adjusting the spreadsheet.cfc path obviously)? Do you get the same results?

sebgmc commented 7 years ago

Hi Julian, all the stuff you mention is in place, except the Template chartset setting in Lucee. That was set to Windows-1252, which is the default for Lucee CFML templates. No idea why you would want that, but changing that to UTF-8 (which is my assumption anyone would want) the spreadsheet renders special characters no problemo. Creating special characters directly in Sublime or getting text with special characters from the DB now renders as expected in the spreadsheets. Thnx 4 hanging in there with advice and tips!

-- Onlinebase BV Rabobank: NL56RABO0147656230 KvK Den Haag: 28114827 BTW: NL821502360B01

Twitter: @onlinebaseNL http://twitter.com/onlinebaseNL

Internet: http://onlinebase.nl/ http://fusioneers.nl/ http://pixelrocket.nl/

De informatie opgenomen in dit bericht kan vertrouwelijk zijn en is uitsluitend bestemd voor de geadresseerde. Indien u dit bericht onterecht ontvangt, wordt u verzocht de inhoud niet te gebruiken en de afzender direct te informeren door het bericht te retourneren.

Denk aan het milieu voordat u deze e-mail print.

On 23 September 2017 at 12:04, Julian Halliwell notifications@github.com wrote:

There's no way of "forcing" UTF-8 within the spreadsheet library. You just need to ensure that the values you pass in are in UTF-8 so check where they are being generated.

To create my example script above I used Sublime Text which uses UTF-8 by default. If the text comes from a database then make sure that it is also using UTF-8 and that the datasource connection is too.

Check your Lucee encoding settings. In application.cfc make sure you have

this.charset.web = "UTF-8"; this.charset.resource = "UTF-8";

In the admin check the "Template charset".

Have you tried my example code in your environment (adjusting the spreadsheet.cfc path obviously)? Do you get the same results?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/cfsimplicity/lucee-spreadsheet/issues/132#issuecomment-331624168, or mute the thread https://github.com/notifications/unsubscribe-auth/AD94MiWv4RfkQOEXNUgpa0YaRZb__2cGks5slNefgaJpZM4Pg9Cg .

cfsimplicity commented 7 years ago

No problem, Sebastiaan. I guess the Windows-1252 default is for ACF compatibility. I always change it to UTF-8 to avoid having to specify the encoding at the top of every template/cfc.