farshadmohajeri / extpascal

Automatically exported from code.google.com/p/extpascal

Non-ASCII characters rendered wrong #66

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
# What steps will reproduce the problem?

    with TExtWindow.Create do begin
      Title := 'à';
      Show;
    end;

# What is the expected output?

A window with title 'à'

# What do you see instead?

A window with title '�'

Am I misconfiguring anything?

# What version of the product are you using? On what operating system?

- ExtPascal: v.0.9.8
- ExtJS: v.3.2.1
- Compiler: Delphi XE
- OSes: Win7/64
- WebServer: Apache 2
- Mode: FCGI

Original issue reported on code.google.com by nando.dessena on 31 Aug 2011 at 3:53

GoogleCodeExporter commented 9 years ago
I'll add that with Charset = 'iso-8859-1' the problem does not appear (but then 
you cannot use any characters outside of that, such as the € symbol).

It looks like ExtPascal is serving ANSI-encoded text instead of utf-8.
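
A minimal sketch of why that would produce '�', assuming a Unicode Delphi (2009 or later) where `TEncoding` is available. The program name and the `BytesToHex` helper are made up for illustration and are not part of ExtPascal. In iso-8859-1, 'à' is the single byte E0. In utf-8 it is the two bytes C3 A0. A lone E0 byte is not a valid utf-8 sequence, so a browser that was told the response is utf-8 shows the replacement character.

```pascal
program EncodingDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: formats bytes as space-separated hex.
function BytesToHex(const Bytes: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(Bytes) do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(Bytes[I], 2);
  end;
end;

var
  Latin1: TEncoding;
  Title: string;
begin
  // 'à' given as a code point so the source file encoding does not matter.
  Title := Char($00E0);
  // 28591 is the Windows code page number for iso-8859-1.
  Latin1 := TEncoding.GetEncoding(28591);
  try
    Writeln('iso-8859-1: ', BytesToHex(Latin1.GetBytes(Title)));
    Writeln('utf-8:      ', BytesToHex(TEncoding.UTF8.GetBytes(Title)));
  finally
    Latin1.Free; // encodings obtained via GetEncoding must be freed by the caller
  end;
end.
```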

Original comment by nando.dessena on 31 Aug 2011 at 5:30

GoogleCodeExporter commented 9 years ago
More info: I have just found that 'iso-8859-1' only works in non-async requests 
anyway. If the code above runs during an async request, the title is still 
'�'.

Also, if I write plain ExtJS code in a separate .js file *and* I save it with 
the right encoding and charset (be it utf-8 or iso-8859-1) and serve it, I see 
the correct 'à' title.

In short, it seems that ExtPascal is not setting the charset in async 
responses. I am attaching a patch that fixes this last problem, so I can move a 
little forward using iso-8859-1. But the utf-8 problem is still present as it 
most probably lies elsewhere.

The patch makes sure that ExtPascal sets the full response content-type 
(including charset) for both async (text/javascript) and non-async (text/html) 
responses.
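
As a rough sketch of the idea (not the patch itself), the header value could be built as follows. `BuildContentType` and its parameters are hypothetical names chosen for this example; they do not come from the ExtPascal sources.

```pascal
program ContentTypeDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: picks the media type from the request kind and
// appends the charset parameter when one is configured.
function BuildContentType(IsAsync: Boolean; const Charset: string): string;
begin
  if IsAsync then
    Result := 'text/javascript'
  else
    Result := 'text/html';
  if Charset <> '' then
    Result := Result + '; charset=' + Charset;
end;

begin
  Writeln('Content-Type: ', BuildContentType(True, 'utf-8'));
  Writeln('Content-Type: ', BuildContentType(False, 'iso-8859-1'));
end.
```

Without the `charset` parameter, the browser has to guess the encoding of an async response, which is consistent with the wrong title appearing only in that case.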

Comments welcome.

Original comment by nando.dessena on 2 Sep 2011 at 8:37

Attachments:

GoogleCodeExporter commented 9 years ago
Thank you Nando,

I will handle this issue this weekend.

Original comment by wanderla...@gmail.com on 2 Sep 2011 at 12:19

GoogleCodeExporter commented 9 years ago
I think I have it nailed down. Please see the attached patch, which includes 
the previous one. Highlights:

- I have only tested it on Delphi XE, but it should work with non-Unicode Delphi 
and Free Pascal, or at least no worse than before.
- It currently supports only utf-8 (which is really what I was after); for 
everything else it encodes using the system codepage. It could be extended to 
support other encodings (this is quite easy to do, especially in Unicode 
Delphi), although I don't see the point, as the WWW has definitely gone utf-8 
nowadays.

The patch is based on moving Charset up the hierarchy and using it to decide 
how to encode responses.
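
The decision described above could look roughly like this in a Unicode Delphi. This is a sketch under assumptions, not the attached patch: `EncodeResponse` is a hypothetical name, and `TEncoding.Default` stands in for "the system codepage".

```pascal
program EncodeByCharset;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: utf-8 when the charset asks for it, otherwise the
// system ANSI code page (characters it lacks are lost in that branch).
function EncodeResponse(const Body, Charset: string): TBytes;
begin
  if SameText(Charset, 'utf-8') then
    Result := TEncoding.UTF8.GetBytes(Body)
  else
    Result := TEncoding.Default.GetBytes(Body);
end;

var
  Euro: string;
begin
  Euro := Char($20AC); // the euro sign, given as a code point
  // In utf-8 the euro sign takes 3 bytes (E2 82 AC).
  Writeln('utf-8 bytes: ', Length(EncodeResponse(Euro, 'utf-8')));
  // The size of the fallback result depends on the machine's code page.
  Writeln('fallback bytes: ', Length(EncodeResponse(Euro, 'iso-8859-1')));
end.
```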

Original comment by nando.dessena on 9 Sep 2011 at 2:19

Attachments:

GoogleCodeExporter commented 9 years ago
Addendum: I have tried with Lazarus/Free Pascal on Windows, and my patch does 
not work there. On the other hand, I also tested the previous version and it 
didn't work either, so I still think the patch is somewhat of an enhancement 
(it works well on Unicode Delphi versions).

I cannot test non-Unicode Delphi versions or Free Pascal on Linux at the moment.

Original comment by nando.dessena on 12 Sep 2011 at 7:10