Charset problem in the result page

PiRSquared17 / daisydiff

Automatically exported from code.google.com/p/daisydiff

Charset problem in the result page #3

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Compare http://www.baidu.cn/ with itself (so nothing has changed).
2. The result page is garbled: the Chinese text shows up as strange characters,
which looks like a charset problem.

What is the expected output? What do you see instead?
I want the Chinese text to display correctly, but it comes out garbled.

What version of the product are you using? On what operating system?
1.0
My OS is the Chinese-language edition of Windows XP, so the default charset is GB2312.

Please provide any additional information below.
Thank you for the good tool:)

Original issue reported on code.google.com by renyan...@gmail.com on 22 Dec 2008 at 2:09

Attachments:

b3.html

GoogleCodeExporter commented 9 years ago

I'll classify this as a feature request for support for other languages.

Original comment by guy...@gmail.com on 22 Dec 2008 at 10:04

Added labels: Type-Enhancement, Priority-Low
Removed labels: Type-Defect, Priority-Medium

GoogleCodeExporter commented 9 years ago

DaisyDiff sets the NekoHTML feature
"http://cyberneko.org/html/features/scanner/ignore-specified-charset"
to 'true'.

So if you are using DaisyDiff directly via the Java API, you should specify which
encoding to use by setting it on the InputSource:

inputSource.setEncoding("UTF-8");

before calling the cleanAndParse method.

At least it helped in my case.

Original comment by hudak.ra...@gmail.com on 21 Feb 2009 at 8:09
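For readers who want to see the setEncoding suggestion in context, here is a minimal sketch using only the JDK's org.xml.sax.InputSource. The class and method names (EncodedSource, withEncoding) are made up for illustration, and the DaisyDiff cleanAndParse call itself is left out so the snippet runs without the library. For the page in the original report the charset name would presumably be "GB2312" rather than "UTF-8".

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

public class EncodedSource {

    // Hypothetical helper: wraps a raw byte stream in an InputSource and
    // records the charset the parser should assume for those bytes.
    public static InputSource withEncoding(InputStream in, String charsetName) {
        // Fail fast on a misspelled or unsupported charset name.
        Charset.forName(charsetName);
        InputSource source = new InputSource(in);
        source.setEncoding(charsetName);
        return source;
    }

    public static void main(String[] args) {
        // "\u767e\u5ea6" is the Chinese name of Baidu, used as sample content.
        byte[] page = "<html><body>\u767e\u5ea6</body></html>"
                .getBytes(StandardCharsets.UTF_8);
        InputSource source = withEncoding(new ByteArrayInputStream(page), "UTF-8");
        // The source would then be handed to the parsing step
        // (cleanAndParse in the comment above).
        System.out.println(source.getEncoding());
    }
}
```

Because the specified-charset feature is ignored, the encoding set here is the only hint the parser gets, so it has to match the bytes actually delivered.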

GoogleCodeExporter commented 9 years ago

Original comment by guy...@gmail.com on 17 Apr 2009 at 9:58

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Hi,

If you are dealing with a bulk set of URLs, you may not be able to apply this setEncoding call.
I am facing a similar problem:

I am dealing with a bulk URL source whose pages use many charsets
(UTF-8, Windows, ISO, etc.),
and I found that the HTML source after the diff comparison is UTF-8 by default.

How can I solve the problem when pages with different charsets are
diffed?

Original comment by srssreej...@gmail.com on 27 May 2011 at 1:28
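One possible approach for the bulk case, offered as a sketch rather than anything DaisyDiff provides: pick the charset per page from the Content-Type header the server returned, and fall back to a default when the header is missing or names an unknown charset. The class and method names (CharsetSniffer, fromContentType) are hypothetical, and pages that declare their charset only in a meta tag would need extra handling.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetSniffer {

    // Matches the charset parameter in a header value such as
    // "text/html; charset=ISO-8859-1".
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([^\\s;\"'>]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in the header, or the fallback if the header
    // is absent, has no charset parameter, or names an unknown charset.
    public static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(fromContentType(
                "text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
    }
}
```

The resulting Charset can then be used to decode each page on its own terms before the two versions are compared.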

GoogleCodeExporter commented 9 years ago

Hello,

I fixed the encoding problem with the code below:

DaisyDiff.diffTag(
    new BufferedReader(new InputStreamReader(sourceStream, IOUtil.UTF8)),
    new BufferedReader(new InputStreamReader(targetStream, IOUtil.UTF8)),
    postProcess);

Thanks

Original comment by ramana.p...@gmail.com on 13 Oct 2011 at 5:26
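The key point of the fix above is that each stream is decoded with an explicit charset instead of the platform default (GB2312 on the original reporter's machine). A minimal JDK-only sketch of that step follows. ExplicitReaders, open, and firstLine are hypothetical names; StandardCharsets.UTF_8 stands in for IOUtil.UTF8, whose origin the comment does not state; and the DaisyDiff.diffTag call is omitted so the snippet runs on its own.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitReaders {

    // Decodes the stream with the given charset instead of the platform default.
    public static BufferedReader open(InputStream in, Charset charset) {
        return new BufferedReader(new InputStreamReader(in, charset));
    }

    // Convenience for the demo: reads the first line of the decoded stream.
    public static String firstLine(InputStream in, Charset charset) {
        try (BufferedReader reader = open(in, charset)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "\u767e\u5ea6" is the Chinese name of Baidu, used as sample content.
        byte[] bytes = "\u767e\u5ea6".getBytes(StandardCharsets.UTF_8);
        String line = firstLine(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        // Two readers built with open(...) would be passed to diffTag as in
        // the comment above.
        System.out.println(line.length());
    }
}
```

Decoding the same bytes with the wrong charset does not raise an error; it silently produces the garbled characters described in the original report.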