(There also seems to be a problem with quotes, see "“simply�".)
Original comment by kaspar.f...@gmail.com
on 7 Jan 2010 at 7:02
Hi Kaspar,
thanks for this report.
The "strange garbage" is actually what you get when incorrectly setting the
input encoding. In this case, the text was UTF-8, but it was treated as Latin-1.
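
To illustrate the mix-up described here, the following is a minimal sketch that encodes a quoted word as UTF-8 and then decodes the bytes as Latin-1 (ISO-8859-1). The class and method names (MojibakeDemo, misdecode, repair) are made up for this example and are not part of the library. The exact garbage shown in the report ("â€œ") suggests the bytes were displayed as windows-1252, a superset of Latin-1, but the mechanism is the same.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the project discussed in this thread.
class MojibakeDemo {
    // Encode the text as UTF-8, then (wrongly) decode those bytes as Latin-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: recover the raw bytes and decode them as UTF-8.
    // This works because ISO-8859-1 maps every byte to exactly one char.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Curly quotes around "simply", written as escapes to avoid source-encoding issues.
        String original = "\u201Csimply\u201D";
        String garbled = misdecode(original);

        // Each curly quote is 3 bytes in UTF-8, so 8 characters become 12.
        System.out.println(garbled.length());                 // prints 12
        System.out.println(repair(garbled).equals(original)); // prints true
    }
}
```

Each curly quote turns into three separate characters, which is why the quotes in the extracted text look so much longer than the originals.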
When calling Extractor.getText(URL), we relied upon NekoHTML to find
<META HTTP-EQUIV="Content-Type"> tags even when passing a Reader instead of an
InputStream. Unfortunately, that didn't work...
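
The point about Reader versus InputStream can be sketched as follows. A Reader has already turned bytes into characters, so a charset declared in a META tag arrives too late to matter. With the raw bytes still at hand, the declaration can be looked up first and used for decoding. This is only an illustration under simplifying assumptions, not the fix committed to SVN and not how NekoHTML works internally. CharsetSniffer, sniffCharset and decode are hypothetical names, and the regular expression is far cruder than a real HTML parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class CharsetSniffer {
    // Crude pattern: a <meta ...> tag containing "charset=NAME".
    private static final Pattern META_CHARSET = Pattern.compile(
            "(?i)<meta[^>]*charset\\s*=\\s*[\"']?([\\w.:-]+)");

    // Look for a declared charset in the first bytes of the document.
    static Charset sniffCharset(byte[] html, Charset fallback) {
        // Decode the head as Latin-1: lossless for bytes, and the tag itself is ASCII.
        int n = Math.min(html.length, 1024);
        String head = new String(html, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Unknown or malformed charset name: fall through to the fallback.
            }
        }
        return fallback;
    }

    // Decode the whole document using the sniffed charset.
    static String decode(byte[] html, Charset fallback) {
        return new String(html, sniffCharset(html, fallback));
    }
}
```

A caller would read the page into a byte array and call CharsetSniffer.decode(bytes, StandardCharsets.ISO_8859_1), so that Latin-1 is only used when the page declares nothing.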
I have fixed it in SVN. Could you please check out ExtractorBase from trunk and
see if it works for you?
Best,
Christian
Original comment by ckkohl79
on 24 Jan 2010 at 3:43
Hi Christian,
Thanks for fixing this. Works like a charm.
Best,
Kaspar
Original comment by kaspar.f...@gmail.com
on 24 Jan 2010 at 3:53
Original comment by ckkohl79
on 24 Jan 2010 at 4:10
Original issue reported on code.google.com by
kaspar.f...@gmail.com
on 7 Jan 2010 at 6:50