ahorn / android-rss

Lightweight Android library to parse RSS 2.0 feeds.

"Not well formed XML" error with ISO-8859-1 encoding and accented characters #4

Open zipgenius opened 13 years ago

zipgenius commented 13 years ago

Hello. I'm trying to parse a feed from my website, which is encoded in ISO-8859-1 and uses accented characters (which are very common in Italian). The parser throws an exception and the application crashes.

This is the URL of my feed: http://forum.wininizio.it/index.php/rss/blog/

Please, can you help me in order to get it to work?

Thanks for this great piece of code! :)

ahorn commented 13 years ago

Could you please post the error trace that you get? The SAX parser is instantiated in the RSSParser class [1]. I would encourage you to take a look at the code and at how SAX handles content encodings, because I am currently away on an extended trip. Once I am back we can implement the fix. Alternatively, you may submit a pull request and I'll merge your patch.

[1] https://github.com/ahorn/android-rss/blob/master/src/main/java/org/mcsoxford/rss/RSSParser.java

zipgenius commented 13 years ago

WOW! Got it in 5 minutes :)

Here is the code. In RSSParser.java, around line 77, we have the following:

    private RSSFeed parse(SAXParser parser, InputStream feed)
        throws SAXException, IOException {
      if (parser == null) {
        throw new IllegalArgumentException("RSS parser must not be null.");
      } else if (feed == null) {
        throw new IllegalArgumentException("RSS feed must not be null.");
      }

      // SAX automatically detects the correct character encoding from the stream
      // See also http://www.w3.org/TR/REC-xml/#sec-guessing
      final InputSource source = new InputSource(feed);
      final XMLReader xmlreader = parser.getXMLReader();
      final RSSHandler handler = new RSSHandler(config);

      xmlreader.setContentHandler(handler);
      xmlreader.parse(source);

      return handler.feed();
    }

I just added the following line before xmlreader.setContentHandler(handler):

source.setEncoding("ISO-8859-1");

et voilà: I got my feed working fine :)

Now, let's see how to implement some way of detecting which encoding to force...
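One possible shape for that detection, as a minimal sketch rather than the library's actual code: peek at the first bytes of the stream, honor an `encoding` pseudo-attribute in the XML declaration if there is one, and only force a fallback such as ISO-8859-1 when neither a declaration nor a byte order mark is present. The class `EncodingSniffer`, its method names, and the 256-byte lookahead are all hypothetical, and the sketch assumes an ASCII-compatible encoding (it will not find a declaration in a UTF-16 feed that lacks a byte order mark).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.xml.sax.InputSource;

// Hypothetical helper for illustration; android-rss does not ship this class.
final class EncodingSniffer {

  // Number of leading bytes inspected (an assumption; a declaration is usually shorter).
  private static final int LOOKAHEAD = 256;

  // Matches e.g. <?xml version="1.0" encoding="ISO-8859-1"?> at the very start.
  private static final Pattern DECLARED_ENCODING = Pattern.compile(
      "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

  private EncodingSniffer() {}

  /** Returns the encoding named in the XML declaration, or null if there is none. */
  static String declaredEncoding(byte[] head, int length) {
    // The declaration itself is plain ASCII, so a Latin-1 decode is lossless here.
    String prefix = new String(head, 0, length, StandardCharsets.ISO_8859_1);
    Matcher m = DECLARED_ENCODING.matcher(prefix);
    return m.find() ? m.group(1) : null;
  }

  /** True if the bytes start with a UTF-8 or UTF-16 byte order mark. */
  static boolean startsWithBom(byte[] head, int length) {
    if (length < 2) {
      return false;
    }
    int b0 = head[0] & 0xFF;
    int b1 = head[1] & 0xFF;
    if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
      return true; // UTF-16, big or little endian
    }
    return length >= 3 && b0 == 0xEF && b1 == 0xBB && (head[2] & 0xFF) == 0xBF; // UTF-8
  }

  /** Wraps the feed in an InputSource, forcing the fallback only when nothing is declared. */
  static InputSource open(InputStream feed, String fallbackEncoding) throws IOException {
    BufferedInputStream in = new BufferedInputStream(feed, LOOKAHEAD);
    in.mark(LOOKAHEAD);
    byte[] head = new byte[LOOKAHEAD];
    int length = 0;
    int read;
    while (length < LOOKAHEAD && (read = in.read(head, length, LOOKAHEAD - length)) > 0) {
      length += read;
    }
    in.reset(); // rewind so the parser still sees the whole document

    InputSource source = new InputSource(in);
    String declared = declaredEncoding(head, length);
    if (declared != null) {
      source.setEncoding(declared);
    } else if (!startsWithBom(head, length)) {
      source.setEncoding(fallbackEncoding);
    }
    return source;
  }
}
```

In the parse method quoted above, `new InputSource(feed)` would then become something like `EncodingSniffer.open(feed, "ISO-8859-1")`. Whether a Latin-1 fallback is the right default for every feed is a judgment call; the XML specification itself assumes UTF-8 when nothing is declared.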

Matteo Riso

ahorn commented 13 years ago

I've written a short unit test and it passes as part of the Maven build. Therefore, this bug may be specific to the version of Android you are using. Could you please clone the repository and add a functional test case (see [1])? Once we can reproduce the error with an automated test, we can discuss how to fix it.
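A reproduction along those lines could start from something like the following sketch. It deliberately uses plain JAXP/SAX instead of RSSParser, so it only exercises the parser underneath the library, and it runs on a desktop JVM, which may behave differently from the SAX implementation on a given Android version. The class name `Latin1FeedRepro`, the sample title, and the tiny handler are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical reproduction harness; not part of the android-rss test suite.
final class Latin1FeedRepro {

  // Sample title with an accented character (e with acute accent).
  static final String TITLE = "Perch\u00e9 non funziona";

  private Latin1FeedRepro() {}

  /** Builds a tiny RSS document encoded as ISO-8859-1, with or without an XML declaration. */
  static byte[] feed(boolean withDeclaration) {
    String declaration =
        withDeclaration ? "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" : "";
    String xml = declaration
        + "<rss version=\"2.0\"><channel><title>" + TITLE + "</title></channel></rss>";
    return xml.getBytes(StandardCharsets.ISO_8859_1);
  }

  /** Parses the bytes and returns the text of the title element. */
  static String parseTitle(byte[] bytes, String forcedEncoding) throws Exception {
    final StringBuilder title = new StringBuilder();
    DefaultHandler handler = new DefaultHandler() {
      private boolean inTitle;

      @Override
      public void startElement(String uri, String localName, String qName, Attributes atts) {
        inTitle = "title".equals(qName);
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
        inTitle = false;
      }

      @Override
      public void characters(char[] ch, int start, int length) {
        if (inTitle) {
          title.append(ch, start, length);
        }
      }
    };

    InputSource source = new InputSource(new ByteArrayInputStream(bytes));
    if (forcedEncoding != null) {
      source.setEncoding(forcedEncoding); // same call as the one-line fix above
    }
    XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    reader.setContentHandler(handler);
    reader.parse(source);
    return title.toString();
  }

  public static void main(String[] args) throws Exception {
    // Declared encoding: the parser should pick it up on its own.
    System.out.println(TITLE.equals(parseTitle(feed(true), null)));
    // No declaration, but the encoding is forced on the InputSource.
    System.out.println(TITLE.equals(parseTitle(feed(false), "ISO-8859-1")));
  }
}
```

The interesting third case, `parseTitle(feed(false), null)`, is left out of `main` on purpose: how a parser reacts to Latin-1 bytes that it assumes to be UTF-8 depends on the implementation, which is exactly what a device-specific test would need to pin down.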

Cheers, Alex

[1] https://github.com/ahorn/android-rss/blob/master/src/test/java/org/mcsoxford/rss/RSSParserTest.java

joshfriend commented 9 years ago

In case anyone else happens on this issue, I've gotten a similar error when the feed itself specifies ISO-8859-1 as the encoding, but the server sends the data without a charset in the Content-Type header, or with one that is set to the wrong value.
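For context, an HTTP response normally announces its character set through the `charset` parameter of the `Content-Type` header. A minimal sketch of extracting it, so that the value can be compared with what the feed declares, might look like this; the class `ContentTypeCharset` and its method are hypothetical, and the parsing is simplified (it does not handle semicolons inside quoted parameter values).

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper for illustration; not part of android-rss.
final class ContentTypeCharset {

  private ContentTypeCharset() {}

  /**
   * Returns the charset named in a Content-Type header value such as
   * "application/rss+xml; charset=ISO-8859-1", or null if it is missing,
   * malformed, or not supported by this JVM.
   */
  static String charsetOf(String contentType) {
    if (contentType == null) {
      return null;
    }
    String[] parts = contentType.split(";");
    // parts[0] is the media type; parameters follow.
    for (int i = 1; i < parts.length; i++) {
      String parameter = parts[i].trim();
      int eq = parameter.indexOf('=');
      if (eq < 0 || !parameter.substring(0, eq).trim().equalsIgnoreCase("charset")) {
        continue;
      }
      String value = parameter.substring(eq + 1).trim();
      // Strip optional double quotes around the value.
      if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
        value = value.substring(1, value.length() - 1);
      }
      if (value.isEmpty()) {
        return null;
      }
      try {
        return Charset.isSupported(value) ? value : null;
      } catch (IllegalCharsetNameException e) {
        return null; // syntactically invalid charset name
      }
    }
    return null;
  }
}
```

A non-null result could be handed to `InputSource.setEncoding`. Which side should win when the header and the XML declaration disagree is a policy decision this sketch leaves open.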

ahorn commented 9 years ago

Hi Josh, if you are up to it, perhaps we can start with a unit test that reproduces the problem locally and we can go from there.

dengue8830 commented 7 years ago

I am getting the "Not well formed XML" error too, but with a JavaScript tag. How can I ignore the ...
