Open unriccio opened 7 years ago
@unriccio Thanks for reporting. Will look into the encoding of ampersands issue. Cheers.
I'm adding your original ticket / issue over here for easier reference:
I got some kind of "XML Parsing Error: not well-formed" using my favourite feed reader. I also see the same error when opening the Atom/RSS feeds with Firefox and Chromium.
It seems it's because of a wrong re-encoding (or missing encoding) of the ampersand symbol within links and guid. Example follows: (sorry it's a spam entry but I think the issue still applies)
Original feed:
riccio@hactar:/tmp$ wget -q http://www.openstreetmap.org/diary/rss -O - | fgrep Beer | egrep "(link|guid)"
<link>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&amp;%20Beer%20Garden/diary/41379</link>
<guid>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&amp;%20Beer%20Garden/diary/41379</guid>
Aggregator output:
riccio@hactar:/tmp$ wget -q https://blogs.openstreetmap.org/rss20.xml -O - | fgrep Beer | egrep "(link|guid)"
<guid>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&%20Beer%20Garden/diary/41379</guid>
<link>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&%20Beer%20Garden/diary/41379</link>
Note: So %20&amp;%20 gets changed to %20&%20 in the guid and link tags. Is this correct? Why does it break the XML parsing? This needs to be checked.
Yep, correct.
The ampersand introduces entity references (as indeed shown by "&amp;"), so the parser will try to interpret "&%20" as if it were a proper entity. That fails because no valid entity name and terminating ";" follow the ampersand.
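To make the rule concrete, here is a minimal sketch in Ruby. It is a simplified regex check, not a real XML parser, and the method name `bare_ampersands` is made up for this illustration. It counts ampersands that do not start a named or numeric reference, which is what trips up the feed readers.

```ruby
# Simplified illustration only: a real XML parser does much more than this.
# bare_ampersands is a hypothetical helper name, not part of pluto or any library.
def bare_ampersands(xml_text)
  # An "&" is legal in XML text only when it starts a reference:
  # a named one (&amp;), a decimal one (&#38;) or a hex one (&#x26;).
  bare = /&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/
  xml_text.scan(bare).length
end

broken = "<link>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&%20Beer%20Garden/diary/41379</link>"
fixed  = "<link>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&amp;%20Beer%20Garden/diary/41379</link>"

puts bare_ampersands(broken)  # prints 1
puts bare_ampersands(fixed)   # prints 0
```

Any non-zero count means a strict parser would reject the feed as not well-formed.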
Sorry for the long wait. I finally got around to checking in detail. The error is in the feed templates (in the openstreetmap repo), which are missing the XML escapes (CGI.escapeHTML) for guid and link that turn the "unescaped" & back into an escaped &amp;. I will try to send in a pull request later today and then close this ticket. Again, thanks for reporting the error. Keep it up. Cheers. Prosit 2020!
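For anyone following along, a rough sketch of the round trip in Ruby. This is not the actual template code: `xml_element` is a hypothetical helper, and the only library calls used are `CGI.unescapeHTML` and `CGI.escapeHTML` from Ruby's standard `cgi` library.

```ruby
require 'cgi'

# Hypothetical helper for illustration, not taken from the pluto templates:
# escape a value before placing it inside an element such as <link> or <guid>.
def xml_element(name, value)
  "<#{name}>#{CGI.escapeHTML(value)}</#{name}>"
end

# The URL as it appears (escaped) in the source feed.
escaped_in_source = "http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&amp;%20Beer%20Garden/diary/41379"

# Reading the source feed unescapes the entity, leaving a raw "&" in memory.
url = CGI.unescapeHTML(escaped_in_source)

puts "<link>#{url}</link>"     # raw "&" written out: not well-formed
puts xml_element("link", url)  # "&" written out as "&amp;" again
```

The point of the sketch is only that whatever gets unescaped on the way in has to be escaped again on the way out.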
Hi, I got an "XML Parsing Error: not well-formed" using the pluto-generated feeds available on http://blogs.openstreetmap.org/ because of a wrong re-encoding (or missing encoding) of the ampersand symbol within links/guids.
I don't know the details of that instance (version/release, environment, etc.), but one of the developers suggested it could be a pluto issue. Could you please check? The test case and details are in gravitystorm/blogs.osm.org#28. (I checked past/closed issues about this but couldn't find any.)
Thanks.