Open arcctgx opened 4 years ago
Why the hell does Facebook allow control characters in posts instead of stripping them? This is crazy. It's probably OK to strip these kinds of characters for all feeds.
As a workaround, Liferea users can create a conversion filter to remove these control characters until this issue is fully addressed.
#!/bin/bash
# Remove lower-ASCII control characters: NUL-ACK and SO-US.
# Familiar control sequences \[abtnvfr] (BEL-CR) are not changed.
tr --delete '\0-\6\16-\37'
Save this to a file and make it executable. Then tell Liferea to use this filter for the affected feed (right-click the feed, select "Properties", go to the "Source" tab, check "Use conversion filter", then "Select file"). This might not take effect until you restart Liferea.
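On the question of XML rules: XML 1.0 allows only TAB, LF and CR among the characters below 0x20, and the others are not permitted even as numeric character references, so they can only be removed, not escaped. The filter above keeps BEL, BS, VT and FF, which an XML 1.0 parser would also reject. A stricter variant could look like the sketch below. The function name `strip_xml_controls` is made up for illustration, and it has only been reasoned through, not tested against Liferea.

```shell
#!/bin/bash
# Sketch: delete every C0 control character that XML 1.0 forbids.
# Kept:    TAB (\011), LF (\012), CR (\015)
# Removed: NUL-BS (\000-\010), VT (\013), FF (\014), SO-US (\016-\037)
# strip_xml_controls is a hypothetical name, not part of Liferea or RSS-Bridge.
strip_xml_controls() {
    tr -d '\000-\010\013\014\016-\037'
}
```

To use it as a conversion filter, add a final line that calls `strip_xml_controls`, so the script reads the feed on stdin and writes the cleaned feed to stdout, the same way the one-line `tr` filter does.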
This is a rare edge case, but I think we can strip away these characters. There might be some RSS/XML rules regarding escaping these?
Describe the bug
My RSS reader (Liferea-1.12.7) is reporting a parse error for an Atom feed created by RSS-Bridge from the Facebook profile https://www.facebook.com/mglaofficial/. Mozilla Firefox also reports a parsing error.
This is the only RSS feed generated by RSS-Bridge I'm having problems with. This particular feed was working well before the update on April 7th was posted.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
XML document tree is displayed. No parse errors are reported.
Additional context
I downloaded the XML generated by RSS-Bridge and ran it through xmllint with the following result:

The problem is that there are ASCII control characters 0x03 (^C, ETX) embedded in the content of the April 7th post, right after the words "London", "Belfast", etc. They seem to cause the XML parse errors. After manually removing the 4 occurrences of this character, neither xmllint, Liferea, nor Firefox complains anymore.

While I understand it's difficult to fully sanitize any arbitrary input, maybe something could be done to handle the lower-ASCII control sequences?
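For anyone who wants to check a downloaded feed for the same problem, something along these lines should list the offending lines. It is only a sketch: the function name `find_xml_controls` and the file name in the usage note are made up for illustration, and the range cannot match NUL because the pattern is passed as a shell string.

```shell
#!/bin/bash
# Sketch: print the lines of a file that contain control characters
# rejected by XML 1.0 (everything below 0x20 except TAB, LF and CR).
# find_xml_controls is a hypothetical helper name.
find_xml_controls() {
    # Build a bracket expression from raw bytes; printf expands the octal escapes.
    ctl_pattern="$(printf '[\001-\010\013\014\016-\037]')"
    # -a: treat the file as text, -n: prefix matches with line numbers.
    # LC_ALL=C makes the ranges byte-wise; cat -v renders control bytes
    # visibly, so ETX (0x03) shows up as ^C.
    LC_ALL=C grep -a -n -e "$ctl_pattern" -- "$1" | cat -v
}
```

Calling `find_xml_controls feed.xml` on the saved feed should print each affected line with its line number, which makes it easier to confirm that a filter removed every occurrence.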