Rongronggg9 / RSS-to-Telegram-Bot

A Telegram RSS bot that cares about your reading experience
https://t.me/RSStT_Bot
GNU Affero General Public License v3.0
1.5k stars, 275 forks

On the same site, different recognition of encoding #391

Open butaford opened 9 months ago

butaford commented 9 months ago

Hello. RSStT is installed in Docker. The latest versions do not display Cyrillic text correctly. On version 2.4 I get this: pic-20231224-184840

On 2.2 it looks like this: pic-20231224-184812

Links to check:

Displays correctly in all versions: http://iptvin.ru/component/jcomments/?task=rss&object_id=1000853&object_group=com_content&tmpl=component

Displays correctly only up to version 2.2: http://iptvin.ru/component/jcomments/?task=rss&object_id=1000707&object_group=com_content&tmpl=component

Sorry for bad English. Best Regards

Rongronggg9 commented 9 months ago

The latter feed seems to contain byte sequences that are invalid in UTF-8, which makes feedparser fall back to other encodings. Strictly speaking, this is either an upstream (feedparser) issue or a fault of the website, but I will try to work around it before it gets fixed upstream.
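For readers unfamiliar with this failure mode, here is a minimal sketch of how a single invalid byte can push a "try encodings in order" decoder onto the wrong encoding. It is not feedparser's actual detection logic; `decode_with_fallback` and its candidate list are hypothetical names chosen for illustration, and the sample text is made up.

```python
# Illustration only: NOT feedparser's real algorithm.
# Cyrillic text encoded as UTF-8, with one stray byte inserted.
# 0xFF can never appear in valid UTF-8.
good = "Привет, мир".encode("utf-8")
bad = good[:12] + b"\xff" + good[12:]  # "Привет" is 6 letters = 12 bytes


def decode_with_fallback(data, candidates=("utf-8", "latin-1")):
    """Hypothetical helper: return (text, encoding) for the first
    candidate that decodes without error."""
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding matched")


# Strict UTF-8 fails on the stray byte, so latin-1 "wins" and the
# Cyrillic letters turn into mojibake.
text, used = decode_with_fallback(bad)

# Lenient UTF-8 decoding keeps the Cyrillic text and only replaces
# the one bad byte with U+FFFD.
lenient = bad.decode("utf-8", errors="replace")
```

With these inputs, `used` is `"latin-1"` and `text` no longer contains the original Cyrillic letters, while `lenient` keeps everything except the one broken byte.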

Before v2.3, feeds were decoded by aiohttp before being passed to feedparser, so feedparser's own encoding detection was not involved.
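As a rough sketch of what "decode first, then parse" can look like, the snippet below decodes the response body using the charset declared in the `Content-Type` header and replaces undecodable bytes instead of failing. This is an assumption-laden illustration using only the standard library, not the code RSStT or aiohttp actually uses; `charset_from_content_type` and `predecode` are hypothetical helpers.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lowercased charset, or None


def predecode(raw: bytes, content_type: str) -> str:
    """Hypothetical pre-decoding step: trust the declared charset
    (defaulting to UTF-8) and tolerate invalid bytes."""
    charset = charset_from_content_type(content_type) or "utf-8"
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name declared by the server: assume UTF-8.
        return raw.decode("utf-8", errors="replace")


body = "Тест".encode("utf-8") + b"\xff"
decoded = predecode(body, "application/rss+xml; charset=UTF-8")
```

The resulting `str` could then be handed to a parser that accepts already-decoded text, which sidesteps byte-level encoding guessing at the cost of trusting the declared charset.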

butaford commented 9 months ago

Thank you. Thank you. Thank you 😘 Works correctly with your edits: https://github.com/Rongronggg9/feedparser/tree/fix/encoding-confidence
