Ranchero-Software / NetNewsWire

RSS reader for macOS and iOS.
https://netnewswire.com/
MIT License

Handle (e.g. delete) RSS messages by blacklisting Unicode character sets in multilingual feeds #3082

Open JayBrown opened 3 years ago

JayBrown commented 3 years ago

I have a couple of RSS feeds that are multilingual, and there's no way to restrict them to English-only, German-only, etc. at the source, so in this scenario it would need to be dealt with in NNW. I assume that intelligent language detection would be too complex, so my idea is simpler and broader: a way to handle (e.g. hide, move to a smart feed folder, or outright delete) RSS messages whose titles contain certain character sets, based on the standard Unicode blocks, e.g. Latin, Cyrillic, Greek and Coptic, Arabic, Hebrew, and CJK, including their Extended and Extended Additional blocks.

The user would then have an option (tick boxes) in Preferences to blacklist any of the NNW-supported character sets, and a dropdown menu to choose how blacklisted messages are handled when they arrive as part of a feed, e.g. delete, hide, or move them. The approach would have to be very strict to avoid false positives: only the title of an RSS message would be parsed, and that title (the whole string) would have to contain only one character set to fall under a blacklist rule. A title that mixes English (Latin) and Chinese (CJK) would therefore pass through, while a Latin-only title (if you don't read English etc.) or a CJK-only title (if you don't read Chinese/Japanese/Korean) would be handled by NNW, provided that character set has been blacklisted.

PS: special characters like . ; : / § ' " ( ) [ ] - and other dashes, Arabic numerals (0–9), etc. would of course have to be ignored.
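The single-script title rule described in the proposal can be sketched in Swift, NetNewsWire's language. This is a minimal illustration under stated assumptions, not NNW code: `ScriptBlock`, `scriptBlock(of:)`, `singleScript(of:)`, and `isBlacklisted(title:blacklist:)` are hypothetical names, and the ranges cover only the main Unicode blocks named in the request. Letters are detected with `Unicode.Scalar.Properties.isAlphabetic`, which skips punctuation, symbols, whitespace, and digits.

```swift
// Hypothetical sketch, not NetNewsWire code: coarse script buckets keyed to
// standard Unicode block ranges (only the main blocks are covered).
enum ScriptBlock: Hashable {
    case latin, greek, cyrillic, hebrew, arabic, cjk
}

/// Maps one scalar to a script bucket, or nil if it should be ignored.
func scriptBlock(of scalar: Unicode.Scalar) -> ScriptBlock? {
    // Only letters count, so punctuation, dashes, symbols such as §,
    // whitespace, and digits 0-9 are skipped.
    guard scalar.properties.isAlphabetic else { return nil }
    switch scalar.value {
    case 0x0041...0x024F, 0x1E00...0x1EFF:
        // Basic Latin through Latin Extended-B, plus Latin Extended Additional
        return .latin
    case 0x0370...0x03FF, 0x1F00...0x1FFF:
        // Greek and Coptic, Greek Extended
        return .greek
    case 0x0400...0x052F:
        // Cyrillic, Cyrillic Supplement
        return .cyrillic
    case 0x0590...0x05FF:
        // Hebrew
        return .hebrew
    case 0x0600...0x06FF, 0x0750...0x077F:
        // Arabic, Arabic Supplement
        return .arabic
    case 0x3040...0x30FF, 0x3400...0x4DBF, 0x4E00...0x9FFF, 0xAC00...0xD7AF:
        // Hiragana, Katakana, CJK Extension A, CJK Unified Ideographs, Hangul Syllables
        return .cjk
    default:
        // Letters from unmapped blocks are ignored in this sketch.
        return nil
    }
}

/// The one script a title is written in, or nil if it mixes scripts
/// or contains no classifiable letters at all.
func singleScript(of title: String) -> ScriptBlock? {
    let scripts = Set(title.unicodeScalars.compactMap(scriptBlock(of:)))
    return scripts.count == 1 ? scripts.first : nil
}

/// True only if the whole title is in a single script that is blacklisted;
/// mixed-script titles always pass through.
func isBlacklisted(title: String, blacklist: Set<ScriptBlock>) -> Bool {
    guard let script = singleScript(of: title) else { return false }
    return blacklist.contains(script)
}

let blacklist: Set<ScriptBlock> = [.cjk]
print(isBlacklisted(title: "北京新闻：2024", blacklist: blacklist))           // true
print(isBlacklisted(title: "Beijing news / 北京新闻", blacklist: blacklist))  // false (mixed scripts)
print(isBlacklisted(title: "Breaking news", blacklist: blacklist))          // false (Latin not blacklisted)
```

Because English and German both use the Latin script, a rule like this separates titles by script rather than by language, which is the trade-off the request accepts in place of language detection.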

JayBrown commented 3 years ago

To avoid slowing down RSS feed refreshes, it would probably be better to implement this not as an automatic step during refresh, but as a manual command, e.g. under File > Clean Up Blacklisted Articles.
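A manual command along these lines could scan already-downloaded articles in a single pass. The sketch is hypothetical throughout: `Article`, `BlacklistAction`, `CleanupStep`, and `planCleanUp(of:action:isBlacklisted:)` are illustrative stand-ins rather than NetNewsWire types, and the title check is injected as a closure. It only returns a plan of steps, leaving the actual delete, hide, or move to the app's own storage layer.

```swift
// Hypothetical stand-ins: NetNewsWire's real article and account types differ.
struct Article {
    let id: String
    let title: String
}

/// How blacklisted articles are handled (the dropdown choice in the proposal).
enum BlacklistAction: Equatable {
    case delete
    case hide
    case move(toFolder: String)
}

/// One planned change; applying it is left to the app's storage layer.
struct CleanupStep: Equatable {
    let articleID: String
    let action: BlacklistAction
}

/// Runs once, on demand (e.g. from File > Clean Up Blacklisted Articles),
/// over articles that are already downloaded, so feed refreshes are not slowed.
/// Only the title is checked, via the injected `isBlacklisted` predicate.
func planCleanUp(
    of articles: [Article],
    action: BlacklistAction,
    isBlacklisted: (String) -> Bool
) -> [CleanupStep] {
    articles
        .filter { isBlacklisted($0.title) }
        .map { CleanupStep(articleID: $0.id, action: action) }
}

let articles = [
    Article(id: "1", title: "Breaking news"),
    Article(id: "2", title: "北京新闻"),
]
// Toy stand-in for the title rule: every letter is a CJK Unified Ideograph.
let steps = planCleanUp(of: articles, action: .hide) { title in
    let letters = title.unicodeScalars.filter { $0.properties.isAlphabetic }
    return !letters.isEmpty && letters.allSatisfy { (0x4E00...0x9FFF).contains($0.value) }
}
print(steps.map(\.articleID))  // ["2"]
```

Keeping the scan out of the refresh path means its cost is paid only when the user runs the command, and separating planning from applying keeps the filter easy to test on its own.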