ckampfe / russ

A TUI RSS reader with vim-like controls and a local-first, offline-first focus
GNU Affero General Public License v3.0
158 stars, 18 forks

HTML text extraction #28

Open mntn-xyz opened 7 months ago

mntn-xyz commented 7 months ago

Feature description

Some RSS feeds only include a small snippet of the article, or sometimes nothing at all. I've used other RSS readers that automatically extract the text of articles, usually on a feed-by-feed basis. It would be great to see this in russ as it really helps for offline use.

I'd suggest rust-html2text: it's written in Rust, is actively developed, and is built on html5ever, the HTML parser from the Servo project, which is under active development again.
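To make the request concrete, here is a minimal sketch of the kind of extraction meant here, using only the standard library. The `strip_tags` helper is hypothetical and is not part of russ or rust-html2text. It only drops tags and collapses whitespace; it does not decode entities, skip script or style content, or preserve layout, which is what a real HTML-to-text library would be for.

```rust
/// Naive HTML-to-text pass: drops everything between '<' and '>' and
/// collapses runs of whitespace. Hypothetical helper for illustration only.
fn strip_tags(html: &str) -> String {
    let mut text = String::with_capacity(html.len());
    let mut in_tag = false;
    for ch in html.chars() {
        match ch {
            '<' => in_tag = true,
            '>' if in_tag => {
                in_tag = false;
                // Treat a closed tag as a word boundary so that
                // "a</p><p>b" does not collapse into "ab".
                text.push(' ');
            }
            // Keep characters that are outside of any tag.
            _ if !in_tag => text.push(ch),
            // Discard characters that are inside a tag.
            _ => {}
        }
    }
    // Collapse whitespace runs into single spaces and trim the ends.
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

For example, `strip_tags("<h1>Title</h1><p>Body text</p>")` yields `Title Body text`. Because every tag becomes a word boundary, inline tags such as `<b>` can introduce extra spaces next to punctuation.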

mntn-xyz commented 7 months ago

On further reflection, this could probably be done as a scripted post-processing step. That makes me wonder whether, as an alternative, russ could include a way to run a post-processing command for a given feed. I will open a separate issue for that.
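A rough sketch of what such a hook could look like, assuming the command reads the entry content on stdin and writes the processed text to stdout. The `post_process` function and its `program` and `args` parameters are hypothetical and stand in for a per-feed setting that russ does not currently have.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};
use std::thread;

/// Pipes `content` through an external command and returns its stdout.
/// Hypothetical helper: `program` and `args` would come from a per-feed
/// setting, which is an assumption and not an existing russ option.
fn post_process(content: &str, program: &str, args: &[&str]) -> io::Result<String> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write stdin on a separate thread so a large article cannot deadlock
    // against a child that is already filling its stdout pipe.
    let mut stdin = child.stdin.take().expect("stdin was requested as piped");
    let input = content.to_owned();
    let writer = thread::spawn(move || stdin.write_all(input.as_bytes()));

    // Reads stdout until EOF and waits for the child to exit.
    let output = child.wait_with_output()?;
    // Propagate any error from writing to the child's stdin.
    writer.join().expect("stdin writer thread panicked")?;

    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {}", output.status),
        ));
    }
    String::from_utf8(output.stdout)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Any program that reads HTML on stdin and prints text on stdout would fit this shape, so the extraction tool itself stays outside russ and can differ from feed to feed.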

ckampfe commented 3 months ago

@mntn-xyz is your use case similar to the work done in this PR? https://github.com/ckampfe/russ/pull/34

mntn-xyz commented 1 week ago

Yes, it looks like this PR would suffice. I still think rust-html2text might be a better option as it offers more configuration, but anything that provides scraping would meet the use case.