Open m040601 opened 4 years ago
Good question – I think I just took the readability project that appeared first in my npm search.
And in the meantime I was actually trying out https://www.npmjs.com/package/article-parser – yet still having some issues with it. Do you know the project?
I understand you recommend using
npm install @mozilla/readability
?
https://www.npmjs.com/package/article-parser Do you know the project?
No, never heard of it. Actually, I search GitHub more than the npm registry for this kind of thing. But what I can tell you from my experience over the last few years is that many of these node modules or readability libraries make a big splash at the beginning, but end up totally abandoned and unmaintained after a few months, no matter whether it is Node, Go, or Python. So I tend to watch for the ones one can trust for the long run.
I understand you recommend using npm install @mozilla/readability ?
I'm not a developer or programmer. And I wouldn't touch node even with a pole :-). So don't take my opinion as that of an expert. I got my information from here:
https://github.com/qutebrowser/qutebrowser/pull/5009 Looks like the readability library is available via npm now:
It seems before you had to do
npm install -g https://github.com/mozilla/readability.git
But now you can do,
npm install -g @mozilla/readability
You can't do
npm install -g readability
Because someone else already took that name "readability".
But I've been collecting and testing lots of these "readability" apps in node/python/etc. for the last year.
Since the Firefox Reader Mode seems to do a good job and Mozilla has lots of resources and developers, I always "assumed" that the Mozilla node one might be the best one to use. Because I don't understand node, I could never make a simple ready-made CLI tool out of the Mozilla library myself. That's why I looked for other ready-made ones, even though I don't like to install node on my system. And I hate having to pull in dozens or hundreds of small node modules as dependencies.
But honestly, after having tested so many of these readability extractors, I think it doesn't make that big a difference at all. It depends very much on the website. Modern websites are so complicated. Sometimes even the simplest Python script based on the original readability algorithm does the job acceptably.
Let me see if I can find more (node tools) in my notes:
Mozilla Readability based:
gardenappl/readability-cli (GitLab): this one seems to pull the "official" readability directly from Mozilla https://gitlab.com/gardenappl/readability-cli
NightMachinary/readability-cli: A CLI for Mozilla Readability. "Get clean, uncluttered, ..." Messes up my system because it installs a binary simply called "readability" that conflicts with others https://github.com/NightMachinary/readability-cli
aarmea/readability-scrape: last update 2018, it imports Readability.js, the library used in Firefox's reader view, directly from Mozilla's repository https://github.com/aarmea/readability-scrape
qutebrowser/readability-js at master: this is the node script that qutebrowser uses to get a "Reader Mode" just like Firefox. Uses the official Mozilla readability library (npm install -g @mozilla/readability) https://github.com/qutebrowser/qutebrowser/blob/master/misc/userscripts/readability-js
enrico-kaack/markdown-clipper: very interesting firefox extension, also uses official mozilla readability https://github.com/enrico-kaack/markdown-clipper
pirate/readability-extractor: Wrapper around mozilla/readability to keep archivebox free from nodejs https://github.com/pirate/readability-extractor
danburzo/percollate: A command-line tool to turn web pages into beautiful, readable documents in PDF, EPUB, or HTML format. It's a "big machine". Pulls in almost 400 MB of Chromium/Puppeteer (a fake browser?). Tries to do everything and the kitchen sink in node, so it's not like yours, which stands "on the shoulders of giants". But it seems polished and well maintained. Also seems to use Mozilla readability https://github.com/danburzo/percollate
Not Mozilla Readability based:
A small question about the readability dependency. On the README.md page you write:
where "Readability" links to, https://github.com/mozilla/readability
But your project actually uses, https://github.com/luin/readability , which actually installs a module called "node-readability"
I know that "luin" is probably a fork or something pulling from "mozilla". I just wanted to make sure there is a reason for this, and for not pulling directly from mozilla.
I ask this because I've been testing dozens of node-based readability projects, and very frequently, because they choose to name their binary "readability", you end up with a mess of different packages whose installed binaries are all named "readability".
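The name clash described above can be checked by hand. Here is a minimal sketch in POSIX shell; the helper name `list_bins` is made up for illustration and is not part of npm or of any project mentioned in this thread. It prints every executable with the given name found on `PATH`, in lookup order, so the first line printed is the one the shell would actually run.

```shell
#!/bin/sh
# list_bins NAME [DIRLIST]
# Hypothetical helper: print every executable file called NAME found in the
# colon-separated DIRLIST (defaults to $PATH), in lookup order.
list_bins() {
    name=$1
    dirs=${2:-$PATH}
    old_ifs=$IFS
    IFS=:                       # split the directory list on colons
    for dir in $dirs; do
        [ -n "$dir" ] || dir=.  # an empty PATH entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs                # restore the caller's field separator
}

# Show all installed binaries that answer to the name "readability".
list_bins readability
```

If more than one path is printed, at least two packages have installed a binary under the same name, and only the first one is reachable by typing `readability`. On many systems `type -a readability` (in bash) or `which -a readability` gives similar information.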