alphapapa / org-web-tools

View, capture, and archive Web pages in Org-mode
GNU General Public License v3.0

Add option to disable use of `eww-readability` in `org-web-tools-read-url-as-org` #55

Open hubisan opened 1 year ago

hubisan commented 1 year ago

I am using `org-web-tools-read-url-as-org` to retrieve word definitions and synonyms from web pages as Org buffers. This is really convenient: I get the content in Org syntax and can use whatever site I want without having to rely on a dedicated package.

Unfortunately, content is missing for some URLs. Examples:

I tracked this down to the line `(eww-score-readability dom)` in `org-web-tools--eww-readable`. If I remove that line from the function, all of the content is inserted. `eww-score-readability` is rather cryptic, so I'm not sure how to solve this issue.

alphapapa commented 1 year ago

Hello,

Yes, like any "readability"-type code, `eww-readable` will not work satisfactorily on every Web site. But without it, `org-web-tools-read-url-as-org` would usually include a lot of non-content material, so it seems necessary.

Being Lisp, of course, you may customize the code to use the readability functions conditionally.
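As a rough sketch of what "conditionally" could look like, the snippet below skips the readability scoring only for hosts on a user-maintained list. Everything prefixed with `my/` is hypothetical and not part of org-web-tools, and `example.com` is a placeholder host. It assumes `org-web-tools-read-url-as-org` accepts the URL as its first argument and fetches the page synchronously, so that the temporary rebinding is still in effect when `eww-score-readability` is called.

```elisp
(require 'cl-lib)
(require 'url-parse)
;; NOERROR is set so the snippet still loads where the package is absent.
(require 'org-web-tools nil t)

(defvar my/org-web-tools-raw-hosts '("example.com")
  "Hypothetical list of hosts whose pages should skip readability scoring.")

(defun my/org-web-tools-raw-host-p (url)
  "Return non-nil if the host of URL is in `my/org-web-tools-raw-hosts'."
  (let ((host (url-host (url-generic-parse-url url))))
    (and host (member host my/org-web-tools-raw-hosts) t)))

(defun my/org-web-tools-read-url-as-org (url)
  "Read URL as Org, skipping readability scoring for listed hosts."
  (interactive "sURL: ")
  (if (my/org-web-tools-raw-host-p url)
      ;; Temporarily make the scoring function a no-op for this call only.
      (cl-letf (((symbol-function 'eww-score-readability) #'ignore))
        (org-web-tools-read-url-as-org url))
    (org-web-tools-read-url-as-org url)))
```

With this in place, `M-x my/org-web-tools-read-url-as-org` would keep the usual readability filtering everywhere except on the listed hosts.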

hubisan commented 1 year ago

In that case I'll use some custom code, so you can close this issue.

(cl-letf (((symbol-function 'eww-score-readability) #'ignore))
  (call-interactively #'org-web-tools-read-url-as-org))

And thank you for your ongoing efforts to improve our lives inside Emacs; it is much appreciated :thumbsup:

alphapapa commented 1 year ago

Yes, that looks like a good workaround.

Having said that, I wouldn't be opposed to a user option that would let non-Lisp programmers disable the use of the readability function. So I'll leave this issue open to track that idea.
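Until such an option exists, something similar can be approximated in a personal configuration. The following is only a sketch, not the package's implementation: `my/org-web-tools-use-readability` and the advice function are hypothetical names, and the approach assumes the command calls `eww-score-readability` synchronously, as the workaround above relies on.

```elisp
(require 'cl-lib)

(defcustom my/org-web-tools-use-readability t
  "Hypothetical option: non-nil means keep the readability scoring.
When nil, `org-web-tools-read-url-as-org' is run with
`eww-score-readability' temporarily replaced by `ignore'."
  :type 'boolean
  :group 'convenience)

(defun my/org-web-tools--maybe-skip-readability (orig-fn &rest args)
  "Call ORIG-FN with ARGS, skipping readability scoring if disabled."
  (if my/org-web-tools-use-readability
      (apply orig-fn args)
    ;; The rebinding only lasts for the duration of this call.
    (cl-letf (((symbol-function 'eww-score-readability) #'ignore))
      (apply orig-fn args))))

;; Around advice keeps the command's own interactive behaviour.
(advice-add 'org-web-tools-read-url-as-org :around
            #'my/org-web-tools--maybe-skip-readability)
```

The option can then be toggled with `M-x customize-variable`, and the advice removed again with `advice-remove` if it is no longer wanted.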

Thanks for the kind words. I'm glad they're useful to you. :)