alphapapa / org-web-tools

View, capture, and archive Web pages in Org-mode
GNU General Public License v3.0

Make org format customizable #8

Open ag91 opened 7 years ago

ag91 commented 7 years ago

Hello,

Thanks very much for the great package: I use it with org-feed and finally getting feeds content is much more reliable!

About the issue: currently downloading a link results in something like

    * [[link][title]] :website:
    timestamp
    ** Article
    contents

For my use case I do not need the ** Article heading. This is enforced in org-web-tools--url-as-readable-org in this bit here:

...
    (with-temp-buffer
      (org-mode)
      ;; Insert article text
      (insert converted)
      ;; Demote in-article headings
      (org-web-tools--demote-headings-below 2)
      ;; Insert headings at top
      (goto-char (point-min))
      (insert "* " link " :website:" "\n\n"
              timestamp "\n\n"
              "** Article" "\n\n")
      (buffer-string))))

I have the feeling that this can be abstracted in a function format article-contents which defaults to your template, but that can be configured by the user. Something along the lines of:

...
    (format-article-contents converted link timestamp))))

;; Not named `format': that would redefine the built-in function.
(defun format-article-contents (contents link timestamp)
  "Format the article CONTENTS with LINK as title, TIMESTAMP, and an Article heading."
  (with-temp-buffer
    (org-mode)
    ;; Insert article text
    (insert contents)
    ;; Demote in-article headings
    (org-web-tools--demote-headings-below 2)
    ;; Insert headings at top
    (goto-char (point-min))
    (insert "* " link " :website:" "\n\n"
            timestamp "\n\n"
            "** Article" "\n\n")
    (buffer-string)))

Would that make sense? For now I am using a modified version of org-web-tools--url-as-readable-org, but I really would like to not miss any future enhancement of this nice package :) Thanks very much for the time spent in this!

alphapapa commented 7 years ago

Hi there,

> Thanks very much for the great package: I use it with org-feed and finally getting feeds content is much more reliable!

That's very interesting! I wasn't aware of org-feed. So you use it as a feed reader, instead of elfeed or something else? I'd never thought of that.

> I have the feeling that this can be abstracted in a function format article-contents which defaults to your template, but that can be configured by the user.

Yeah, that makes sense. The code that manipulates the contents after insertion can be moved into a function and called with a hook.
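
A minimal sketch of that idea, assuming hypothetical names throughout (none of these functions or variables exist in org-web-tools). The heading insertion is moved into a function that runs from an abnormal hook inside the article buffer, so a user can swap in a template without the ** Article heading. The heading-demotion step is left out to keep the sketch self-contained.

```elisp
;; Hypothetical sketch: these names are illustrative, not part of org-web-tools.

(defun my/owt-insert-default-headings (link timestamp)
  "Insert the default headings for LINK and TIMESTAMP at the top of the buffer."
  (goto-char (point-min))
  (insert "* " link " :website:" "\n\n"
          timestamp "\n\n"
          "** Article" "\n\n"))

(defun my/owt-insert-title-only (link timestamp)
  "Insert only the LINK heading and TIMESTAMP, without an Article heading."
  (goto-char (point-min))
  (insert "* " link " :website:" "\n\n"
          timestamp "\n\n"))

(defvar my/owt-format-functions (list #'my/owt-insert-default-headings)
  "Abnormal hook run in the article buffer with LINK and TIMESTAMP.")

(defun my/owt-format-article (contents link timestamp)
  "Return CONTENTS as Org text after running `my/owt-format-functions'."
  (with-temp-buffer
    (insert contents)
    ;; Each hook function receives LINK and TIMESTAMP and edits the buffer.
    (run-hook-with-args 'my/owt-format-functions link timestamp)
    (buffer-string)))
```

With this shape, dropping the ** Article heading would be a one-line change in the user's init: (setq my/owt-format-functions (list #'my/owt-insert-title-only)).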

> Thanks very much for the time spent in this!

Thanks for your feedback! I will try to get to this soon. :)

ag91 commented 7 years ago

Hi, yes, I do. I like to read on my e-reader, and with a bit of setup you can convert the org-feed file into an EPUB (or whatever you like). I very much appreciate elfeed, but I found it easier to hack/extend org-feed to do what I needed.

Thanks again for the work on this!

P.S:

The bit of my init that sets up org-feed (it is hacky: I changed the guid to be the article's web link):

(defun my/org-feed-parse-rss-feed (buffer)
  "Parse BUFFER for RSS feed entries.
Return a list of entries, each a property list containing the
properties `:guid' and `:item-full-text'."
  (require 'xml)
  (let ((case-fold-search t)
        entries beg end item guid entry)
    (with-current-buffer buffer
      (widen)
      (goto-char (point-min))
      (while (re-search-forward "<item\\>.*?>" nil t)
        (setq beg (point)
              end (and (re-search-forward "</item>" nil t)
                       (match-beginning 0)))
        (setq item (buffer-substring beg end)
              ;; Use the article link, rather than <guid>, as the guid.
              guid (if (string-match "<link\\>.*?>\\(.*?\\)</link>" item)
                       (xml-substitute-special (match-string-no-properties 1 item))))
        ;; `message' formats a nil guid safely; `concat' would signal an error.
        (message "the guid-link is: %s" guid)
        (setq entry (list :guid guid :item-full-text item))
        (push entry entries)
        (widen)
        (goto-char end))
      (nreverse entries))))

(defun my/org-feed-parse-rss-entry (entry)
  "Parse the `:item-full-text' field for xml tags and create new properties."
  (require 'xml)
  (let ((guid (plist-get entry :guid)))
    (with-temp-buffer
      (insert (plist-get entry :item-full-text))
      (goto-char (point-min))
      (while (re-search-forward "<\\([a-zA-Z]+\\>\\).*?>\\([^\000]*?\\)</\\1>"
                                nil t)
        (setq entry (plist-put entry
                               (intern (concat ":" (match-string 1)))
                               (xml-substitute-special (match-string 2))))
        ;; Restore the link-based guid in case a <guid> tag overwrote it.
        (setq entry (plist-put entry :guid guid)))))
  entry)

(defun my/org-get-content-html-as-org (url)
  "Return the contents of URL as Org text, without the top-level heading."
  (require 's)
  ;; Skip PDFs: they are not needed here.
  (if (not (string-equal (file-name-extension url) "pdf"))
      (condition-case err
          ;; Drop the first two lines (the heading and the blank line after it).
          (s-join "\n" (cdr (cdr (s-lines (org-web-tools--url-as-readable-org url)))))
        (error (concat "Org-web-tools failed with: " (error-message-string err))))
    "This was not an HTML page."))

(defun my/get-feed-content (new)
  "Add each article's contents to the description of its feed entry.
The contents are fetched from the entry's `:link' and converted to Org."
  ;; `my-for-org-feed/tag-template' is not shown in this snippet.
  (let ((new-formatted
         (mapcar
          (lambda (e)
            (let* ((article-contents
                    ;; Call the `my/'-prefixed helper defined above.
                    (my/org-get-content-html-as-org (plist-get e :link)))
                   (e1 (plist-put e :description article-contents)))
              (org-feed-format-entry e1 my-for-org-feed/tag-template nil)))
          new)))
    (org-feed-add-items (point) new-formatted)))

(setq org-feed-alist
      '(("Hacker News"
         "https://news.ycombinator.com/rss"
         "/tmp/Feeds.org" "Hacker News"
         :parse-feed my/org-feed-parse-rss-feed
         :parse-entry my/org-feed-parse-rss-entry
         :new-handler my/get-feed-content)))

alphapapa commented 10 months ago

I don't plan to work on this myself, but if someone else is interested in contributing it, I'll be glad to consider merging it.