alphapapa / org-web-tools

View, capture, and archive Web pages in Org-mode
GNU General Public License v3.0

Error when the HTML contains an empty title #20

Closed: akirak closed this issue 9 months ago

akirak commented 5 years ago

org-web-tools-insert-link-for-url raises the following error when the HTML at the URL contains a title element with no content:

Debugger entered--Lisp error: (wrong-type-argument arrayp nil)
  replace-regexp-in-string("\n" " " nil t t)
  s-replace("\n" " " nil)
  org-web-tools--cleanup-title(nil)

I may fix this myself if I have time, but for now I'm just filing it. That said, the command doesn't really make sense when the web page has no title, so I'm not sure what the right behavior is in this case.

alphapapa commented 5 years ago

Does the page in question have an empty title (i.e. <title></title>), or no <title> tag at all?

Please eval this function and see if it fixes the problem:

(defun org-web-tools--html-title (html)
  "Return title of HTML page.
Uses the `dom' library."
  ;; Based on `eww-readable'
  ;; TODO: Maybe use regexp instead of parsing whole DOM, should be faster
  (let* ((dom (with-temp-buffer
                (insert html)
                (libxml-parse-html-region (point-min) (point-max))))
         ;; Return empty string if title tag is not found.
         (title (or (cl-caddr (car (dom-by-tag dom 'title))) "")))
    (org-web-tools--cleanup-title title)))
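
As a rough illustration of why the fallback in the function above matters, here is a minimal sketch that runs the same title lookup against hand-built DOMs instead of a fetched page. The helper my/title-from-dom and the three sample variables are hypothetical names made up for this example; they are not part of org-web-tools. The sketch assumes that an empty <title></title> parses to a title node with no children, which is consistent with the nil seen in the backtrace.

```elisp
;; Sketch only: `my/title-from-dom' and the sample DOMs are hypothetical,
;; not part of org-web-tools.
(require 'cl-lib)
(require 'dom)

(defun my/title-from-dom (dom)
  "Return the text of the first title node in DOM, or \"\" if there is none.
Mirrors the `or' fallback used in the function above."
  (or (cl-caddr (car (dom-by-tag dom 'title))) ""))

;; Hand-built DOMs in the (TAG ATTRIBUTES . CHILDREN) shape that dom.el uses.
;; Assumption: an empty <title></title> yields a title node with no children.
(defvar my/dom-empty-title '(html nil (head nil (title nil))))
(defvar my/dom-no-title    '(html nil (head nil)))
(defvar my/dom-with-title  '(html nil (head nil (title nil "Tracking"))))

(my/title-from-dom my/dom-empty-title)  ; empty title element
(my/title-from-dom my/dom-no-title)     ; no title element at all
(my/title-from-dom my/dom-with-title)   ; normal title
```

Under those assumptions, the first two calls return the empty string rather than nil, so a string-only cleanup function never receives nil; the third returns "Tracking". Both the empty-title and the missing-title cases take the same path, which is why a single or form covers the two situations asked about above.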

akirak commented 5 years ago

The page contains an empty title element. It's a FedEx tracking page.

Your solution seems to have fixed the issue. Thanks.