Kungsgeten / hypothesis

Import data from hypothes.is into Emacs
MIT License

Some suggestions of enhancement #8

Open MBunel opened 3 years ago

MBunel commented 3 years ago

Hello and thanks for your work. I've been using your package for a few days to generate an "org" file containing my hypothes.is annotations. Like bepolymathe (see this issue) I put the file generated by hypothesis.el in my org-roam directory which allows me to link my annotations to my notes and vice versa.

After a couple of days of use I have some suggestions for enhancement (sorry for my poor English). I would have made a pull request, but I haven't mastered elisp, so I am posting the code of my hacks here instead.

Allow configuration of the file's header

To use an org file with org-roam (or deft, which I also use) it is better to have some extra information in the file header, such as #+TITLE:, which lets deft and org-roam name the file by its title (if this information does not exist deft falls back to the file name; for org-roam I don't know), or #+ROAM_TAGS:, which lets org-roam filter the files by tags.

To allow the insertion of a personalised header I have created a new variable, hypothesis-file-header, whose content is inserted at the creation of the hypothesis-archive file.

(defvar hypothesis-file-header "#+TITLE: Hypothesis\n#+ROAM_TAGS: Quotes Hypothesis\n" "File header")

For this I created a new function hypothesis-update, inspired by the hypothesis-to-archive function:

;;;###autoload
(defun hypothesis-update ()
  "Import new annotations into `hypothesis-archive', maintaining its header."
  (interactive)
  (let ((last-update (hypothesis-last-archive-update)))
    (hypothesis-request
     (lambda (sites)
       (find-file hypothesis-archive)
       (goto-char (point-max))
       ;; Write or update the header.
       (if (eq (point-max) 17)
           ;; New file: it only holds the #+LAST_UPDATE: stub, so replace
           ;; that line with the full header.
           (save-excursion
             (goto-char (point-min))
             (kill-whole-line)
             (insert hypothesis-file-header)
             (insert "#+LAST_UPDATE:" hypothesis--last-update "\n")
             (insert "\n"))
         ;; Existing file: only refresh the #+LAST_UPDATE: line.
         (save-excursion
           (re-search-backward "^#\\+LAST_UPDATE:\\(.*\\)")
           (kill-whole-line)
           (insert "#+LAST_UPDATE:" hypothesis--last-update "\n")))
       ;; Append the newly imported annotations.
       (goto-char (point-max))
       (if (seq-empty-p sites)
           (message "Nothing new since last import.")
         (save-excursion
           (org-insert-time-stamp (current-time) nil t "* Imported on " "\n\n")
           (let ((hypothesis--site-level 2))
             (mapc #'my/hypothesis-insert-site-data sites))))
       (save-buffer))
     (append (when last-update `(("search_after" . ,last-update)))
             `(("limit" . 200) ("order" . "asc"))))))

As I don't know anything about elisp it's a bit confusing, sorry. Basically the function adds the header if the file is empty and updates #+LAST_UPDATE: if the file already exists, then the content is added.

I had to do a bit of an ugly hack to make it work. To check that the file is newly created I use (eq (point-max) 17) and not (eq (point-max) 1), which would be more logical. The problem is that the function hypothesis-last-archive-update inserts the #+LAST_UPDATE: line at the top of the file as soon as it is created, and I haven't managed to change this behaviour. So if point-max is 17 (i.e. the buffer holds 16 characters, just the freshly inserted #+LAST_UPDATE: line) I empty the file and insert the header. This is not satisfying, but it works for my personal use.
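One way to avoid the hard-coded size might be to test whether the archive contains any heading yet, rather than comparing point-max with a fixed number. This is only a minimal sketch, assuming that an archive without a single Org heading can be treated as new; my/hypothesis-archive-fresh-p is a hypothetical name, not a function of hypothesis.el.

```elisp
(defun my/hypothesis-archive-fresh-p ()
  "Return non-nil if the current buffer contains no Org heading yet.
Hypothetical helper, not part of hypothesis.el."
  (save-excursion
    (goto-char (point-min))
    ;; A newly created archive holds keyword lines at most, never a
    ;; line starting with one or more stars followed by a space.
    (not (re-search-forward "^\\*+ " nil t))))
```

Called in the archive buffer, it could stand in for the size test in hypothesis-update, whatever exactly hypothesis-last-archive-update has written at the top of the file.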

Add a hierarchical level in the file

Currently all the annotations of a single site (if imported at the same time) are grouped together under a level 2 heading. I think it's a pity that each of them doesn't have its own heading. So I created a function my/hypothesis-insert-site-data (adapted from the function hypothesis-insert-site-data) which produces an archive file like this:

* Import date
** Website annotated
*** Annotation 1
*** Annotation 2

Add some information to the file

There is some information transmitted by the hypothesis api that I would like to have in the archive file. So I built the my/hypothesis-insert-site-data function to add this information and I modified the hypothesis-data function to retrieve more information.

Here is my version of hypothesis-data:

(defun hypothesis-data (row)
  "Parse data from ROW into an alist."
  (let ((id (alist-get 'id row))
        (text (alist-get 'text row))
        (highlight (hypothesis--selector-key "TextQuoteSelector" 'exact row))
        (location-start (hypothesis--selector-key "TextPositionSelector" 'start row))
        (location-end (hypothesis--selector-key "TextPositionSelector" 'end row)))
    `((id . ,id)
      (uri . ,(alist-get 'uri row))
      (title . ,(elt (alist-get 'title (alist-get 'document row)) 0))
      (text . ,text)
      (highlight . ,highlight)
      (creation-time . ,(hypothesis-parse-iso-date (alist-get 'created row)))
      (update-time . ,(hypothesis-parse-iso-date (alist-get 'updated row)))
      (user . ,(alist-get 'user row))
      (tags . ,(alist-get 'tags row))
      (group . ,(alist-get 'group row))
      (location-start . ,location-start)
      (location-end . ,location-end)
      (incontext-link . ,(alist-get 'incontext (alist-get 'links row)))
      (type . ,(cond
                ((string-empty-p text) 'highlight)
                (highlight 'annotation)
                (t 'page-note))))))

I don't use all the information that the hypothesis-data function returns, but I think it would be interesting to exploit it.

Here is my my/hypothesis-insert-site-data function:

(defun my/hypothesis-insert-site-data (site)
  "Insert the data from SITE as `org-mode' text."
  ;; Heading for the annotated page, linking to its URL.
  (insert (format "%s [[%s][%s]]\n\n"
                  (make-string hypothesis--site-level ?*)
                  (car site)
                  (alist-get 'title (cadr site))))
  ;; Annotations are sorted by their position in the page.
  (dolist (x (sort (cdr site)
                   (lambda (row1 row2)
                     (< (or (alist-get 'location-start row1) 0)
                        (or (alist-get 'location-start row2) 0)))))
    ;; One sub-heading per annotation, linking to it in context.
    (insert (format "%s [[%s][%s]]\n"
                    (make-string (+ hypothesis--site-level 1) ?*)
                    (alist-get 'incontext-link x)
                    (alist-get 'title (cadr site))))
    (org-set-property "Hypothesis_ID" (alist-get 'id x "error"))
    (org-set-property "Hypothesis_user" (alist-get 'user x "error"))
    (org-set-property "Hypothesis_group" (alist-get 'group x "error"))

    (org-insert-time-stamp (alist-get 'update-time x) t t nil "\n")
    (when-let ((highlight (alist-get 'highlight x)))
      (insert (format "%s\n%s\n%s"
                      hypothesis-quote-prefix
                      highlight
                      hypothesis-quote-sufix)))
    (when (eq 'annotation (alist-get 'type x))
      (insert "\n\n- "))
    (insert (concat (alist-get 'text x) "\n\n\n"))))

There are some modifications to the initial hypothesis-insert-site-data function.

Firstly, I am adding a new hierarchical level, as I said earlier. I decided it was not necessary to make this level configurable, so the number of "*" is the value of hypothesis--site-level plus 1. The title of this new level is a hyperlink whose target is the value of incontext-link, so clicking on it opens the annotation in the context of the page (if the user is logged in). On the other hand, I did not find a relevant title: I thought of the id of the annotation on Hypothesis, but it is not very readable, so I kept the title of the page, which duplicates the heading one level up. Maybe a solution would be to compute a local id of the form PageTitle_AnnotationNumber.

Secondly I have added some information in org-mode properties, such as id, group and user. For the moment I don't use them, but I think it's interesting to keep them for more advanced uses (sorting, selection, etc.).
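The local id mentioned above (PageTitle_AnnotationNumber) could be computed along these lines. This is only a sketch under assumptions: my/hypothesis-local-id is a hypothetical helper, not part of hypothesis.el, and replacing whitespace in the page title with underscores is an arbitrary choice made here for readability.

```elisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

(defun my/hypothesis-local-id (page-title index)
  "Build a heading title from PAGE-TITLE and the 1-based annotation INDEX.
Hypothetical helper, not part of hypothesis.el."
  (format "%s_%d"
          ;; Collapse each run of whitespace into one underscore so the
          ;; id stays a single token.
          (replace-regexp-in-string "[[:space:]]+" "_"
                                    (string-trim page-title))
          index))

;; (my/hypothesis-local-id "My Page Title" 2) returns "My_Page_Title_2"
```

To use it, the dolist in my/hypothesis-insert-site-data would need to keep a counter and pass it as INDEX in place of the repeated page title.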

To give an example of the modifications, here is what my archive file looks like:

#+TITLE: my_title
#+ROAM_TAGS: tag1 tag2 tag3
#+LAST_UPDATE:2021-01-06T17:48:00.572906+00:00

* Imported on [timestamp]
** [[WEBPAGE_URL][WEBPAGE_TITLE]]

*** [[ANNOTATION_INCONTEXT_URL][WEBPAGE_TITLE]]
    :PROPERTIES:
    :Hypothesis_ID: ANNOTATION_ID
    :Hypothesis_user: acct:USER@hypothes.is
    :Hypothesis_group: GROUPID
    :END:
[TIMESTAMP]
#+BEGIN_QUOTE
MY QUOTE
#+END_QUOTE

- MY COMMENT

*** [[ANNOTATION_INCONTEXT_URL][WEBPAGE_TITLE]]
    :PROPERTIES:
    :Hypothesis_ID: ANNOTATION_ID
    :Hypothesis_user: acct:USER@hypothes.is
    :Hypothesis_group: GROUPID
    :END:
[TIMESTAMP]
#+BEGIN_QUOTE
MY QUOTE
#+END_QUOTE

- MY COMMENT

Other proposals, not implemented

I also have some other proposals but they are much more complex and I am not able to propose an implementation.

Add tags

The Hypothesis API lets you export the tags, but I haven't managed to use them. I think it would be interesting to exploit this information, especially since org-mode offers tag management.
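A possible starting point, as a sketch only: Org tags accept only letters, digits, "_", "@", "#" and "%", so the tags coming from Hypothesis would first have to be sanitised. The helper below is hypothetical (not part of hypothesis.el) and assumes the tags value stored by hypothesis-data is a list or vector of strings, which depends on how the JSON response is parsed.

```elisp
(defun my/hypothesis-org-tags (tags)
  "Convert TAGS, a list or vector of strings, into a list of valid Org tags.
Hypothetical helper, not part of hypothesis.el."
  (mapcar (lambda (tag)
            ;; Replace every character Org does not allow in a tag
            ;; (anything but letters, digits, _ @ # %) with an underscore.
            (replace-regexp-in-string "[^[:alnum:]_@#%]" "_" tag))
          tags))

;; (my/hypothesis-org-tags ["machine learning" "to-read"])
;; returns ("machine_learning" "to_read")
```

With point on the annotation heading, the result could then be handed to org-set-tags, which in recent Org versions accepts a list of tags.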

Changing the file organisation

Currently the annotations are grouped by date of import. I would love to have them grouped by site, but that would require rewriting the file each time you import.

Allow the parameterization of the writing order

To extend the previous proposal, it would be really interesting to let the user choose how the first hierarchical level is built: by date of import, by site, by user, by group, etc.
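The grouping step itself could be sketched with seq-group-by from seq.el. This only illustrates grouping the parsed annotations by an arbitrary key; it does not address rewriting the archive file. my/hypothesis-group-by is a hypothetical name, and the annotations are assumed to be alists like those returned by the modified hypothesis-data (with keys such as user and group).

```elisp
(require 'seq)

(defun my/hypothesis-group-by (key annotations)
  "Group ANNOTATIONS, a list of alists, by the value stored under KEY.
Return an alist mapping each value of KEY to its list of annotations.
Hypothetical helper, not part of hypothesis.el."
  (seq-group-by (lambda (annotation) (alist-get key annotation))
                annotations))
```

For example, (my/hypothesis-group-by 'group annotations) would yield one entry per group id, and the key could come from a user option so that the first hierarchical level is configurable.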

Allow sorting and filtering

It would be really cool to have functions to select only some annotations, for example only a group or the annotations having a particular tag. I don't know if it's possible with org-mode (it seems to me that it is but I don't know) or if it's necessary to build a particular interface as proposed by the pocket-reader.el package.
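Since the modified archive stores the group and user as Org properties, part of this may already be possible with Org's own property search (interactively, org-match-sparse-tree accepts the same kind of match string). As a sketch, the hypothetical helper below (not part of hypothesis.el) uses org-map-entries to collect the headings whose Hypothesis_group property equals a given group id; it assumes the property names shown in the example archive above.

```elisp
(require 'org)

(defun my/hypothesis-headings-in-group (group)
  "Return the headings of the current Org buffer whose Hypothesis_group is GROUP.
Hypothetical helper, not part of hypothesis.el."
  (org-map-entries
   ;; Called with point on each matching heading.
   (lambda () (org-get-heading t t t t))
   ;; Property match, same syntax as an Org tags/property search.
   (format "Hypothesis_group=\"%s\"" group)))
```

Filtering by tag would work the same way once the tags are written to the headings, using a tag match string instead of a property match.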

Fragmenting files

It might be interesting for users with a lot of annotations to be able to create multiple archive files, for example one per group or per site.

Kungsgeten commented 3 years ago

Hi! I'm glad that you find the package useful, and also that it inspired you to start hacking some lisp! Your additions seem useful. I have been working on my Emacs packages a bit less recently, so it might take some time before I look into this.