alphapapa / org-web-tools

View, capture, and archive Web pages in Org-mode
GNU General Public License v3.0

handling same-page relative links, such as footnotes #45

Open mooseyboots opened 3 years ago

mooseyboots commented 3 years ago

hi ap, & as the others have said, thx for this great package.

i was impressed when i converted an academic essay with org-web-tools-read-url-as-org and it rendered all the footnote anchors as org links, but it turns out they are relative links to nowhere, not to the notes at the bottom of the page/document. ditto for the footnote links back up to the body of the text. i guess it is just a pandoc issue? is there any way they could be further processed?

for me they appear as [[#en41][41]], having been rendered from something like <a id="fn41" class="endnote-link" href="#en41" rel="footnote">41</a>, while the footnote return links appear as [[#fn1][↩]].

my example text: https://monthlyreview.org/2014/07/01/surveillance-capitalism/

i'm not sure if it's something that should be supported, but thought i'd mention it in case there is a workaround or if others have the same issue.

thx again.

c1-g commented 3 years ago

The workaround is in #46: it expands [[#en41][41]] into [[https://monthlyreview.org/2014/07/01/surveillance-capitalism/#en41][41]], giving you an absolute URL in the Org buffer that you can open with eww or whatever browser you've set up.

I think one way to get “real” footnotes (i.e. jumping between the links in the Org buffer without relying on a browser) would be to disable that stopgap if #46 were merged: don't expand internal href attributes (e.g. href=#en41), so that [[#en41][41]] is left as-is in the Org buffer, and then post-process the Org buffer in org-web-tools--url-as-readable-org by calling re-search-forward with a regexp for this kind of link (based on org-link-bracket-re) and replacing each match with an incrementing [fn:NUMBER].
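A minimal sketch of that post-processing step, assuming the internal links arrive in the Org buffer unexpanded as [[#PREFIXdigits][DESC]]. Rather than a bare counter (which runs into the restart problem raised later in this thread), it keys each label on the anchor's numeric suffix, so [[#en41][41]] and its return link [[#fn41][↩]] get the same sequential label. The function name is hypothetical, it uses a hand-written regexp rather than org-link-bracket-re, and it does not move footnote definitions to the beginning of the line, which Org requires.

```elisp
(defun my/org-web-tools-internal-links-to-fns ()
  "Replace same-page links like [[#en41][41]] with Org footnote refs.
Hypothetical helper: the numeric suffix of each anchor is used as a
key, so [[#en41][41]] and [[#fn41][↩]] receive the same label."
  (interactive)
  (let ((labels (make-hash-table :test #'equal)) ; anchor digits -> label
        (next 0))                                ; last label handed out
    (save-excursion
      (save-match-data
        (goto-char (point-min))
        ;; Group 1 is the digits after an optional alphabetic prefix.
        (while (re-search-forward
                "\\[\\[#[[:alpha:]]*\\([[:digit:]]+\\)\\]\\[[^]]*\\]\\]" nil t)
          (let* ((key (match-string 1))
                 ;; Reuse the label for a number seen before, else take
                 ;; the next one; `puthash' returns the stored value.
                 (label (or (gethash key labels)
                            (puthash key (setq next (1+ next)) labels))))
            (replace-match (format "[fn:%d]" label) t t)))))))
```

Because labels are handed out in order of first appearance, the body references (which come first) are numbered 1, 2, 3… and each note's return link simply reuses its number.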

mooseyboots commented 2 years ago

i had a go at another kind of relative/internal link, of the format ^{n}, which i'd like to just convert to [fn:NUMBER].

here's my function

(defun org-web-tools--convert-fns-relative ()
  "Convert ^{n} format footnotes in document to org syntax."
  (interactive)
  (save-match-data
    (while (re-search-forward "\\^{\\([[:digit:]]+\\)}" nil t)
      (replace-match "[fn:\1]" nil nil))))

but it doesn't work. the \(...\) grouping in the search isn't carried over to the replacement by \1: it prints ^A for \1, ^B if i change \1 to \2, etc.

any regexperts know how to carry bracket groupings over through re-search-forward to replace-match?

meanwhile, simply incrementing a counter to number the org footnotes doesn't work, because the count has to restart when we reach the targets at the bottom of the page (the reference [fn:1] needs to point to the definition [fn:1]).

it works interactively using query-replace-regexp, with input \^{\([0-9]+\)} → [fn:\1], but i hoped to at least get it scriptable.

from there it would just be a matter of collecting a few common relative footnote forms into a function to run after pandoc.
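The ^A comes from the Lisp reader rather than from the regexp: inside an Elisp string literal, \1 is an octal escape for the character with code 1, so replace-match never sees a backslash at all. Doubling it, "[fn:\\1]", hands replace-match the two characters \1, which it then expands to the text of group 1. That is also why query-replace-regexp accepted \1: minibuffer input is not read as a Lisp string. A minimal sketch of the first function with only that change, plus a jump to point-min (an added assumption, so the whole buffer is processed regardless of where point is):

```elisp
(defun org-web-tools--convert-fns-relative ()
  "Convert ^{n} format footnotes in document to org syntax."
  (interactive)
  (save-excursion
    (save-match-data
      ;; Start from the top so the whole buffer is processed.
      (goto-char (point-min))
      (while (re-search-forward "\\^{\\([[:digit:]]+\\)}" nil t)
        ;; "\\1" in Lisp source is the two characters \1, which
        ;; `replace-match' expands to the text matched by group 1.
        ;; FIXEDCASE is t so the replacement is inserted as written.
        (replace-match "[fn:\\1]" t)))))
```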

mooseyboots commented 2 years ago

ah, let-binding the group expression match works:

(defun org-web-tools--convert-fns-relative ()
  "Convert ^{n} format footnotes in document to org syntax."
  (interactive)
  (save-match-data
    (while (re-search-forward "\\^{\\([[:digit:]]+\\)}" nil t)
      (let ((match (match-string 1)))
        (replace-match (format "[fn:%s]" match))))))

and for the footnotes formatted as per my first post:

(defun org-web-tools--convert-fns-relative-alt ()
  "Convert [[#enN][N]] format footnotes in document to org syntax."
  (interactive)
  (save-match-data
    (while (re-search-forward "\\[\\[#\\(en\\|fn\\)\\([[:digit:]]+\\)\\]\\[[[:digit:]↩]+\\]\\]" nil t)
      ;; NB: 2 here not 1! could also use (or) and test for first group containing digits
      (let ((match (match-string 2))
            (match-type (match-string 1)))
        (replace-match (format "[fn:%s]" match))
        ;; org-fns must be at bol to work:
        (when (equal match-type "fn")  ; only for fns in footnotes section
          (backward-sexp)              ; move point to before org fn's "["
          (unless (bolp)
            (kill-line 0)))))))        ; kill backward to bol

(the first expression grouping, en|fn, is because anchors in the body are labelled #en while those in the footnote list are labelled #fn).
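The same regexp can be written with Emacs's built-in rx macro, which makes the two groups easier to see. One subtlety: inside a bracket expression, \\| is not alternation, so a set written in a Lisp string as [[:digit:]\\|↩] also admits literal \ and | characters, whereas (any digit "↩") below matches only digits and the return arrow. The constant name is hypothetical.

```elisp
(require 'rx)

(defconst my/org-web-tools-fn-link-re
  (rx "[[#"
      (group (or "en" "fn"))   ; group 1: anchor prefix (body vs. note)
      (group (+ digit))        ; group 2: the note number
      "]["
      (+ (any digit "↩"))      ; description: digits or the return arrow
      "]]")
  "Hypothetical: the en/fn footnote-link regexp written with `rx'.")
```

The constant can be passed to re-search-forward in place of the string literal; (or "en" "fn") compiles to a non-capturing group, so the group numbers stay 1 and 2.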

perhaps there are some other common formats that could be written into a single function, but i have little experience with this sort of thing.
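One way to fold several formats into a single function is a list of (REGEXP . GROUP) pairs, where GROUP names the subexpression holding the note number. A sketch covering the two forms seen in this thread; the variable and function names are hypothetical, and, unlike org-web-tools--convert-fns-relative-alt, it leaves footnote definitions where they are instead of moving them to the beginning of the line.

```elisp
(defvar my/org-web-tools-fn-patterns
  '(;; ^{12}
    ("\\^{\\([[:digit:]]+\\)}" . 1)
    ;; [[#en12][12]] and [[#fn12][↩]]; the en/fn group is non-capturing
    ("\\[\\[#\\(?:en\\|fn\\)\\([[:digit:]]+\\)\\]\\[[^]]*\\]\\]" . 1))
  "Hypothetical list of (REGEXP . GROUP) pairs.
GROUP is the subexpression that holds the footnote number.")

(defun my/org-web-tools-convert-fns ()
  "Rewrite every form in `my/org-web-tools-fn-patterns' as [fn:N]."
  (interactive)
  (save-excursion
    (save-match-data
      (dolist (pattern my/org-web-tools-fn-patterns)
        ;; Each pattern gets its own pass over the whole buffer.
        (goto-char (point-min))
        (while (re-search-forward (car pattern) nil t)
          (replace-match (format "[fn:%s]" (match-string (cdr pattern)))
                         t t))))))
```

Adding a new format is then a matter of pushing another pair onto the list rather than writing another function.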

c1-g commented 2 years ago

Another thing worth investigating is using a custom Lua filter for Pandoc. I've never written a Pandoc filter, though, so I'm not familiar with them.

mooseyboots commented 1 year ago

just fyi, i cooked up some functions to have a bit more of a go at this.

i like converting my webpages to latex for printing, and i need the footnotes to work for that.

https://codeberg.org/martianh/org-web-tools-fn

(WIP)