andras-simonyi / citeproc-el

A CSL 1.0.2 Citation Processor for Emacs.
GNU General Public License v3.0

Question/Help: how to render an isolated bibliography entry #153

Closed suhail-singh closed 6 months ago

suhail-singh commented 8 months ago

Thank you for creating this library!

For my personal blog (which uses some customisations in addition to ox-html and ox-publish), I would like to create and use an Org mode macro which, given a bibtex file (as set by #+bibliography), a CSL file (as set by #+cite_export), and a bibtex reference, generates the HTML for the corresponding bibliographic entry. Specifically, I am interested in generating the HTML corresponding to div.csl-entry > div.csl-right-inline (i.e., the bibliographic entry without the bibliographic numbering).

Additionally, I would also like to modify the generated HTML in some specific ways. For instance, I would like to add target="_blank" to the links so that any URLs in the rendered bibliographic entry (corresponding to the URL field of the bibtex entry) open in a new tab.

One option would be to use something like org-export-string-as on a minimal string (containing the citation) as below.

(org-export-string-as
   "#+bibliography: \"refs.bib\"
#+cite_export: csl \"./acm-sig-proceedings.csl\"
[cite:@iandola2016squeezenet]
#+print_bibliography:"
   'html t)

Having done the above, one can then post-process the generated HTML (in my case, first to extract the HTML corresponding to the DOM element of interest and then to tweak the HTML to add target="_blank") as needed.
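A minimal sketch of that post-processing step, operating on the string returned by org-export-string-as. The two helper names are hypothetical (they are not part of Org or citeproc-el), and the regexp assumes the csl-right-inline div contains no nested div elements:

```elisp
;; Hypothetical helpers (not part of Org or citeproc-el) for post-processing
;; the HTML string returned by `org-export-string-as'.

(defun my/bib-entry-inner-html (html)
  "Return the inner HTML of the first div.csl-right-inline in HTML, or nil.
Assumes the div contains no nested <div> elements."
  (when (string-match
         "<div class=\"csl-right-inline\">\\(\\(?:.\\|\n\\)*?\\)</div>"
         html)
    (match-string 1 html)))

(defun my/open-links-in-new-tab (html)
  "Add target=\"_blank\" to every <a href=...> opening tag in HTML."
  (replace-regexp-in-string "<a href=" "<a target=\"_blank\" href=" html t t))

;; Example on a hand-written input string:
(my/open-links-in-new-tab
 (my/bib-entry-inner-html
  "<p>[<div class=\"csl-right-inline\">Doe, J. <a href=\"https://example.org\">Title</a>.</div>]</p>"))
;; => "Doe, J. <a target=\"_blank\" href=\"https://example.org\">Title</a>."
```

String matching like this is brittle if the exporter's markup changes, which is part of why I would prefer an approach without post-processing.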

Is there a better way? Ideally, no post-processing would be needed.

andras-simonyi commented 6 months ago

Hello, thanks for your interest in citeproc-el and apologies for replying that late; unfortunately, in the last few weeks I have been extremely busy. As for rendering a bibliography entry, have you tried simply using the bibentry citation style for that? (The shorthand is cite/b.) Regarding modifying the generated html, the easiest way would be to modify the html citeproc formatter, e.g., link formatting is specified by the

   (href . ,(lambda (x y) (concat "<a href=\"" y "\">" x "</a>"))) 

cons cell in the citeproc-fmt--html-alist in citeproc-formatters.el. These alists are currently constants, but I can make them variables if that would be useful.

suhail-singh commented 6 months ago

> apologies for replying that late

All good.

> have you tried simply using the bibentry citation style for that? (The shorthand is cite/b.)

Thank you, I hadn't. I wasn't aware of cite/b, and using it helps somewhat, but there are still issues that I hope you're able to provide some guidance with.

Observed behavior

Specifically, executing the below:

(require 'citeproc-formatters)
(require 'org)

(let ((dir
       "path/to/directory/")
      (citeproc-fmt--html-alist (cons
                                 (cons 'href
                                       (lambda (x y)
                                         (concat "<a target='_blank' href=\""
                                                 y "\">" x "</a>")))
                                 citeproc-fmt--html-alist)))
  (org-export-string-as
   (concat "#+bibliography: \"" dir "refs.bib\"
  #+cite_export: csl \"" dir "acm-sig-proceedings.csl\"
  [cite/b:@iandola2016squeezenet]
  ")
   'html t))

Results in:

"<p>
[<div class=\"csl-right-inline\">Iandola, F.N. et al. 2016. <a href=\"https://arxiv.org/abs/1602.07360\">Squeezenet: Alexnet-level accuracy with 50x fewer parameters and&#60; 0.5 mb model size</a>. <i>Arxiv preprint arxiv:1602.07360</i>. (2016).</div>
  ]
</p>
"

I.e., the target='_blank' attribute isn't present in the generated output.

For context, the contents of refs.bib are:

@article{iandola2016squeezenet,
  title={SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size},
  author={Iandola, Forrest N and Han, Song and Moskewicz, Matthew W and Ashraf, Khalid and Dally, William J and Keutzer, Kurt},
  journal={arXiv preprint arXiv:1602.07360},
  url={https://arxiv.org/abs/1602.07360},
  year={2016}
}

Desired behavior

For reference, what I'm trying to generate (given the above refs.bib) is something like:

"<span>Iandola, F.N. et al. 2016. <a href=\"https://arxiv.org/abs/1602.07360\">Squeezenet: Alexnet-level accuracy with 50x fewer parameters and&#60; 0.5 mb model size</a>. <i>Arxiv preprint arxiv:1602.07360</i>. (2016).</span>
"

If I understand correctly, I should be able to get something like the above by overriding the entries for both 'href and 'display-right-inline (provided I am able to do so in a way that "sticks").

The way I am generating the HTML snippet above via org-export-string-as has a couple of issues:

  1. It seems let-binding citeproc-fmt--html-alist isn't sufficient. Perhaps it's because invoking org-export-string-as resets its value? What would be the recommended way to override citeproc-fmt--html-alist?

  2. Additionally, if the above mechanism (generating the HTML snippet from citeproc-el via org-export-string-as) is used, then I believe the bibtex and style files will need to be re-read for each citation entry. What would be a way to avoid this unnecessary work? Perhaps some way that initializes citeproc-el in the desired way (with the appropriate refs.bib file and citation style, and overriding the citeproc-fmt--html-alist), and then uses a function from the API to more directly generate the HTML snippet corresponding to the csl-right-inline element. Thoughts?

andras-simonyi commented 6 months ago

I think overriding citeproc-fmt--html-alist by a let binding doesn't work because it is a constant and citeproc-el is using lexical bindings. If citeproc-fmt--html-alist were defined with defvar instead of defconst, then it would be a dynamically bound variable; this is why I suggested this change above.

As for the problem of isolated rendering, perhaps the simplest way would be using the method described in the "Rendering isolated references" section of the citeproc-el README; that way you can create a parsed style and a locale getter, and use them for rendering several items. The items themselves can come from an appropriate item-getter, see the function citeproc-hash-itemgetter-from-any in citeproc-itemgetters.el.

andras-simonyi commented 6 months ago

Another thought: if you forgo Org, and use citeproc-render-item as I suggested, then you can define a new citeproc formatter by mostly copying the html formatter and making the modifications you need, register your formatter in citeproc-fmt--formatters-alist, which is already a variable, and use it (its identifier) as the format parameter of citeproc-render-item.

suhail-singh commented 6 months ago

> I think overriding citeproc-fmt--html-alist by a let binding doesn't work because it is a constant and citeproc-el is using lexical bindings.

I don't believe that that's the reason. defconst in Elisp can be let-bound. For instance, when you evaluate the below:

(require 'citeproc-formatters)

(let ((citeproc-fmt--html-alist (cons
                                 (cons 'href
                                       (lambda (x y)
                                         (concat "<a target='_blank' href=\""
                                                 y "\">" x "</a>")))
                                 citeproc-fmt--html-alist)))
  (alist-get 'href citeproc-fmt--html-alist))

It results in:

(closure (t) (x y) (concat "<a target='_blank' href=\"" y "\">" x "</a>"))

I believe the reason is something else (or perhaps, something in addition). It's probably because citeproc-fmt--html-alist isn't used directly, but via citeproc-fmt--formatters-alist and in the latter the value set for :rt is a function that uses the unmodified value. I.e., what needs to be overridden isn't citeproc-fmt--html-alist, but rather the :rt property in the entry corresponding to html in citeproc-fmt--formatters-alist. Okay, I believe I understand this part.
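To illustrate the mechanism I have in mind, here is a self-contained sketch that does not use citeproc-el at all. All the my-demo- names are hypothetical stand-ins: the alist plays the role of citeproc-fmt--html-alist, and the prebuilt function plays the role of the :rt value. Whether citeproc-el builds its formatter exactly like this is my assumption.

```elisp
;; All `my-demo-' names are hypothetical stand-ins, not citeproc-el symbols.
(defconst my-demo-alist '((href . "plain"))
  "Stand-in for a formatter alist such as `citeproc-fmt--html-alist'.")

(defun my-demo-lookup (alist key)
  "Return the value stored under KEY in ALIST."
  (alist-get key alist))

;; The formatter function is built once, from the alist's value at that moment.
(defvar my-demo-formatter (apply-partially #'my-demo-lookup my-demo-alist)
  "Stand-in for the :rt function stored in a formatter registry.")

(defun my-demo-compare ()
  "Return what a direct lookup and the prebuilt formatter see under a let-binding."
  (let ((my-demo-alist (cons '(href . "modified") my-demo-alist)))
    (list (alist-get 'href my-demo-alist)        ; sees the let-binding
          (funcall my-demo-formatter 'href))))   ; still sees the captured value

(my-demo-compare) ; => ("modified" "plain")
```

The let-binding takes effect for code that reads the variable while the binding is active, but a function that captured the old list value earlier never looks at the variable again.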

However, when I tried to use citeproc-render-item directly, I wasn't successful in my attempt. It's not clear how the LOC-GETTER argument is to be obtained for citeproc-create-style, nor how to get the item from the item-getter created by citeproc-hash-itemgetter-from-any. For instance evaluating the below:

(funcall
 (citeproc-hash-itemgetter-from-any
  "path/to/refs.bib")
 "iandola2016squeezenet")

Results in

Debugger entered--Lisp error: (error "Unsupported citeproc itemgetter retrieval method")

Could you please share (or provide a reference to) a code snippet that uses citeproc-render-item and is roughly equivalent to the below org-export-string-as invocation? Seeing a minimal, but complete example of how to use citeproc-create-style together with citeproc-render-item would be quite helpful.

(org-export-string-as
   "#+bibliography: \"refs.bib\"
#+cite_export: csl \"./acm-sig-proceedings.csl\"
[cite/b:@iandola2016squeezenet]
"
   'html t)

andras-simonyi commented 6 months ago

> I don't believe that that's the reason. defconst in Elisp can be let-bound.

You are right, I thought constants were not bound dynamically when defined in a file using lexical binding but, apparently, that isn't the case. Anyhow, here is an example demonstrating what you could do using only citeproc-el:

(require 'citeproc)

;; Create and register a 'modified-html' citeproc-el formatter 

(setq modified-html-alist
      (cons `(href . ,(lambda (x y) (concat "<a target='_blank' href=\"" y "\">" x "</a>")))
            citeproc-fmt--html-alist))

(push `(modified-html . ,(citeproc-formatter-create
                          :rt (citeproc-formatter-fun-create modified-html-alist)
                          :bib #'citeproc-fmt--html-bib-formatter))
      citeproc-fmt--formatters-alist)

;; Use the created formatter to render some items' bib entries

(let* ((lg (citeproc-locale-getter-from-dir "/dir/to/csl_locales"))
       (style (citeproc-create-style "/path/to/style_to_use.csl" lg))
       (ig (citeproc-hash-itemgetter-from-any "/path/to/bibtex.bib"))
       (items (funcall ig (list "bibtex_key_1" "bibtex_key_2"))))
  (mapcar (lambda (item) (citeproc-render-item (cdr item) style 'bib 'modified-html))
          items))

Changes: Use let* instead of let as suggested by the next comment by @suhail-singh.

suhail-singh commented 6 months ago

Thank you for sharing that code example! The item-getter needed a list of item keys instead of the single string I was passing.

On a related note, I discovered that Org mode provides a locale file. As such the following can serve as a decent default for the locale-getter:

(citeproc-locale-getter-from-dir (file-name-concat
                                  (file-name-directory (find-library-name "org"))
                                  "etc" "csl"))

Given that the citeproc codebase doesn't have a use-site of citeproc-render-item, I believe it would help for the code snippet to be included in the documentation. If you agree and do decide to do so, please note that the let in your code example needs to be a let*.
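For the documentation, the snippet could also be wrapped in a reusable function. The following is only a sketch: the function name and its argument list are hypothetical, and the only citeproc-el calls it makes are the ones from the example above.

```elisp
(require 'citeproc)

;; Hypothetical helper; only the citeproc-* calls come from the example above.
(defun my/render-bib-entries (bib-file csl-file locale-dir keys &optional format)
  "Render the bibliography entries for KEYS in BIB-FILE as strings.
CSL-FILE is the style, LOCALE-DIR a directory of CSL locale files.
FORMAT names a formatter registered in `citeproc-fmt--formatters-alist'
and defaults to `html'.  The style and the item-getter are created once
per call, however many KEYS are rendered."
  (let* ((lg (citeproc-locale-getter-from-dir locale-dir))
         (style (citeproc-create-style csl-file lg))
         (ig (citeproc-hash-itemgetter-from-any bib-file))
         ;; The item-getter takes a list of keys and returns (KEY . ITEM) pairs.
         (items (funcall ig keys)))
    (mapcar (lambda (item)
              (citeproc-render-item (cdr item) style 'bib (or format 'html)))
            items)))
```

It could then be called with a list of keys such as (list "iandola2016squeezenet"), passing 'modified-html as the last argument once that formatter has been registered, and with the Org-provided directory above as the locale directory.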

andras-simonyi commented 6 months ago

Thanks for the suggestion and the correction (I updated the code). I'll probably add the snippet to the wiki and link to it from the README.

andras-simonyi commented 6 months ago

I plan to close this issue shortly if there are no objections.

suhail-singh commented 6 months ago

No objections.