andras-simonyi / citeproc-el

A CSL 1.0.2 Citation Processor for Emacs.
GNU General Public License v3.0

Callback function for post-processing expanded citations #111

Quintus closed this issue 2 years ago

Quintus commented 2 years ago

Dear Andras,

This issue stems from a question I asked on the org-mode mailing list. The problem is that I want to post-process the expanded citations to apply some modifications that cannot be done with CSL alone. The primary example is automatically abbreviating the German name particle “von” to “v.”. I cannot simply record the names in abbreviated form in the bibliographic database, because some publishers want the particle written out. So what I want is to write a function that replaces every occurrence of “von” with “v.”. I know this can produce false positives, since the word “von” can appear in ordinary sentences as well, but those can be avoided by rewording. I normally proofread my texts before I submit them, so I would notice the error.

There are some more cases, like abbreviating “Urteil” automatically to “Urt.” for citations of court decisions. Again, some publishers want it to be written out, others want it abbreviated. My bibliographic database contains the written out version.

Thus I would like to request some kind of callback function that allows such post-processing. Some suggestions for possible constructions:

I would be fine with any of these alternatives; all I need is a reliable way to process exactly the expanded content of a [cite:] construct, so that I do not accidentally replace material in the main text. In essence, what I am looking for is the citeproc-el equivalent of org-export-filter-footnote-definition-functions. That filter seems to work for ODT export (even though it also catches “normal” footnotes in addition to the footnotes created by citeproc-el), but it fails for HTML export, where it is only passed the reference numbers. (I use footnote-based CSL styles, in case this is not obvious.)

-quintus

andras-simonyi commented 2 years ago

This is an interesting request, because I have actually been thinking for a while about adding some extension points to citeproc-el in the form of hook variables. The use case I had in mind was changing the formatting of names: in academic CVs, for example, it is customary to print the name of the CV's owner in bold in the author lists of the bibliography. So there could be one hook for transforming names before formatting (acting basically on the content of the CSL name variable) and another for transforming the already formatted name. (The latter's input would be the internal rich-text representation.) I guess the first hook would be the ideal place to deal with transformations like "von"->"v.".

bdarcus commented 2 years ago

> The primary example is abbreviating the German name particle “von” to “v.” automatically.

This is the kind of example, however, that you should probably submit to CSL.

Granted, change there is slow, but depending on the details, it's a reasonable request I think.

https://github.com/citation-style-language/schema/issues

On details, initializing all name particles is easy, for example, and could be handled with a simple attribute to accompany the current one for initialization of given names; something like the below, which we could probably add for the next release:

<names initialize-particle="true">...

If OTOH it might only apply to names from particular locales, or particular names, that adds a lot of complexity, or specificity, that's likely not suitable for CSL itself.

andras-simonyi commented 2 years ago

> > The primary example is abbreviating the German name particle “von” to “v.” automatically.

> This is the kind of example, however, that you should probably submit to CSL.

Yes, I absolutely agree. Emphasizing all occurrences of a certain name, on the other hand, is something which is out of scope for CSL, at least so it seems to me.

bdarcus commented 2 years ago

Yes. Edited comment to clarify.

Quintus commented 2 years ago

> This is the kind of example, however, that you should probably submit to CSL.

I have done so at https://github.com/citation-style-language/schema/issues/424, but even that does not cover my full use case. Let me elaborate. In the OP, I said:

> There are some more cases, like abbreviating “Urteil” automatically to “Urt.” for citations of court decisions. Again, some publishers want it to be written out, others want it abbreviated. My bibliographic database contains the written out version.

German court decisions are usually either of type “Urteil” or of type “Beschluss” (there are some more obscure types as well, which I leave out here for brevity). I record this in the type field of my Biblatex bibliographic database, which ends up in the genre CSL variable. As I outlined in the OP, it is often required to abbreviate these two words to “Urt.” or “Beschl.” in the citational footnotes. Adding another request to CSL for that appears to be overkill in my opinion.

The problem is more general anyway. A post-processing filter in citeproc-el would make it possible to work around CSL shortcomings while an ideal solution on the CSL level is still being debated. You see, I need to write my articles now, and while I accept that “change is slow”, I need a workaround in the meantime. Besides, I expect new cases that need some kind of post-processing to keep popping up. For instance, I am wondering whether I can work around the lack of the feature described in #96 this way. A post-expansion filter callback would thus be quite a useful tool.

> So there could be a hook for transforming names before formatting (acting basically on the CSL name variable content) and another for transforming the already formatted name. (The latter's input would be the internal rich-text representation.)

It is not restricted to name fields. As the example I just gave (genre variable) shows, the hook needs to operate on the entire expanded citation, not just on the name.

andras-simonyi commented 2 years ago

Thanks for the explanation. I still think that name particle transformations would be better handled by hooks specific to name formatting, since that would eliminate the risk of false positives, but I see the utility of a more general postprocessing hook. From a technical point of view, I could easily add a hook that applies transformations to the final internal rich-text version of citations (as opposed to bibliography items, which are finalized elsewhere in the code). Would this be a good solution for your use case? (The citeproc-rt library contains several useful functions for manipulating rich-text items.)

Quintus commented 2 years ago

> Would this be a good solution for your use-case?

For the moment, yes. I am currently working mostly with styles that do not even have a bibliography -- all bibliographic information is contained in the inline footnotes. Targeting the bibliography for styles that require one should also be easier with org's built-in export filters, although something specific to citeproc-el would be nice. I would consider this an issue separate from the one in this ticket, though.

> I still think that name particle transformations could be better handled using name-formatting specific hooks, since the risk of false positives would be eliminated

I agree.

> but I see the utility of a more general postprocessing hook.

I have just come across another use case. I am working on an entry for a dictionary on Legal Tech, and the publisher instructed me to cite the journal “Der Staat” not by its ordinary journal title but as “STAAT”, because that is how their automatic linking system recognises the journal in the electronic version of the dictionary, so that users can click on the citations. So I need a transformation function that renames citations of “Der Staat” to “STAAT”.

andras-simonyi commented 2 years ago

Actually, now that I'm thinking about this, the rich-text passed to the hook will contain a lot of metadata about which variable is rendered where -- this can also be useful for postprocessing. So expect something along the lines of


'(nil
 (((rendered-var . author))
  (((rendered-names)
    (name-id . 0))
   "Doe, Joe"))
 ", "
 (((rendered-var . title)
   (font-style . "italic"))
  "Magnum Opus")
 " ("
 (((rendered-var . publisher))
  "Oxford University Press")
 ", "
 (((rendered-var . issued))
  "2018")
 ")")
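Judging from this example, each non-string element is a list whose first item is an attribute alist and whose remaining items are nested rich-text elements or plain strings. The following is a minimal sketch under that assumption, using only built-in Emacs Lisp, that collects the text rendered from a given variable. The `my/` names are hypothetical helpers and not part of citeproc-el.

```elisp
;; Sample rich text, copied from the example above.
(defvar my/sample-citation
  '(nil
    (((rendered-var . author))
     (((rendered-names) (name-id . 0)) "Doe, Joe"))
    ", "
    (((rendered-var . title) (font-style . "italic")) "Magnum Opus")
    " ("
    (((rendered-var . publisher)) "Oxford University Press")
    ", "
    (((rendered-var . issued)) "2018")
    ")"))

(defun my/rt-plain-text (rt)
  "Concatenate all strings in rich text RT, dropping the attributes."
  (if (stringp rt)
      rt
    ;; Assumed layout: (ATTRS . CONTENTS), so the contents are the cdr.
    (mapconcat #'my/rt-plain-text (cdr rt) "")))

(defun my/rt-strings-for-var (rt var)
  "Return a list of the texts in RT that were rendered from CSL variable VAR."
  (when (consp rt)
    (if (eq (alist-get 'rendered-var (car rt)) var)
        (list (my/rt-plain-text rt))
      ;; Not tagged with VAR: keep looking in the nested elements.
      (mapcan (lambda (child) (my/rt-strings-for-var child var))
              (cdr rt)))))

(my/rt-strings-for-var my/sample-citation 'title) ; => ("Magnum Opus")
(my/rt-plain-text my/sample-citation)
;; => "Doe, Joe, Magnum Opus (Oxford University Press, 2018)"
```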

andras-simonyi commented 2 years ago

I've implemented a proof of concept hook in the branch called '111-Add_citation_postprocessing_hook'. For instance, a simple replacement of all occurrences of Urteil with Urt. can be achieved by something like

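(defun shorten-urteil (x)
  "Return rich text X with \"Urteil\" abbreviated to \"Urt.\" in every string."
  ;; `string-replace' requires Emacs 28 or later.
  (citeproc-rt-format x (lambda (s) (string-replace "Urteil" "Urt." s))))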

(add-hook 'citeproc-citation-postprocess-functions #'shorten-urteil)

(Note that replacements made by this code are limited to contiguous strings in the rich-text citations; they do not match across formatting or variable content boundaries.)
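The rendered-var metadata could also be used to restrict a replacement to one variable, which would avoid false positives such as a title that happens to contain the word "Urteil". The following is only a sketch. It assumes the rich-text layout shown earlier in this thread (an attribute alist followed by nested contents), that genre values are tagged with (rendered-var . genre), and that each hook function receives a rich text and returns the transformed one, as in the example above. The `my/` functions are hypothetical and use only built-in Emacs Lisp (`string-replace` needs Emacs 28 or later).

```elisp
(defun my/rt-replace-in-var (rt var from to &optional inside)
  "Return a copy of rich text RT with FROM replaced by TO inside VAR only.
INSIDE is non-nil while walking below an element rendered from VAR."
  (cond
   ((stringp rt)
    ;; Only touch strings that sit below an element tagged with VAR.
    (if inside (string-replace from to rt) rt))
   ((consp rt)
    (let* ((attrs (car rt))
           (inside (or inside
                       (eq (alist-get 'rendered-var attrs) var))))
      ;; Keep the attributes and rebuild the contents recursively.
      (cons attrs
            (mapcar (lambda (child)
                      (my/rt-replace-in-var child var from to inside))
                    (cdr rt)))))
   (t rt)))

(defun my/shorten-genre (rt)
  "Abbreviate \"Urteil\" and \"Beschluss\" in the genre variable of RT."
  (my/rt-replace-in-var
   (my/rt-replace-in-var rt 'genre "Urteil" "Urt.")
   'genre "Beschluss" "Beschl."))

(add-hook 'citeproc-citation-postprocess-functions #'my/shorten-genre)
```

With this, a citation whose genre is "Urteil" and whose title is "Das Urteil" would have only the genre shortened.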

Could you perhaps have a look and comment?

andras-simonyi commented 2 years ago

BTW, I'm still not entirely convinced that the transformations you need are best handled at the level of fully formatted citations (the presence of formatting and metadata in general makes these operations rather awkward); it seems to me that you'd be better off with a way to make on-the-fly changes to the content of CSL (or bib(la)tex...) fields.
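A minimal sketch of what such a field-level transformation might look like, assuming bibliography items reach the processor as alists keyed by CSL variable symbols. The replacement table and the `my/` functions are hypothetical, and how the wrapper is wired in depends on whatever code supplies the items to citeproc-el.

```elisp
(defvar my/field-replacements
  '((genre . (("Urteil" . "Urt.") ("Beschluss" . "Beschl.")))
    (container-title . (("Der Staat" . "STAAT"))))
  "Hypothetical table mapping CSL variables to (FROM . TO) replacements.")

(defun my/transform-item (item)
  "Return a copy of CSL ITEM (an alist) with `my/field-replacements' applied."
  (mapcar
   (lambda (field)
     (let ((rules (alist-get (car field) my/field-replacements))
           (value (cdr field)))
       (if (and rules (stringp value))
           ;; Apply every rule for this variable in turn.
           (cons (car field)
                 (let ((result value))
                   (dolist (rule rules result)
                     (setq result
                           (string-replace (car rule) (cdr rule) result)))))
         ;; No rules, or a non-string value such as a name list: keep as is.
         field)))
   item))

(defun my/get-items-transformed (getter itemids)
  "Call item GETTER on ITEMIDS and transform every returned item.
GETTER is assumed to return an alist of (ITEMID . ITEM) pairs."
  (mapcar (lambda (entry)
            (cons (car entry) (my/transform-item (cdr entry))))
          (funcall getter itemids)))
```

Something like `(apply-partially #'my/get-items-transformed original-getter)` could then stand in wherever an item getter function is expected, which would leave the bibliographic database itself untouched.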