andras-simonyi / citeproc-el

A CSL 1.0.2 Citation Processor for Emacs.
GNU General Public License v3.0

Singular vs. plural section locators not recognised properly #58

Closed: Quintus closed this issue 2 years ago

Quintus commented 2 years ago

As per this and this post on the org-mode ML:


The § locator does not seem to determine correctly whether the singular or the plural label should be used:

With »Org mode version 9.5 (release_9.5-104-g2b1fc6 @ /home/quintus/.emacs.d/org-mode/lisp/)«

Das ist ein Test [cite:@saenger2013gsr § 12 Rn. 488].

(which references section 12, margin number 488) gives:

Saenger, Gesellschaftsrecht, 2. Aufl. (2013), §§ 12 Rn. 488

This uses double §§ instead of a single §, that is, it treats the citation as a plural one whereas it should be a singular one. The same happens with

Das ist ein Test [cite:@saenger2013gsr section 12 Rn. 488].

Interestingly, the input

Das ist ein Test [cite:@saenger2013gsr § 12].

(which has no suffix) does correctly export to

Saenger, Gesellschaftsrecht, 2. Aufl. (2013), § 12

with only one § sign. Then, a real multi-section citation like this:

Das ist ein Test [cite:@saenger2013gsr §§ 12 ff.].

instead incorrectly yields a single §:

Saenger, Gesellschaftsrecht, 2. Aufl. (2013), § 12 ff.

This one however:

Das ist ein Test [cite:@saenger2013gsr §§ 12-14].

is correct again:

Saenger, Gesellschaftsrecht, 2. Aufl. (2013), §§ 12-14

-quintus

andras-simonyi commented 2 years ago

Thanks for the report. I think I will first investigate how other CSL processors, in particular Pandoc's citeproc, handle this problem.

Quintus commented 2 years ago

Meanwhile, is there a way to forcibly specify the locator like with Pandoc? From the Pandoc manpage:

In complex cases, you can force something to be treated as a locator by enclosing it in curly braces or prevent parsing the suffix as locator by prepending curly braces:

         [@smith{ii, A, D-Z}, with a suffix]
         [@smith, {pp. iv, vi-xi, (xv)-(xvii)} with suffix here]
         [@smith{}, 99 years later]

Something like that would lift the burden of parsing weird judicial locators like § 23 Rn. 74. It should probably just use whatever is inside the forced locator as-is and leave it to the user to use whatever is correct.

bdarcus commented 2 years ago

> Meanwhile, is there a way to forcibly specify the locator like with Pandoc?

No; Nicolas has been resistant to allowing that. There was a recent thread on the org list about this.

I'm a little unclear if it would be possible to do it with a forked oc-csl?

Quintus commented 2 years ago

> No; Nicolas has been resistant to allowing that. There was a recent thread on the org list about this.

Ah too bad. I must have overlooked it; I will search for it later. But couldn't it be added as a feature specific to this exporter? All that would be required is to parse the start of the suffix passed to citeproc from within citeproc. It would then not be portable towards other exporters, though.

bdarcus commented 2 years ago

> But couldn't it be added as a feature specific to this exporter?

I don't believe so.

IIUC, the oc processor parses this content and passes it to citeproc as structured data.

andras-simonyi commented 2 years ago

Thanks for the comments! I've been thinking about this a bit and would like to propose the following approach:

  1. Set up citeproc-el's singular/plural locator "classifier" so that it classifies as plural only those inputs about which we are very confident, e.g., those containing a dash between two digits; anything else would be classified as singular.
  2. Provide a way to force citeproc-el to override the classification, in particular to enforce that a locator is treated as plural during rendering.
  3. Use the plural locator term variants in oc-csl ("pp." "§§" etc.) in Org cites to enforce the plural treatment.

WDYT? PS. There could even be a setting in oc-csl to always force plural/singular treatment according to the locator term used.
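
To make point 1 concrete, here is a minimal sketch of such a conservative classifier. It is written in Python purely for illustration and is not citeproc-el's actual (Emacs Lisp) code; the names `is_plural_locator` and `render_locator`, the `force_plural` flag (standing in for point 2), and the set of accepted dash characters are all assumptions.

```python
import re

# Assumption: "a dash between two digits" means a hyphen or an en dash
# directly between two digit characters, e.g. "12-14".
_DIGIT_DASH_DIGIT = re.compile(r"\d[-\u2013]\d")


def is_plural_locator(locator: str) -> bool:
    """Classify as plural only when we are very confident (a digit range)."""
    return _DIGIT_DASH_DIGIT.search(locator) is not None


def render_locator(locator: str, singular: str = "§", plural: str = "§§",
                   force_plural: bool = False) -> str:
    """Pick the label; force_plural overrides the classifier (hypothetical)."""
    use_plural = force_plural or is_plural_locator(locator)
    label = plural if use_plural else singular
    return f"{label} {locator}"


if __name__ == "__main__":
    for loc in ("12 Rn. 488", "12", "12-14", "12 ff."):
        print(render_locator(loc))
    # "12 ff." has no digit range, so it needs the explicit override:
    print(render_locator("12 ff.", force_plural=True))
```

Under this rule a chained locator such as `12 Rn. 488` falls back to the singular label, while `12 ff.` gets the plural label only when the override is set, which is what point 3 would trigger from the plural term used in the Org cite.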

Quintus commented 2 years ago

I think it sounds mostly sensible; it would cover my needs as far as I can see. No. 1 would turn all my locator labels that are currently erroneously recognised as plural into singular ones, which is what I need for my "chained locators" like § 1 Rn. 5.

One thing that remains problematic is the recognition of f. and ff. as part of the locator. I currently terminate citations with a comma in order to force the f. or ff. to be taken as part of the locator (example: [cite: @eugh2021topsystem note 37 ff.,]). I am fine with that, but I just wanted to make you aware of it. In particular, this means there are no characters between , and ], which might be tricky to parse.

> PS. There could even be a setting in oc-csl to always force plural/singular treatment according to the locator term used.

More options is always good I guess.

andras-simonyi commented 2 years ago

I've merged a PR which hopefully fixes the plural "overgeneration" problem and this particular issue. The other half of the proposal (signalling plural with appropriate labels) remains to be implemented, mostly on the oc-csl side.

Quintus commented 2 years ago

Thank you! I can confirm that the "overgenerated" plural locators are gone now.