andras-simonyi / citeproc-el

A CSL 1.0.2 Citation Processor for Emacs.
GNU General Public License v3.0

Support no-date citations #70

Closed rudolf-adamkovic closed 2 years ago

rudolf-adamkovic commented 2 years ago

In APA Style, citing sources with no date BibTeX field should render as $AUTHOR (n.d.) instead of $AUTHOR.

Tested with Pandoc and CSL: https://github.com/citation-style-language/styles/blob/master/apa.csl

andras-simonyi commented 2 years ago

This is actually an interesting case. The relevant part of the style is this (in the else branch of a conditional checking whether there is a filled-in issued or at least status field):

<else>
     <group>
          <text term="no date" form="short"/>
          <text variable="year-suffix" prefix="-"/>
     </group>
</else>

according to the standard,

cs:group implicitly acts as a conditional: cs:group and its child elements are suppressed if a) at least one rendering element in cs:group calls a variable (either directly or via a macro), and b) all variables that are called are empty.

on my reading this means that the standard-compliant rendering is to suppress the term when year-suffix is empty, i.e., the item isn't disambiguated. If Pandoc's citeproc renders the no date term even for non-disambiguated items then it diverges from the standard AFAICS. Or is year-suffix somehow a special case? @denismaier, @bdarcus could you comment on this?
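A minimal sketch of the strict reading above, assuming a toy model in which every rendered element reports whether it called a variable and whether that variable had a value. All names here (`Rendered`, `render_group`, `apa_no_date_branch`) are hypothetical and are not taken from citeproc-el or any other processor; "n.d." is assumed as the short form of the "no date" term.

```python
from dataclasses import dataclass


@dataclass
class Rendered:
    text: str
    called_variable: bool    # some element tried to render a variable
    nonempty_variable: bool  # at least one called variable had a value


def render_term(text):
    # Terms never count as variables for the implicit conditional.
    return Rendered(text, False, False)


def render_variable(item, name, prefix=""):
    value = item.get(name, "")
    return Rendered(prefix + value if value else "", True, bool(value))


def render_group(children):
    called = any(c.called_variable for c in children)
    nonempty = any(c.nonempty_variable for c in children)
    if called and not nonempty:
        # Rule (a) and (b) from the spec both hold: suppress everything.
        return Rendered("", True, False)
    return Rendered("".join(c.text for c in children), called, nonempty)


def apa_no_date_branch(item):
    # Models the <else> branch quoted above (hypothetical helper).
    return render_group([
        render_term("n.d."),
        render_variable(item, "year-suffix", prefix="-"),
    ]).text


print(repr(apa_no_date_branch({})))                    # ''
print(repr(apa_no_date_branch({"year-suffix": "a"})))  # 'n.d.-a'
```

Under this reading the term only survives once disambiguation has assigned a year-suffix, which is the behaviour reported in this issue.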

bdarcus commented 2 years ago

I believe @bwiernik has coded a lot of the APA style, so perhaps he can weigh in.

But at first glance, I agree this doesn't look like a citeproc bug (though does seem like a style bug). Still, surprising if a bug in the style, given how widely it's used.

bwiernik commented 2 years ago

Hmm, it may be that both pandoc and citeproc-js diverge from the spec on this point or treat year-suffix specially, as the style works correctly with both of them. Let me look into it. There doesn't at first glance look like a need for a group here.

bdarcus commented 2 years ago

There doesn't at first glance look like a need for a group here.

What's interesting is the group is not there in the only other place "no date" shows up in the style.

https://github.com/citation-style-language/styles/blob/0710b5153960b5f79a3318c6f2e5b22c3110a5f9/apa.csl#L575-L578

andras-simonyi commented 2 years ago

What's interesting is the group is not there in the only other place "no date" shows up in the style.

I'm also looking at the other, groupless "no date" occurrence and citeproc-el is actually suppressing it as well -- I've a vague memory that in certain cases the macro element (?) also has this implicit conditional behavior, at least if one looks at some of the test-suite examples. Within the context of citeproc-el the simplest solution would probably be just making an exception for year-suffix, if this doesn't break something else.

bdarcus commented 2 years ago

I guess these tests are relevant?

bwiernik commented 2 years ago

I opened a PR for the style to remove that group

andras-simonyi commented 2 years ago

meanwhile I've found this comment in the citeproc-js source:

if (variable === "year-suffix") {
    // year-suffix always signals that it produces output,
    // even when it doesn't. This permits it to be used with
    // the "no date" term inside a group used exclusively
    // to control formatting.

this kind of settles the issue for me. OTOH, maybe this special status of year-suffix should be mentioned in the standard?

bdarcus commented 2 years ago

meanwhile I've found this comment in the citeproc-js source:

I'd probably need to think about it more, but doesn't that sound a bit hackish?

Or at least I'm not clear on it; what does that last clause mean?

I wonder how citeproc-rs (which is basically a clean rewrite in rust) handles this ...

denismaier commented 2 years ago

Well, let's ask @cormacrelf

denismaier commented 2 years ago

But I agree that's hackish. Could we convert no date into some sort of substitute for a missing year? I mean in a future release.

bdarcus commented 2 years ago

But I agree that's hackish. Could we convert no date into some sort of substitute for a missing year? I mean in a future release.

It looks like that's what cormac is doing.

https://github.com/zotero/citeproc-rs/blob/19f26ddfaaf9eb46d7d075e5e9accea1a494fefd/crates/proc/src/group.rs#L29

Is there something in our spec that is ambiguous, that needs to be changed here?

Or is this just a citeproc-js thing?

My hunch is the latter; that it's an internal detail.

andras-simonyi commented 2 years ago

AFAICS the problem is that the year-suffix variable has a genuinely special rendering status for items with no date: the accompanying term (no date) has to be rendered regardless of whether the variable is empty or not. In other words, the term-variable rendering dependency is exactly the opposite of the typical (maybe all other??) cases.

bdarcus commented 2 years ago

AFAICS the problem is that the year-suffix variable has a genuinely special rendering status for items with no date: the accompanying term (no date) has to be rendered regardless of whether the variable is empty or not. In other words, the term-variable rendering dependency is exactly the opposite of the typical (maybe all other??) cases.

Ah, right. In effect, the value for a nil date is not nil.

Citations can be such a PITA.

bwiernik commented 2 years ago

I think the easiest approach would be to make a "no-date" variable which is empty if issued is present and renders the no-date term otherwise

That would remove the need for special treatment of year-suffix entirely and can be fixed in existing styles by a batch change

andras-simonyi commented 2 years ago

As a temporary solution I've merged a PR hopefully fixing this particular issue by not reporting an empty variable if the variable happens to be the year-suffix. @salutis, could you check? I'll revisit the code if the above suggestion or an alternative gets implemented in the standard and/or in the styles.
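A rough sketch of what "not reporting an empty variable" could look like, assuming a simplified group evaluator. This is an illustration in Python, not the actual Emacs Lisp change; `UNREPORTED_WHEN_EMPTY` and `render_group` are hypothetical names.

```python
# Hypothetical: variables that are not reported at all when empty.
UNREPORTED_WHEN_EMPTY = {"year-suffix"}


def render_group(item, children):
    """children: ("text", literal) or ("variable", name, prefix)."""
    parts, called, nonempty = [], False, False
    for child in children:
        if child[0] == "text":
            parts.append(child[1])
            continue
        _, name, prefix = child
        value = item.get(name, "")
        if not value and name in UNREPORTED_WHEN_EMPTY:
            continue  # empty year-suffix: the group never learns it was called
        called = True
        if value:
            nonempty = True
            parts.append(prefix + value)
    return "" if called and not nonempty else "".join(parts)


no_date = [("text", "n.d."), ("variable", "year-suffix", "-")]
circa = [("text", "c. "), ("variable", "issued", "")]

print(repr(render_group({}, no_date)))                    # 'n.d.'
print(repr(render_group({"year-suffix": "b"}, no_date)))  # 'n.d.-b'
print(repr(render_group({}, circa)))                      # ''
```

In this model the exception only affects groups whose sole variable is year-suffix; a group that also calls another empty variable is still suppressed.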

cormacrelf commented 2 years ago

Bruce has the right tack on what citeproc-rs is doing (also note UnresolvedMissing comes from mixing with Unresolved, which according to a grep on the codebase is not used for anything else other than year-suffix).

For any implementation, it is important to recognise that the rendering of a cite changes as it goes through disambiguation, and that implicit conditionals are a part of the rendering process. Variables that were initially found to be empty may no longer be, and so implicit conditionals that were once implicitly suppressed may no longer be. If your model of the renderer is a completely pure function of (variables, disambiguation progress) => HTML then this will be trivial. citeproc-rs, on the other hand, uses trees with enough information stored in each node, and a way to delay and later resolve a variable's presence, and a procedure for propagating that resolution upward through any implicit conditionals above it in the tree. But the spec need not bless a particular way of implementing this.

As has been suggested, the most helpful addition you could make to the spec would be that the implicit conditional part include a mention of year-suffix, as the delayed resolution of its presence or absence is the only reason this happens (in current CSL at least). Something like

As a result of disambiguation, a variable that was initially empty (in particular year-suffix) may no longer be empty. In this case, groups that were initially implicitly suppressed as a result of that variable being empty will no longer be suppressed.
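The "pure function of (variables, disambiguation progress)" model mentioned above can be sketched as follows. This is a deliberately tiny illustration that assumes every item is undated and that disambiguation only ever assigns year-suffixes; `render` and `render_all` are hypothetical names and real processors do far more.

```python
from collections import Counter
from string import ascii_lowercase


def render(item, year_suffix=""):
    # The no-date group under the strict reading: suppressed while
    # year-suffix is empty, rendered once disambiguation fills it in.
    date = f"n.d.-{year_suffix}" if year_suffix else ""
    return " ".join(part for part in (item["author"], date) if part)


def render_all(items):
    first_pass = [render(item) for item in items]
    clashes = Counter(first_pass)
    seen = Counter()
    output = []
    for item, text in zip(items, first_pass):
        if clashes[text] > 1:
            suffix = ascii_lowercase[seen[text]]  # sketch: at most 26 clashes
            seen[text] += 1
            # Re-render from scratch; the group is re-evaluated too.
            output.append(render(item, suffix))
        else:
            output.append(text)
    return output


print(render_all([{"author": "Doe"}, {"author": "Doe"}, {"author": "Roe"}]))
# ['Doe n.d.-a', 'Doe n.d.-b', 'Roe']
```

Because the second pass re-renders the whole cite, a group that was suppressed on the first pass comes back without any special bookkeeping.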

cormacrelf commented 2 years ago

@bwiernik The no-date solution is neither necessary nor sufficient:

The problem statement is incorrect:

AFAICS the problem is that the year-suffix variable has a genuinely special rendering status for items with no date

It has a genuinely special rendering status, with no qualifications on that. Any interaction with missing issued is a coincidence. This is illustrated by a final complication:

It is just that this device is much rarer, and is not really tested AFAIK. Nevertheless, I believe a variable rendered inside such a conditional should technically be able to wake a ghost group from the dead.

bwiernik commented 2 years ago

If citeproc writers are happy to treat year-suffix as always being non-empty, even if it renders nothing, that's fine by me.

cormacrelf commented 2 years ago

I disagree. You need it to work with implicit conditionals. Treating it as always non-empty means that the example from above with the grouped term + year-suffix would need an explicit conditional, which is in my view a breaking change to the implicit conditional behaviour promised by the spec. It looks like it would be a very involved review of the styles in the repo to rectify. For example here it would result in empty renders (should be CSL ERROR etc) showing up as :, without amendment. Further:

<group>
  <text term="circa" /> <!-- please don't render me unnecessarily! -->
  <text macro="date-year" /> <!-- renders one of any number of date variables, or some other term -->
  <text variable="year-suffix" /> <!-- can't feasibly test all those date variables to guard this -->
</group>

That's almost what's going on here

cormacrelf commented 2 years ago

Ah, it goes even further: year-suffix is subject to multiple-use suppression. You cannot reliably know, even by testing issued and every other date variable it's meant to appear next to, whether any particular year-suffix ought to get rendered with unconditional surrounding plaintext. You can put five year-suffixes in a row (e.g. via five different date macros), and only the first one will end up non-empty. The rest should suppress their surrounding plaintext, but if they're always non-empty, then you would get e.g. n.d.-a ... n.d. n.d. n.d. n.d.. No combination of date variable tests can differentiate this.

This would mix especially poorly with e.g. APA-style date macros called multiple times due to cs:substitute, which would result in suppression of issued and cause the second macro call to render the else branch and therefore an extraneous unconditional n.d..

If anything I think year-suffix is the one variable where implicit conditionals are strictly necessary as the only way to conditionally suppress surrounding plaintext output. Every other variable in an implicit conditional can be translated into a (verbose, difficult to maintain) bunch of explicit if statements. (This does raise the question of what it means to test <if variable="year-suffix">, but perhaps that's best considered undefined, it would be far more complicated still.)
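The multiple-use suppression described above can be sketched like this, assuming a cite that calls the same "no date" + year-suffix group several times. `render_date_macros` and its `always_nonempty` flag are hypothetical and only model the two policies under discussion.

```python
def render_date_macros(year_suffix, calls, always_nonempty):
    """Render `calls` copies of a group holding the "no date" term and year-suffix."""
    suffix_used = False
    rendered = []
    for _ in range(calls):
        # Multiple-use suppression: only the first call sees the value.
        suffix = "" if suffix_used else year_suffix
        suffix_used = suffix_used or bool(suffix)
        if suffix or always_nonempty:
            rendered.append("n.d." + (f"-{suffix}" if suffix else ""))
        # Otherwise the implicit conditional drops the group, term included.
    return " ".join(rendered)


print(render_date_macros("a", 5, always_nonempty=True))   # n.d.-a n.d. n.d. n.d. n.d.
print(render_date_macros("a", 5, always_nonempty=False))  # n.d.-a
```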

bwiernik commented 2 years ago

Cormac I think we are thinking about the issue on weirdly different levels, so let's zoom out.

What is the issue we are trying to resolve?

I thought the problem was the conflict of the general "groups are suppressed if all variables are empty" logic with the expected behavior of year-suffix.

I offered 2 solutions to this unique exception. Either we introduce a new date variable that can avoid the group suppression logic (which is necessary and sufficient to resolve the issue at hand), or we short circuit the group-related suppression logic for the unique case that is year-suffix, which is efficiently addressed by treating it as always non-empty.

There seems to be some broader issue about variables we are disagreeing on. Can you elaborate about what your concern is?

bdarcus commented 2 years ago

I don't fully understand the Haskell code, but here seems to be where the new Haskell library handles it.

https://github.com/jgm/citeproc/blob/4a7b98afabebd7a074489ba500d68ee6aa75d3a8/src/Citeproc/Eval.hs#L1644

denismaier commented 2 years ago

Has anyone actually checked pandoc's behaviour? I don't understand how etext affects egroup. @jgm?

denismaier commented 2 years ago

Ok, it does work correctly. And I think it's implemented here: https://github.com/jgm/citeproc/blob/4a7b98afabebd7a074489ba500d68ee6aa75d3a8/src/Citeproc/Eval.hs#L1549 (year-suffix is not counted as a variable).

rudolf-adamkovic commented 2 years ago

@andras-simonyi

@salutis, could you check?

It works beautifully. Thank you!

cormacrelf commented 2 years ago

Ok, a couple of clarifications:

For a thorny example of why year-suffix always acting non-empty (despite mostly being literally empty) is worse:

<group>
    <text value="prefix-" />
    <date variable="issued" />
    <text variable="year-suffix" />
</group>

You don't want a lone prefix- hanging around in the output. That is possible because year-suffix can obviously end up being empty. The attached text here should be rendered whenever either of issued or year-suffix has a value, but other than that, never. This means people have to very carefully write out their groupings such that this cannot happen.

Whereas with no date acting as a variable, it's impossible to get the attached plain text without the n.d., because terms do not produce empty output on their own (unless the term itself is empty, but one must already account for that). So your output always at least makes sense, and it lines up with the actual usage of no date.
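To make the comparison concrete, here is a small sketch of the two policies applied to the groups from this thread. It assumes a toy evaluator with hypothetical policy names ("suffix-always-nonempty", "no-date-as-variable") and does not reflect how any real processor is structured.

```python
def render_group(children, item, policy):
    """children: ("value", text), ("term", name, text) or ("variable", name)."""
    parts, called, nonempty = [], False, False
    for child in children:
        kind = child[0]
        if kind == "value":
            parts.append(child[1])
        elif kind == "term":
            name, text = child[1], child[2]
            parts.append(text)
            if policy == "no-date-as-variable" and name == "no date":
                called = nonempty = True  # the term stands in for a date
        else:  # "variable"
            name = child[1]
            text = item.get(name, "")
            parts.append(text)
            called = True
            if text or (policy == "suffix-always-nonempty" and name == "year-suffix"):
                nonempty = True
    return "" if called and not nonempty else "".join(parts)


prefixed = [("value", "prefix-"), ("variable", "issued"), ("variable", "year-suffix")]
no_date = [("term", "no date", "n.d."), ("variable", "year-suffix")]

for policy in ("suffix-always-nonempty", "no-date-as-variable"):
    print(policy, repr(render_group(prefixed, {}, policy)), repr(render_group(no_date, {}, policy)))
# suffix-always-nonempty 'prefix-' 'n.d.'
# no-date-as-variable '' 'n.d.'
```

Both policies keep the n.d. for an undated item, but only the first one leaves the lone prefix- behind.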

denismaier commented 2 years ago

  • The most relevant special thing about it is that it is often used next to a no date term, and despite no date standing in for what would otherwise be a rendered date variable, terms do not act as variables for implicit conditional groups. I believe that is the core problem here.
  • This points to a much better solution: treat the no date term as a non-empty variable for implicit conditionals.

I think that's pretty much the reasoning behind my suggestion above to convert "no date" into some sort of a substitute for a missing year. I was thinking about something like this:

<date variable="issued" >
  <substitute term="no date"/>
</date>

Here, the term would act as a variable.

So, @cormacrelf's example would look like this:

<group>
    <text value="prefix-" />
    <date variable="issued" >
      <substitute term="no date"/>
    </date>
    <text variable="year-suffix" />
</group>

bwiernik commented 2 years ago

The most relevant special thing about it is that it is often used next to a no date term, and despite no date standing in for what would otherwise be a rendered date variable, terms do not act as variables for implicit conditional groups. I believe that is the core problem here.

Is that the problem, though? Isn't suppressing terms when an accompanying variable is not present one of the core reasons for this feature?

Perhaps my mental model of when terms are used inside groups like this is wrong. What are some other examples where a term should be suppressed or not suppressed when paired with a variable in a group?

I'm wondering if this is specific to the no date/year-suffix context.

cormacrelf commented 2 years ago

Read it again, I wasn't referring to the general behaviour that terms have (and should have), I was talking about how it applies to no date even though no date is really a stand-in for a variable. That's the crucial bit, people use no date where they would normally use <date>, and it is bound to end up in duplicated code with a <date> replaced and people are going to expect it to work too. The whole concept of no date and styles that use n.d. is to "fill" an empty date variable and avoid it rendering nothing, especially for an author-date style where the date itself indicates it's a citation and not just a parenthetical name. Avoiding rendering nothing should entail avoiding group suppression. So when it doesn't behave in CSL like a date at all, that is very incongruous and unintuitive, and that's why I think it's the cause of the problem.

There might be other terms like this, and a cursory look through the list gives two others: ibid (standing in for most of a cite) and anonymous (standing in for a name, I think?). But even then, they do not have the problem where they are usually used next to a very-likely-empty variable year-suffix. So yes, it may be specific to no date/year-suffix, but worth considering these two. If you want to put it in the spec, then it's easy to explain why no date and maybe others are special: they are used to render something in spite of an associated variable being empty, and you might want it to behave as if it had rendered a variable. If you are like every style on earth and have a macro for your dates, it makes even more sense:

<macro name="date">
  <choose><if variable="issued">
    <date variable="issued" ... />
  </if>
  <else>
    <text term="no date" />
  </else></choose>
</macro>

...

<group>
  <text value="first published " />
  <text macro="date" />
</group>

You would really expect this to work as if a variable had been rendered, even in the case of first published n.d.. If people didn't want this missing date variable to render anything, they wouldn't have put no date in there in the first place, because the default else branch is empty.

There is another solution using the following, but as I just outlined, it requires thinking your way around the natural meaning of no date and digging into the CSL spec for an author to accomplish.

<!-- no variables attempted inside the group, so it is never suppressed! -->
<group>
  <text term="no date" />
</group>
<text variable="year-suffix" />

The examples you were looking for:

<group> <!-- absent any variable attempts inside (and through macros/conditionals), group is not suppressed -->
  <text value="verbatim" />
  <text term="reference" /> <!-- terms and values are treated the same way -->
</group>

<group>
  <text value="verbatim" />
  <text variable="MISSING" />
  <text variable="AS MANY MISSING AS YOU LIKE" />
  <text variable="PRESENT" /> <!-- satisfies the group condition,
                                   it attempted many but one succeeded;
                                   the group is not suppressed -->
</group>

<group>
  <text value="verbatim" />
  <text variable="MISSING" /> <!-- causes group to be suppressed as a whole -->
</group>

rudolf-adamkovic commented 2 years ago

I have been citing no-date items for a while now, and it works. Can I close this issue?

andras-simonyi commented 2 years ago

I have been citing no-date items for a while now, and it works. Can I close this issue?

Yes, I think so -- thanks in advance!