citation-style-language / test-suite

11 stars 12 forks source link

Sorting `citation-number` expressed as “data” versus “display”? #62

Closed Jason-Abbott closed 1 year ago

Jason-Abbott commented 1 year ago

(My purpose is just to implement the clearest, most concise code that passes the tests. I’m happy to forego a conceptual treatise if you have a dozen lines of pseudo-code that do the trick.) 🙂

A variety of tests (sort_BibliographyCitationNumber*, sort_CitationNumber*) yield something like

[4] Xxxx (fourth-cited)
[3] Zzzz (third-cited)
[2] Bbbb (second-cited)
[1] Aaaa (first-cited)

for a

<text variable="citation-number" prefix="[" suffix="] "/>
<text variable="title" />

layout sorted with

<key variable="citation-number" sort="descending" />

Since the specification defines citation-number as the “index (starting at 1) of the cited reference in the bibliography,” this is perplexing. How can the first “reference in the bibliography” be given a number indicating it’s the fourth “reference in the bibliography”?

@fbennett addressed this in an Issue 1 comment, suggesting a distinction between citation-number expressed as “item data” versus a “display variable,” where the latter is the “preexisting sort order” and the former, I guess, the current order.

Since the Swift processor I’m working on performs sorting before rendering (of course), and sorting immediately updates the indexes of references, my rendered citation-number is always what’s current. Maybe I need a reference property that tracks its previous citation-number for rendering but that doesn’t seem quite right either.

[ More specific questions coming … ]

Jason-Abbott commented 1 year ago

I will start with something basic: why does a secondary citation-number sort do anything in tests (simplest example) where all items were sorted by the primary key value?

<key variable="title" sort="ascending"/>
<key variable="citation-number" sort="descending"/>

The specification says, “a secondary sort, using the second sort key, is applied to items sharing the first sort key value.”

In the linked test and the example above, no items share “the first sort key value.” Their titles are all different, thus the secondary citation-number sort shouldn’t apply.

adam3smith commented 1 year ago

I believe this discussion on the Zotero forums is pertinent, but personally I also find the citation-number behavior here perplexing and at a minimum out of spec, arguably counter to it. As a practical matter, I'm pretty sure the above case doesn't come up a single time in the existing styles, which have just two basic sort behaviors related to citation-number: The most common numbering by order of appearance (i.e. sorting by citation-number in citation and with no explicit sort in bibliography) (e.g. vancouver.csl) and sorting by citation-number in text and alphabetical sorting in the bibliography, which means citation numbers appear out of numerical order in the document (e.g., american-medical-association-alphabetical.csl), so if you decide to mark this test as won't-pass in citeproc-swift, performance on all existing styles will remain identical.

Jason-Abbott commented 1 year ago

Setting aside why a secondary citation-number sort applies, I’m also perplexed by how it applies.

The specification says that <sort/> affects the order of “cites within citations, and bibliographic entries within the bibliography.”

But in the examples noted above, the citation-number sort doesn’t do that. The items retain the order they had before the citation-number sort. The only thing apparently sorted is the value of the citation-number itself.

I regret if this sounds critical. I fully understand how complex software evolves over time. 🙂 I’m totally happy to match citeproc-js behavior as soon I can distill it to a minimum number of logical statements.

Edit: I’m also happy to mark tests as won’t-pass, as may be the answer here. I’ll follow that Zotero discussion and get as far as I can. Thanks @adam3smith.

Jason-Abbott commented 1 year ago

It looks like it began here (@fbennett 2011 Zotero comment):

The current processor cannot do this: when sort is performed on the citation-number variable in descending order, it instead reverses the sequence of items, and then numbers them in ascending order, starting with 1. This seems to make little sense, so I have introduced a fix that will instead leave the citations in document order, and number them in descending order (as you require).

I’ll find that change set. I don’t imagine the code is too hard (famous last words, especially with CSL) — just some exceptional handling where citation-number is used. At the least, I’ll update the documentation in the relevant tests and comment here for future reference.

Edit: original change — “When sorting by citation-number with sort="descending", reverse the numbering without changing the order of entries

Jason-Abbott commented 1 year ago

It looks like citeproc-js has completely separate sort handling when the key is citation-number. It’s not used for normal sorting at all.

if (variable === "citation-number") {
  func = function(state, Item) {
    if (state.tmp.area === "bibliography_sort") {
      if (this.strings.sort_direction === CSL.DESCENDING) {
        state.bibliography_sort.opt.citation_number_sort_direction = CSL.DESCENDING;
      } else {
        state.bibliography_sort.opt.citation_number_sort_direction = CSL.ASCENDING;
[ … ]

I think that means I can just ignore

<key variable="citation-number" sort="descending" />

for actual sorting purposes and instead use it to set a flag in state somewhere indicating that citation-number values should be reversed.

If so, then it's conceptually like having

<bibliography displayReversedCitationNumbers="true"/>
Jason-Abbott commented 1 year ago

Also instructive:

CSL.Engine.BibliographySort = function () {
    this.tokens = [];
    this.opt = {};
    this.opt.sort_directions = [];
    this.opt.topdecor = [];
    // Holds the final citation-number sort direction, for use
    // in applying numbers in cs:citation and cs:bibliography.
    // Value is exclusively controlled by cs:key in bibliography_sort
    this.opt.citation_number_sort_direction = CSL.ASCENDING;
    this.opt.citation_number_secondary = false;
    this.tmp = {};
    this.keys = [];
    this.root = "bibliography";
};

I think I can handle this at the XML parse phase — convert any <sort key="citation-number"/> to a state value and remove it so it’s not even seen by the sort or render phases.

Jason-Abbott commented 1 year ago

And here, for completeness, is where citeproc-js calculates reversed citation-numbers based on the state flag:

CSL.Registry.prototype.renumber = function () {
    var len, pos, item;
    // 19. Reset citation numbers on list items
    if (this.state.bibliography_sort.opt.citation_number_sort_direction === CSL.DESCENDING) {
        this.state.bibliography_sort.tmp.citation_number_map = {};
    }
    len = this.reflist.length;
    for (pos = 0; pos < len; pos += 1) {
        item = this.reflist[pos];
        // save the overhead of rerenderings if citation-number is not used in the style.
        item.seq = (pos + 1);
        if (this.state.bibliography_sort.opt.citation_number_sort_direction === CSL.DESCENDING) {
            this.state.bibliography_sort.tmp.citation_number_map[item.seq] = (this.reflist.length - item.seq + 1);
        }
        // update_mode is set to CSL.NUMERIC if citation-number is rendered in citations.
        if (this.state.opt.update_mode === CSL.NUMERIC && item.seq != this.oldseq[item.id]) {
            this.state.tmp.taintedItemIDs[item.id] = true;
        }
        if (item.seq != this.oldseq[item.id]) {
            this.return_data.bibchange = true;
        }
    }
};

Notably this.reflist.length - item.seq + 1

fbennett commented 1 year ago

(Thank you for being sympathetic about the complexity of it all. It means a lot.)

On Mon, Jun 12, 2023, 03:38 Jason Abbott @.***> wrote:

It looks like it began here @.*** https://github.com/fbennett 2011 Zotero comment https://forums.zotero.org/discussion/comment/99175/#Comment_99175):

The current processor cannot do this: when sort is performed on the citation-number variable in descending order, it instead reverses the sequence of items, and then numbers them in ascending order, starting with

  1. This seems to make little sense, so I have introduced a fix that will instead leave the citations in document order, and number them in descending order (as you require).

I’ll find that change set. I don’t imagine the code is too hard (famous last words, especially with CSL) — just some exceptional handling where citation-number is used. At the least, I’ll update the documentation in the relevant tests and comment here for future reference.

— Reply to this email directly, view it on GitHub https://github.com/citation-style-language/test-suite/issues/62#issuecomment-1586277025, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAASMSR7D6FMJXGX7FHVJB3XKYGA7ANCNFSM6AAAAAAZCOUS54 . You are receiving this because you were mentioned.Message ID: @.***>

Jason-Abbott commented 1 year ago

It appears there may be another caveat: a <key variable="citation-number" sort="descending"/> in primary position does change item order (e.g. this test) but in secondary or later positions it only causes the numbers to display in reverse order without changing item order — another thing to check.

I feel like I’m getting close to something without too many lines of code that will pass these tests. 🤞 I’ll share the relevant portions of Swift code if I get there.

Jason-Abbott commented 1 year ago

(Thank you for being sympathetic about the complexity of it all. It means a lot.)

@fbennett I was astonished and daunted by all of your commits as I scrolled back to find the first to support reversed citation-numbers 😬. I’ve written code a lot of years but haven't committed that much to a single project. It’s impressive.

Jason-Abbott commented 1 year ago

I think I have all the citation-number sort tests passing with minimal code changes. The logic I applied was

Decoding Phase

With that done there are no more

<macro name="citation-number">
  <text variable="citation-number"/>
</macro>

and no more

<key variable="citation-number"/>

that do magical things. The displayReversedCitationNumbers state value has what’s needed.

Sorting Phase

Rendering Phase

That’s it. I won’t be at all surprised if I run into scenarios needing adjustments but this passes all sort_BibliographyCitationNumber and sort_CitationNumber tests, and hundreds more.

fbennett commented 1 year ago

Well done!

On Mon, Jun 12, 2023, 09:46 Jason Abbott @.***> wrote:

I think I have all the tests passing with minimal code changes. The logic I applied was

Decoding Phase

  • If a sort is present then update a state variable displayReversedCitationNumbers = key.sort == "descending"
    • If that is not the first then remove it so it doesn’t affect the sorting or rendering phases
  • If a sort is present and the macro only contains then update the same state variable, displayReversedCitationNumbers = key.sort == "descending”
    • If that is the first element then remove it and replace it with
    • If it is not the first ` then remove it so it doesn’t affect the sorting or rendering phases

With that done there are no more

and no more

that do magical things. The displayReversedCitationNumbers state value has what’s needed.

Sorting Phase

  • Here I just added a reversedPosition property to the reference object to go along with the position property I was already updating after sort. Obviously it could be calculated on-demand but but 🤷‍♂️
  • If citations are sorted by citation-number then reverse the order if displayReversedCitationNumbers == true (one line change)

Rendering Phase

  • Use reversedPosition for a rendered citation-number if displayReversedCitationNumbers == true (a couple lines changed)

That’s it. I won’t be at all surprised if I run into scenarios needing adjustments but this passes all sort_BibliographyCitationNumber and sort_CitationNumber tests, and hundreds more.

— Reply to this email directly, view it on GitHub https://github.com/citation-style-language/test-suite/issues/62#issuecomment-1586404087, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAASMSVXBBBQUWDTUY7R6UTXKZRH3ANCNFSM6AAAAAAZCOUS54 . You are receiving this because you were mentioned.Message ID: @.***>