andras-simonyi / citeproc-el

A CSL 1.0.2 Citation Processor for Emacs.
GNU General Public License v3.0

Implement container tracking feature from CSL-M #96

Open Quintus opened 2 years ago

Quintus commented 2 years ago

Dear Andras,

as a followup to https://github.com/andras-simonyi/citeproc-el/issues/92#issuecomment-1009968627 I would like to ask if it is possible to implement from CSL-M the container tracking feature, that is, track-containers, consolidate-containers, first-container-reference-note-number, position="container-subsequent", container-multiple, container-subsequent. As discussed in the linked issue, this comes in quite handy for certain kinds of work.

This feature is, as far as I can tell from the CSL-M spec, not incompatible with ordinary CSL, but just an extension, so implementing it should not conflict with the code for normal CSL processing.

-quintus

denismaier commented 2 years ago

@bwiernik, @bdarcus what do you think about those as enhancements for vanilla csl? @jgm @cormacrelf What do other implementors think about this?

bwiernik commented 2 years ago

This is used for collapsing of parallel citations in legal items, right?

denismaier commented 2 years ago

(parallel citations are supported via these attributes.)

For reference: use case is here.

From the csl-m specification:

The consolidate-containers attribute can be set on cs:bibliography, and takes a list of item types as argument. For the designated item types, it implicitly invokes track-containers, and renders only a single item in the bibliography for each container in the input.

With this, you can cite individual chapters from a legal commentary in the notes, but have only the legal commentary as a whole in the bibliography, i.e., not the individual chapters.

The other attributes implement certain tests, e.g. whether a container has been seen already.

bwiernik commented 2 years ago

Ah yes, we'd hoped to add something like this. Annual Reviews and many German styles need it. Didn't realize that CSL-M had implemented it.

bwiernik commented 2 years ago

I don't have a link to previous discussion at hand, but the question was whether we wanted to match containers based on matching field content or require some sort of related item structure. I am in favor of matching field content

denismaier commented 2 years ago

A related item structure would clearly be more powerful and precise, but also more difficult to implement.

cormacrelf commented 2 years ago

Yes, unfortunately related items would bubble up into GUI work and changes to library management. If you were going to do related item structure, you would do it like "container-id": "citekeyOfContainer2009", and have a separate container item with that id. That would be mostly pretty simple for citeproc authors, even if you decide to delegate some field contents to the container item, we can just build merged ones and use those to render chapter items. Presently libraries are completely flat, so this would be a step in a new direction that I'm sure someone has proposed at some point. It does solve the problem of having to duplicate a container entry, change type to chapter, and then manage propagating edits yourself. Ultimately if e.g. Zotero put some time into this UI, citing a chapter of a book could become really easy and foolproof. But whether it is worth that time, I don't know.

For the field matching, Frank's implementation simply makes a "container id" out of the fields type, container-title, publisher, edition. These four have to match exactly, so this forces you to do the manual work of spotting mismatches between chapters, because any typos come out as duplicates in the bibliography. In terms of future effort, implementing this also helps implement parallel citations, as field matching is central to that CSL-M extension as well. Note that parallel citations allow suppression of specific cs:groups based on arbitrary matching fields. I'm not sure if the 4 fields Frank chose are enough for every situation -- @Quintus could tell us if they would cover his legal citation needs.
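The field-matching idea can be illustrated with a minimal Python sketch. It assumes only what is said above (a key built from type, container-title, publisher, and edition, compared exactly); the function names and the sample records are invented for illustration and are not taken from citeproc-js or citeproc-el.

```python
from collections import OrderedDict

# The four fields named above as making up the "container id".
CONTAINER_KEY_FIELDS = ("type", "container-title", "publisher", "edition")


def container_key(item):
    """Build the exact-match key; missing fields count as empty strings."""
    return tuple(str(item.get(field, "")) for field in CONTAINER_KEY_FIELDS)


def group_by_container(items):
    """Map each container key to the ids of the items sharing it, in input order."""
    groups = OrderedDict()
    for item in items:
        groups.setdefault(container_key(item), []).append(item["id"])
    return groups


# Hypothetical sample data; item "c" has a typo in its container-title.
items = [
    {"id": "a", "type": "chapter", "container-title": "Example Commentary",
     "publisher": "Example Press", "edition": "5"},
    {"id": "b", "type": "chapter", "container-title": "Example Commentary",
     "publisher": "Example Press", "edition": "5"},
    {"id": "c", "type": "chapter", "container-title": "Exampel Commentary",
     "publisher": "Example Press", "edition": "5"},
]
groups = group_by_container(items)
```

Because the comparison is exact, the typo in item "c" puts it in a group of its own, which is the duplicate-in-the-bibliography effect described above.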

There is, however, nothing stopping you from doing both ideas, starting with the easy one (field matching). Later, you could have the presence of a container-id field disable the auto consolidation for that item and rely on the provided information instead.
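A rough Python sketch of combining both ideas follows. The "container-id" field comes from the suggestion above; everything else (the function names, the list of fields copied from the container, and the mapping of the container's title to container-title) is an assumption made for illustration only.

```python
KEY_FIELDS = ("type", "container-title", "publisher", "edition")


def container_identity(item):
    """Prefer an explicit container-id; otherwise fall back to field matching."""
    explicit = item.get("container-id")
    if explicit is not None:
        return ("explicit", explicit)
    return ("matched",) + tuple(str(item.get(f, "")) for f in KEY_FIELDS)


def merged_for_rendering(item, library):
    """Fill in fields the chapter lacks from the container item it names, if any."""
    container = library.get(item.get("container-id"))
    if container is None:
        return dict(item)
    # Assumption: the container's title becomes the chapter's container-title.
    merged = {"container-title": container.get("title", "")}
    for field in ("publisher", "edition", "editor", "issued"):
        if field in container:
            merged[field] = container[field]
    merged.update(item)  # the chapter's own fields take precedence
    return merged


# Hypothetical library with one container item and two chapters.
library = {
    "comm2009": {"id": "comm2009", "type": "book", "title": "Example Commentary",
                 "publisher": "Example Press", "edition": "3"},
}
linked = {"id": "ch1", "type": "chapter", "title": "Section 1",
          "container-id": "comm2009"}
unlinked = {"id": "ch2", "type": "chapter", "title": "Section 2",
            "container-title": "Other Book", "publisher": "Example Press",
            "edition": "1"}
merged = merged_for_rendering(linked, library)
```

An item with a container-id never reaches the field-matching branch, so typos in its own container fields cannot split the bibliography entry.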

Quintus commented 2 years ago

@Quintus could tell us if they would cover his legal citation needs

German legal commentaries often simply bear the commented law’s official title with "commentary" appended as their title, so that on its own would not be enough to properly distinguish works, and type will be the same anyway. With publisher and edition, however, I expect it to be pretty unlikely that the resulting identifier would be ambiguous. Commentaries are typically referred to by the editors as the most distinguishable element, but that’s irrelevant for an internal automatically generated identifier.

One thing that just recently came to my mind is looseleaf commentaries. In fact, commentary type 4 from #92 is probably the digital variant of a traditional looseleaf commentary. These do not have an edition valid for the commentary as a whole, but since pages are updated independently, there’s an edition marker on each page instead. Still, even taking into account the lack of an edition on the containing work, I cannot come up with an ambiguous case in practice. I am not aware of any publisher with multiple looseleaf commentaries on the same law.

bwiernik commented 2 years ago

Could you take a look at how commentaries are handled in the CSL-M specification (https://citeproc-js.readthedocs.io/en/latest/csl-m/index.html#legal-commentary-extension) and in the Jurism/Indigo Book collaboration (https://juris-m.github.io/indigobook/) and see if what you are describing is consistent with what's happening there?

Quintus commented 2 years ago

Could you take a look at how commentaries are handled in the CSL-M specification

Sure. CSL-M holds legal_commentary to be a type similar to chapter. This is sufficient if container tracking is implemented the way described in CSL-M with the attributes mentioned in the OP, because a commentary is only listed once in the bibliography, even if multiple chapters by different authors are cited. As far as I understand how CSL-M’s container tracking works, it specifically covers this use case, which is best described as “collapsing” multiple otherwise separate bibliography entries for the separate chapters/articles into one single entry for the book in the bibliography. Specifically, consolidate-containers does exactly this.

CSL-M does not specify how the “container” is inferred from the bibliography entries, unless I have overlooked something. This looks like a flaw to me and explains @cormacrelf’s question to me. Biblatex deals with this rather pragmatically: the user is required to specify the container as a separate entry with its own ID and reference its ID in a special field named crossref. This way, it is up to the user to designate the relationships. For an example, see #91, which uses this construction for a handbook rather than a commentary, but those are to be collapsed in a bibliography into one entry as well. By the way, I’m not aware of a Biblatex style that actually does this collapsing.

If I read the CSL-M specification correctly, the virtual “container” entry does not get passed in itself as a separate entry through the CSL processor. Instead, the chapters themselves are passed. It is not clear to me if a single random example is passed from all the collapsed entries, simply the first one, or all of them. In the last case it would be required to suppress any output for all but a single entry, because otherwise the collapsing effect would not be achieved. It may be that this is what the container-multiple condition is for, but I admit that I have not fully understood how this condition is supposed to work. With this specification, I am not sure how I would have to implement a driver for legal_commentary that produces just this one collapsed entry in the bibliography. When I write something like:

<choose>
    <if type="legal_commentary">
        <choose>
            <if container-multiple="true">
                <text value="I am a collapsed entry"/>
            </if>
        </choose>
    </if>
</choose>

would this print I am a collapsed entry once for each individual entry? That would defeat the purpose, so it needs to be executed just once; container-multiple probably evaluates to true only once to guarantee this and then to false for the other individual entries. This then raises the question of which of the individual entries would be passed when it evaluates to true. If instead one passed the virtual “container” entry, this would raise the question of which attributes actually belong to the container and which to the individual entries. I’d be wary of saying that this can be specified the same way for all possible uses worldwide, but if I needed to specify it, I’d take a negative approach and say that all fields should be included except for those which 1) obviously refer to the individual entry, like author, page, and chapter-number, and those which 2) differ among the individual entries.
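The negative approach could look roughly like the following Python sketch. The set of item-specific fields is just the three named above, and the function name and sample entries are hypothetical; nothing here is taken from the CSL-M specification.

```python
# Rule 1: fields that obviously refer to the individual entry.
ITEM_SPECIFIC_FIELDS = {"author", "page", "chapter-number"}


def virtual_container(entries):
    """Keep only fields that are not item-specific and agree across all entries."""
    first = entries[0]
    container = {}
    for field, value in first.items():
        if field in ITEM_SPECIFIC_FIELDS:
            continue  # rule 1: belongs to the individual entry
        if all(field in e and e[field] == value for e in entries):
            container[field] = value  # rule 2: identical everywhere, so keep it
    return container


# Hypothetical entries from the same commentary.
entries = [
    {"id": "ch1", "type": "legal_commentary", "title": "Section 1",
     "author": "Author A", "page": "10-20",
     "container-title": "Example Commentary", "publisher": "Example Press"},
    {"id": "ch2", "type": "legal_commentary", "title": "Section 2",
     "author": "Author B", "page": "21-40",
     "container-title": "Example Commentary", "publisher": "Example Press"},
]
container = virtual_container(entries)
```

Here id and title drop out under rule 2 because they differ, without having to be listed explicitly. With only a single entry nothing differs, so rule 2 removes nothing; that case does not need collapsing anyway.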

If instead Biblatex’s approach with a separate container entry in the bibliography database is followed, it would make sense to simply and only pass that entry through the CSL processor and suppress the to-be-collapsed individual entries entirely.
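As a sketch of that behaviour in Python: each cited entry with a crossref field is replaced by the entry it points to, and every container is listed only once. The function name and the data are made up for illustration, and ordering by first appearance is an assumption, since a real style would apply its own sorting.

```python
def consolidated_bibliography(cited_ids, library):
    """Return bibliography ids with crossref'd entries replaced by their container."""
    result = []
    seen = set()
    for cited_id in cited_ids:
        entry = library[cited_id]
        # Entries without a crossref stand for themselves.
        target = entry.get("crossref", cited_id)
        if target not in seen:
            seen.add(target)
            result.append(target)
    return result


# Hypothetical database: two chapters point to one commentary, one article stands alone.
library = {
    "comm": {"id": "comm", "type": "book", "title": "Example Commentary"},
    "ch1": {"id": "ch1", "type": "chapter", "title": "Section 1", "crossref": "comm"},
    "ch2": {"id": "ch2", "type": "chapter", "title": "Section 2", "crossref": "comm"},
    "art": {"id": "art", "type": "article-journal", "title": "Some Article"},
}
bibliography = consolidated_bibliography(["ch1", "art", "ch2"], library)
```

Only the ids "comm" and "art" reach the bibliography, so the style never has to decide which chapter represents the collapsed entry.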

Jurism/Indigo Book collaboration (https://juris-m.github.io/indigobook/)

I am afraid that the Indigo Book does not contain any reference to legal commentaries. This is not surprising, because this kind of literature is specifically European, if not German, and it is not used in the US, which is the Indigo Book’s primary target if I read the foreword correctly. In case law tradition (i.e. everywhere the British Empire left its footprint, which is a considerable part of the world and probably all countries speaking English) this kind of literature does not make much sense anyway, since it is specifically about guiding the professional’s work with the text of the law as adopted by parliament.

Specifically, the Indigo Book’s rule 28.5 deals with editors and the fourth example includes one. This is fine in itself, but rule 28 does not concern itself with what happens if different works by different authors from the same book are cited. Rule 28 in this case will yield two entirely independent citations, which conflicts with how it would have to be cited in a German publication. Even in a footnote-only citation style it is recommended to give the full details only once in these cases, though there does not appear to be a rigid enforcement of this.

Quintus commented 1 year ago

With a good year gone by, is there any progress on this matter? Do you need more information from me?

andras-simonyi commented 1 year ago

With a good year gone by, is there any progress on this matter? Do you need more information from me?

Unfortunately, there has been no progress, but I have now thought a bit about the problem and wanted to ask your opinion about making use of a "crossref" field, which citeproc-el can already handle in biblatex input thanks to the parsebib library, and could simply transfer to an analogous CSL field. It would make the implementation way easier, I think.

Quintus commented 1 year ago

On Wednesday, 25 January 2023, András Simonyi wrote:

Unfortunately, there has been no progress, but now I had thought a bit about the problem and I wanted to ask your opinion of making use of a "crossref" field, which citeproc-el already can handle in biblatex input thanks to the parsebib library, and could simply transfer it as a CSL field. It would make the implementation way easier, I think.

Using an explicit “crossref” field appears fine to me. I am already doing that internally in my .bib databases anyway to keep things clean.

-quintus

-- Dipl.-Jur. M. Gülker | https://mg.guelker.eu | PGP: see website | Passau, Germany

Quintus commented 6 months ago

I have elevated this to CSL itself as I think it is a worthwhile addition even outside of the law discipline: https://github.com/citation-style-language/schema/issues/436

andras-simonyi commented 1 week ago

I'm looking into implementing this now, at least partially. An issue which I'm not clear about is the role of container-multiple and container-subsequent for item types for which consolidate-containers is active. How is the collapsed container's bibliography entry rendered? Is this what the condition container-multiple is used for? Also, does container-subsequent make sense for consolidated item types, when there will be only one bib item? All in all, I guess I have two questions about your use case: (i) do you need container-subsequent, and (ii) should the collapsed items be rendered using the data of one of the contained items, with the container-multiple condition set to true, or by pulling the container item's data from the bibliography (as the id would be available in our case from the crossref field) and rendering it according to the style?

andras-simonyi commented 1 week ago

A disturbing thing about container-subsequent is that, as far as I can see, it's (potentially) circular: the ordering of bibliography items can depend on the value of any condition including container-subsequent, whose value depends on the bibliography order...
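The dependence on order can be seen in a small Python sketch. It assumes a simplified reading in which container-subsequent is true for an item whenever another item from the same container appears earlier in the sequence; the function name and data are hypothetical, and the actual CSL-M semantics may differ.

```python
def container_subsequent_flags(ordered_ids, container_of):
    """For a given item order, flag items whose container was already seen."""
    seen = set()
    flags = {}
    for item_id in ordered_ids:
        container = container_of[item_id]
        flags[item_id] = container in seen
        seen.add(container)
    return flags


# Two hypothetical chapters from the same container.
container_of = {"a": "comm", "b": "comm"}
flags_ab = container_subsequent_flags(["a", "b"], container_of)
flags_ba = container_subsequent_flags(["b", "a"], container_of)
```

The two orders give opposite flags for the same items. If a sort key were rendered through a macro that tests this condition, the order would determine the flags and the flags would in turn determine the order.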