Closed georgd closed 3 years ago
Sorry, my little brains aren’t fully functional today, as it seems. Sure, I had reported this already and you even commented that it was the language arbitration bug. Should we close this too?
That's possible. I've been struggling to refamiliarize myself with the data flow around abbreviations (the structures themselves are a challenge, and the code that handles them is very gradually improving, but it's not the most readable). We can leave this open as a marker, and see if it clears once locale arbitration comes onstream.
I think I have this working in code. Will put up a beta for inspection after some code cleanup.
Thanks. The new version doesn’t pick up any style-module: When citing an item that should be rendered via style-module, I get `[CSL STYLE ERROR: reference with no printed form.]`. This happens with the leg-cit styles as well as with the indigobook styles.
Glad it was a beta! For abbrev locale arbitration, it's appending a suffix to jurisdiction in that context. That must be leaking into the jurisdiction value used to fetch modules. There are tests for default modules, but none for modules with domain extensions, so it passes tests, but fails on our production styles. Should be an easy fix. More soon.
On first reading, I don’t fully understand your analysis; will reread :). But for the time being, the original variant is still not reachable. jm-ibfd, which lists englished `enIBFD` in `jurisdiction-preference`, is giving me German abbreviations for eu.int cases.
That was thinking out loud, you can ignore it. I was unable to reproduce the error in an initial trial under Linux, but ran into a larger error---the Mac isn't upgrading the DB, which causes the entire abbrevs infrastructure to fail. Investigating now ...
This particular bug-hunt grew into something of a nightmare. It was due to an obvious syntax error in an SQL statement that hadn't triggered during development under Linux for some reason, but I've needed to release a long series of betas, as that was the only way to get revised code onto the Mac (direct builds on the Mac do not work, for reasons I'm not keen to look into).
(I'll be rebasing the repository to clean up the mess this has made in the project history, so if you've pulled anything in the past few hours, you may need to rewind your clone a bit to line it up with GitHub.)
With the abbreviations upgrade issue resolved on the Mac beta, IndigoBook styles seem to be working fine. Possibly this has resolved the style module issue as well?
Now, I’m seeing something curious: I updated my Jurism installation and observed:
Then, I removed Jurism and all its files and re-installed it. Now, I’m back to `[CSL STYLE ERROR: reference with no printed form.]` with styles that use style modules (indigobook and leg-cit alike), and ibfd again applies German abbreviations which it doesn’t call for.
One more thing: there’s a discrepancy between citations and bibliography (with ibfd, as I don’t see anything with indigobook and leg-cit):
AT court (existing variants: `enIBFD`; desired variant: `enIBFD`):
- citation: `enIBFD` abbreviation applied
- bibliography: no abbreviation applied
EU.int court (existing variants: `de`; desired variant: default):
- citation: `de` abbreviation applied
- bibliography: `de` abbreviation applied
~FR court (existing variants: `enIBFD`; desired variant: `enIBFD`): citation: default abbreviation applied; bibliography: no abbreviation applied~ (bad example: no IBFD abbreviation for the cited court)
Hmmm. After restarting Word and Jurism for the third time, leg-cit and indigobook brought back the printed forms of legal case citations. Still, the application of abbreviations is not following a pattern that I could recognise:
coe.int:
- `de` abbreviations applied in JM New Zealand Law Style, JM OSCOLA, JM Diritto Pubblico in citation and bibliography
- `de` abbreviations applied in JM IBFD in citation only. No abbreviation applied in bibliography.
eu.int:
- `de` abbreviations applied in JM IBFD in citation only, no abbreviation applied in bibliography.
nl:
- `enIBFD` abbreviations applied in JM IBFD in citation and bibliography
at:
- `enIBFD` abbreviations applied in JM IBFD in citation and bibliography
Many thanks for your patience with all the testing. It should be better now. With the latest update, the beta will again update the 22 jurisdictions for which there are domain extensions for abbrev variants. All of your examples above now draw correct abbreviations in my testing here.
Fitting that this (near?) last bug was in a regular expression. https://github.com/Juris-M/abbrevs-filter/commit/5e747880d5c409dbdf9aa0df191d74b2df0c5279
I think we’re getting closer but we’re not there yet.
This is what happens in an empty document, using a JM Leg Cit style.
- Add citation to ECJ case + bibliography: abbreviations (variant `de`) applied correctly.
- Add citation to ECHR case (coe.int): abbreviations applied correctly.
- Add citation to Austrian supreme court case: AT-abbreviations are applied correctly but the eu.int and coe.int abbreviations disappeared and are replaced by the court codes.
In another document: NL court is correctly abbreviated in the bibliography but the code is printed in the citation (might be a different issue, as there’s no style module for NL).
Thanks for the steps to reproduce. I'll give that a try here, and report back.
Found the bug, and see how to fix it, but it's late here. Will get out a fix first thing in the morning.
Thank you very much! I’m really sorry for being such a nuisance. If I can improve how to report issues, please tell me (now, when writing it occurs to me: would a debug ID have been helpful?).
Good morning. It's fine! The code is green, but the logic of it all is fresh in mind, and your steps to reproduce above were very clear.
The source of the bug was quick to pin down. Abbreviations are placed in a database on install, to save memory and allow for user edits. Database access is asynchronous, but the processor runs synchronously, and isn't able to access the database directly, so abbreviations potentially needed for each item are loaded into memory before the processor is run. That's done by loading the abbreviation sets available for the top-level jurisdiction of the item under an appropriate key (e.g. for `coe.int`, it would be `coe.int` & `coe.int@de`, and for `at`, it would be `at` & `at@enIBFD`), and storing a list of the variant domains in a variable (`availableAbbrevDomains`). Then that list is checked against the domains preferred for the language of the item (so for JM leg cit mit Literaturverzeichnis in the `de` locale that would be `de`, `deAT` & `LegCit`). The processor chooses a domain from the intersection of the two lists, and applies that list to item content. So if the jurisdiction is `coe.int`, the `coe.int@de` list is selected.
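As a rough illustration of the selection step just described, here is a minimal Python sketch. It is not the processor's actual code: the function name `select_abbrev_key` is hypothetical, and two details are assumptions rather than facts from this thread, namely that the style's preference order decides between several matching domains, and that the plain jurisdiction key is used when the intersection is empty.

```python
def select_abbrev_key(jurisdiction, available_domains, preferred_domains):
    """Pick the in-memory abbreviation list key for one item (illustrative only)."""
    # Assumption: the first preferred domain that is also available wins.
    for domain in preferred_domains:
        if domain in available_domains:
            return f"{jurisdiction}@{domain}"
    # Assumption: with no common domain, the default list for the
    # jurisdiction is used.
    return jurisdiction


# Variant domains as given in the examples above.
preferred = ["de", "deAT", "LegCit"]
print(select_abbrev_key("coe.int", ["de"], preferred))  # coe.int@de
print(select_abbrev_key("at", ["enIBFD"], preferred))   # at
```

With the preference list from the leg-cit example, `coe.int` resolves to its `de` variant, while `at` (which only has an `enIBFD` variant) stays on the default list.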
The problem is that `availableAbbrevDomains` is being overwritten for each item pre-scanned, and the processor is given the list of the last item scanned (oops). That leads to a potential mismatch between the keys requested by the processor and the keys available in memory. In this case, the processor was (for example) calling for `at@de`, which ... doesn't exist.
The fix should be straightforward: `availableAbbrevDomains` needs to persistently store the lists for each top-level jurisdiction encountered, so the processor can fetch the correct list for evaluation. That's the theory, now we'll see how it works out in practice ...
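To make the overwrite problem and the proposed fix concrete, here is a small Python sketch. Everything in it is illustrative: the function names, the `VARIANTS` and `LOADED_KEYS` tables, and the assumption that the top-level jurisdiction is the part of the ID before the first colon are mine, not taken from the Jurism code.

```python
# Keys assumed to be loaded into memory, following the examples above.
LOADED_KEYS = {"coe.int", "coe.int@de", "at", "at@enIBFD"}
# Variant domains available per top-level jurisdiction (illustrative data).
VARIANTS = {"coe.int": ["de"], "at": ["enIBFD"]}


def top_level(jurisdiction):
    # Assumption: jurisdiction IDs are colon-delimited, top level first.
    return jurisdiction.split(":")[0]


def prescan_overwriting(jurisdictions):
    """Buggy shape: one shared list, replaced on every item scanned."""
    available = []
    for jurisdiction in jurisdictions:
        available = VARIANTS.get(top_level(jurisdiction), [])
    return available  # only the last item's domains survive


def prescan_per_jurisdiction(jurisdictions):
    """Fixed shape: one list kept per top-level jurisdiction."""
    available = {}
    for jurisdiction in jurisdictions:
        top = top_level(jurisdiction)
        available[top] = VARIANTS.get(top, [])
    return available


def pick_key(jurisdiction, domains, preferred):
    top = top_level(jurisdiction)
    for domain in preferred:
        if domain in domains:
            return f"{top}@{domain}"
    return top  # assumption: default list when nothing matches


cited = ["at", "coe.int"]
preferred = ["de", "deAT", "LegCit"]

shared = prescan_overwriting(cited)
bad_key = pick_key("at", shared, preferred)
print(bad_key, bad_key in LOADED_KEYS)    # at@de False

per_jurisdiction = prescan_per_jurisdiction(cited)
good_key = pick_key("at", per_jurisdiction["at"], preferred)
print(good_key, good_key in LOADED_KEYS)  # at True
```

In the overwriting version the Austrian item is evaluated against the `coe.int` domain list and asks for `at@de`, which was never loaded; keeping a list per jurisdiction removes the mismatch.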
More soon!
A fresh beta is up for the Mac. Should be closer!
I think this is now as close as it gets. Only the NL court still shows no abbreviation, just the code, in the citation. How does fallback work when no style module exists for a jurisdiction?
Just checked. This one is not a bug in the recent processor code, but it exposes a trap for the unwary that we may be able to address with a small change in the processor. I'll fill in some background first, then note a possible way to protect against this kind of anomaly. (The background may be [painfully] familiar, but I'm posting it here for the benefit of others who drive by this thread in the future.)
The extensions in CSL-M were initially aimed at US legal styles, which have some specific quirks, one of which dates back to the 1980s. From the beginning of the 20th century, official publication of court judgments across the US was routed through West Publishing. Citations in court filings referred to the West reporters by volume, reporter name, page, and year. This worked nicely until computer networks arrived and gave rise to a couple of issues with the old system. First, there was a significant time lag between release of the slip opinion by the courts and arrival of the official report from West, an inconvenience that attracted increasing attention as information systems generally grew faster over time. Second, and more importantly, it became clear (through a lawsuit on the subject) that reliance on West page numbering in official citations gave West a great deal of market power.
In response, a number of states introduced "vendor-neutral" or "public-domain" citation systems set directly by their courts. This effort was spotty, and added to the challenges of building automated referencing systems for US law. The treatment is uneven---only about a dozen states moved to vendor-neutral systems. It is inconsistent---each of the vendor-neutral formats differs from the others. It is partial---only one state (Oklahoma) back-fit vendor-neutral cite IDs to older cases. It also impacts parallel-citation logic.
To cope with the coexistence of West-official and vendor-neutral citation formats, Jurism needs two separate abbreviations for court names: a normal abbreviation for use in West citations (or cites to slip opinions); and a court code for use in vendor-neutral cites. The way that's done in CSL-M is to register two categories of abbreviation for institution names: `institution-part` to cover the former case, and `institution-entire` covering the latter. This is not yet documented as well as it should be, but these are the forms:
`institution-part` abbreviation:

```xml
<names variable="authority">
  <name/>
  <institution institution-parts="short"/>
</names>
```

`institution-entire` abbreviation:

```xml
<names variable="authority">
  <name/>
  <institution form="short"/>
</names>
```
In citation context, the `jm-leg-cit-rechtsquellenverzeichnis-literaturverzeichnis` style calls `juris-main-short` to render the `legal_case` type. In the `juris-nl.csl` module (which is bundled and should exist), `juris-main-short` calls `authority` with `form="short"` (the `institution-entire` short-form). The selected abbrevs file for the rendering will be `auto-nl.json`, which has court-code definitions for `institution-part`, but not `institution-entire`. No abbreviation is found under that category, so the system falls back to rendering the raw code.
This is easy to fix, by either providing an ABBREV for each court (which compiles to `institution-entire`), or (probably better) by calling the `authority` variable in the `juris-nl.csl` module; but the need to coordinate code across multiple files with a non-obvious relationship is a formula for bugs and confusion.
The way the `desc` compiler is set up, all courts will have an `institution-part` abbreviation for their code. The glitch in this case could be addressed by adjusting the processor to fall back to abbreviating with `institution-part` if an attempt at `institution-entire` fails.
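The proposed fallback can be sketched in a few lines of Python. This is only an illustration of the idea, not processor code: the function name, the `fall_back` switch, the nested-dictionary layout of the abbreviation data, and the sample court code `hr` with its abbreviation are all hypothetical.

```python
def abbreviate_institution(code, abbrevs, category, fall_back=True):
    """Look up a court abbreviation, optionally falling back between categories."""
    categories = [category]
    # Proposed behaviour: a failed institution-entire lookup retries
    # under institution-part, which the compiler always provides.
    if fall_back and category == "institution-entire":
        categories.append("institution-part")
    for name in categories:
        value = abbrevs.get(name, {}).get(code)
        if value is not None:
            return value
    return code  # nothing found: the raw court code is rendered


# Hypothetical data shaped like the situation described for auto-nl.json:
# entries under institution-part only, none under institution-entire.
auto_nl = {"institution-part": {"hr": "HR"}}

print(abbreviate_institution("hr", auto_nl, "institution-entire", fall_back=False))  # hr
print(abbreviate_institution("hr", auto_nl, "institution-entire"))                   # HR
```

Without the fallback the lookup under `institution-entire` comes back empty and the raw code is printed, which matches the NL symptom reported above.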
And here at the end of that long-long story ... what do you think?
... it also looks like the `juris-nl.csl` module itself needs some formatting attention. :-/
I've been toying with another idea for some time that might reduce the burden of maintaining our growing family of style modules. Like locale evaluation, which falls back to en-US, the search for style modules currently ends with selection of the US as a fallback. Since the "legal families" tend to cluster in their citation formats, I've been thinking that modules should be able to designate an intermediate fallback that's closer to their requirements. Your thoughts on that one?
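Purely as a thought experiment, an intermediate fallback could be resolved along these lines. Nothing here exists in Jurism: the function name, the `fallbacks` mapping, the jurisdiction pairing, and the module file names other than the general `juris-XX.csl` pattern are assumptions made up for illustration.

```python
def resolve_module(jurisdiction, modules, fallbacks, final="us"):
    """Follow designated fallbacks until a module is found (hypothetical)."""
    seen = set()
    current = jurisdiction
    while current not in seen:  # guard against circular designations
        if current in modules:
            return modules[current]
        seen.add(current)
        # A jurisdiction without a designated fallback goes straight
        # to the final (US) fallback, as the search does today.
        current = fallbacks.get(current, final)
    return None


# Hypothetical module registry and fallback designations.
modules = {"at": "juris-at.csl", "us": "juris-us.csl"}
fallbacks = {"li": "at"}

print(resolve_module("li", modules, fallbacks))  # juris-at.csl
print(resolve_module("nz", modules, fallbacks))  # juris-us.csl
```

The point of the sketch is only that a single optional designation per jurisdiction is enough to insert a closer match ahead of the US module.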
> This is easy to fix, by either providing an ABBREV for each court (which compiles to `institution-entire`), or (probably better) by calling the `authority` variable in the `juris-nl.csl` module; but the need to coordinate code across multiple files with a non-obvious relationship is a formula for bugs and confusion.
> The way the `desc` compiler is set up, all courts will have an `institution-part` abbreviation for their code. The glitch in this case could be addressed by adjusting the processor to fall back to abbreviating with `institution-part` if an attempt at `institution-entire` fails.
Seeing all the history, I’m not sure whether some other changes should be tackled as well. Semantically, shouldn’t the European court abbreviations go into `ABBREV`? The German courts are already organized like that, which makes sense as the federal courts are often cited by references to the official reporter. But as far as my research has reached so far, that’s an exception. Thus, the direction of the fallback would go the other way round.
> I've been toying with another idea for some time that might reduce the burden of maintaining our growing family of style modules. Like locale evaluation, which falls back to en-US, the search for style modules currently ends with selection of the US as a fallback. Since the "legal families" tend to cluster in their citation formats, I've been thinking that modules should be able to designate an intermediate fallback that's closer to their requirements. Your thoughts on that one?
At some point in the last few months I wondered how the fallbacks worked and whether some jurisdictions shouldn’t fall back to a US-like format. So, yes, I think that’s a good idea. Do we have enough information to cluster them more or less reliably?
If courts in a jurisdiction are always cited with a short-code, never by a descriptive name, the code can safely be set as "abbrev." The choice can be revisited if descriptive names later become necessary: because the abbreviations are set automatically, and are called via modules that update together with the abbreviation lists, a change from using `institution-part` to using `institution-entire` would be transparent to the user.
The fallback from "ABBREV" (`institution-entire`) to "abbrev" (`institution-part`) is the right direction for it, because only the latter is guaranteed to exist. The compiler script makes these assignments in sequence when processing the `desc` file:
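- "name" is assigned to `institution-part`
- "abbrev", if given, is assigned to `institution-part` (replacing the name)
- "ABBREV", if given, is assigned to `institution-entire`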
So all entries must have a "name" value for the UI menu label, and that is used for the `institution-part` abbreviation if no "abbrev" value is given. If no "ABBREV" value is given, though, there will be no `institution-entire` value. The fallback in the processor to `institution-part` ensures that the court code will never appear in citations.
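The assignment order described here can be sketched as follows. This is a simplified Python illustration, not the compiler script itself: the function name `compile_court` and the sample entries are hypothetical, and the field names "name", "abbrev" and "ABBREV" are taken from the wording of this thread rather than from the `desc` file format.

```python
def compile_court(entry):
    """Derive abbreviation categories for one court entry (illustrative)."""
    # "name" is mandatory, so institution-part always gets a value.
    result = {"institution-part": entry["name"]}
    # An explicit "abbrev" replaces the name under institution-part.
    if entry.get("abbrev"):
        result["institution-part"] = entry["abbrev"]
    # institution-entire exists only when "ABBREV" is given.
    if entry.get("ABBREV"):
        result["institution-entire"] = entry["ABBREV"]
    return result


# Hypothetical entries, for illustration only.
print(compile_court({"name": "Supreme Court"}))
# {'institution-part': 'Supreme Court'}
print(compile_court({"name": "Supreme Court", "abbrev": "Sup. Ct.", "ABBREV": "SC"}))
# {'institution-part': 'Sup. Ct.', 'institution-entire': 'SC'}
```

Because `institution-part` is filled in every case and `institution-entire` only sometimes, a processor fallback can only safely run from the latter to the former.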
Good to hear that you've had the same thought about style-module fallbacks. I'll open a separate issue for it.
After the 'semi-clean' install of the new release, Juris-M is always using the enIBFD variant when I’m citing Austrian cases with jm-leg-cit styles, although enIBFD is not listed among the variants in `jurisdiction-preference`.