Juris-M / legal-resource-registry

Jurisdiction ID and abbreviation data files for using with Jurism and other projects.
MIT License

Refactoring of juris-XX-desc.json files #19

Closed fbennett closed 3 years ago

fbennett commented 4 years ago

@Droitslinguistiques @georgd

Following a productive meeting with @Droitslinguistiques on Monday(CA)/Tuesday(JP), I've been mulling over the requirements. This note recaps the current state of abbreviation and jurisdiction source files, and proposes some changes for discussion.

Basic Current Design

The basic shape of jurisdiction specs looks roughly like so:

  1. Jurisdiction/court mappings are described in a juris-XX-desc.json file that provides (a) a mapping of court IDs to their names and abbreviations, and (b) a mapping of jurisdiction IDs to their names, abbreviations, and associated courts.
  2. A script uses the description files as source for generating (a) auto-XX.json abbreviation files used for rendering items of each jurisdiction, and (b) machine-readable JSON used to build database entries to support menus in the UI.
  3. For 2(a) above, the abbreviation for a court in each (sub-)jurisdiction is set to that of the court, which may optionally add the jurisdiction abbreviation (marked with a < prefix or a > suffix).

Current Hacks

(This is not meant for close reading! The description is accurate, but provided here only to illustrate that we have a wooly mess on our hands that needs to be cleaned up)

That was the simple initial design, but it hit some limitations in the early going, and the syntax of the juris-XX-desc.json files was extended in various ways. Some of the changes are documented, some are not, but all were added ad hoc to support specific quirks in requirements. The list of horribles is:

  1. The court element can be omitted by prefixing the court ID at the jurisdiction level with a minus sign (-);
  2. The jurisdiction element can be omitted by prefixing the court ID at the jurisdiction level with a plus sign (+);
  3. To accommodate court short-codes for vendor-neutral citations as well as abbreviated court names keyed as abbrev, an ABBREV segment is recognized.
  4. An explicit court short-code for a specific jurisdiction can be declared by appending :: followed by the short-code to the court ID at the jurisdiction level.
  5. In an abbreviation or short-code, to suppress rendering of a (possibly abbreviated) variable later in the cite, the triggering abbreviation can be set in a special syntax (e.g. on container-title, an abbreviation might be set as "U.S.": "!authority>>>U.S.", where the "U.S." reporter covers U.S. Supreme Court judgments exclusively).
  6. In an abbreviation or short-code, to suppress some elements of an abbreviation later in the cite, the string to be clipped out can be indicated with a further extension to this syntax (e.g. court.appeal:!authority:Ohio Ct. App.>>>Ohio, where the full expression of the court+jurisdiction would be "Ohio Ct. App. 1st Dist.")
  7. The suppression syntax in 5 & 6 above can be applied either (a) in the jurisdiction-specific short-code declaration within the juris-XX-desc.json file (in which case it passes through as an institution-entire abbreviation), or (b) in the auto-XX.json file generated from it (when a different variable family, typically container-title, is the trigger).
  8. To accommodate 7(b) above in the auto-generated abbreviation files, container-title segments of the original file (before overwriting) are preserved.
  9. Language variants of abbreviations can be set by adding further abbrev keys with a colon-delimited language code (e.g. abbrev:fr).
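
To make the suppression syntax in items 5 and 6 concrete, here is a minimal Python sketch that splits such a value into its parts. The names Suppression and parse_abbreviation are hypothetical and are not taken from the actual script, and the reading of the optional middle segment as "string to clip" is an assumption based only on the two examples above.

```python
from typing import NamedTuple, Optional


class Suppression(NamedTuple):
    variable: str          # variable affected later in the cite, e.g. "authority"
    clip: Optional[str]    # substring to clip out (None = suppress the variable entirely)
    rendered: str          # abbreviation rendered in the triggering position


def parse_abbreviation(value: str) -> Optional[Suppression]:
    """Split '!variable[:clip]>>>rendered'; return None for a plain abbreviation."""
    if not value.startswith("!") or ">>>" not in value:
        return None
    directive, rendered = value[1:].split(">>>", 1)
    variable, sep, clip = directive.partition(":")
    return Suppression(variable, clip if sep else None, rendered)


print(parse_abbreviation("!authority>>>U.S."))
print(parse_abbreviation("!authority:Ohio Ct. App.>>>Ohio"))
```

The first call yields a full suppression of authority, the second a clip of "Ohio Ct. App." with "Ohio" as the rendered short-code.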

Limitations of the Status Quo

Readability: While the original design was meant to express the entire structure of a jurisdiction and its abbreviation logic in a single file, some of the abbreviation details (for container-title abbrevs) are currently expressed in the target auto-XX.json abbreviation files, which serve both as source and output target for the compilation script. This is bad design, and the "external" abbreviation details should be moved into the juris-XX-desc.json file.

UI language: While the current format of the juris-XX-desc.json files can distinguish between the court and jurisdiction names shown in the UI (keyed as name) and their abbreviations for rendering (keyed as abbrev or ABBREV), there is no provision for language arbitration in the UI. This is a problem for jurisdictions that have multiple official languages, where the user may wish the court names to be expressed in their own language.

Short-code language: The current format allows only one form of ABBREV, with no provision for variants based on the preferred language domain of the style.

(I think that covers all the limitations that we've uncovered, correct me in comments if I've missed something.)

Redesign

Aims for a redesign of the file format:

Here is a tentative plan, followed by a few examples:

Examples

Alternative language for UI jurisdiction and court name

"courts": {
    "supreme.court": {
        "name": "Supreme Court",
        "ABBREV": "SC",
        "variants": {
            "fr": {
                "name": "Cour suprême",
                "ABBREV": "CS"
            }
        }
    }
},
"jurisdictions": {
    "ca": {
        "name": "Canada",
        "courts": {
            "supreme.court": true
        }
    }
}
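
The lookup implied by this example (use the variant for the requested language if one is declared, otherwise fall back to the default value) could be sketched in Python roughly as follows. The helper name localized is hypothetical, and this is only an illustration of the fallback rule, not the client's actual language arbitration code.

```python
from typing import Optional


def localized(entry: dict, key: str, lang: Optional[str] = None) -> Optional[str]:
    """Return entry[key] for lang if a variant declares it, else the default value."""
    variant = entry.get("variants", {}).get(lang, {}) if lang else {}
    return variant.get(key, entry.get(key))


supreme_court = {
    "name": "Supreme Court",
    "ABBREV": "SC",
    "variants": {"fr": {"name": "Cour suprême", "ABBREV": "CS"}},
}

print(localized(supreme_court, "name", "fr"))    # Cour suprême
print(localized(supreme_court, "ABBREV", "fr"))  # CS
print(localized(supreme_court, "name", "de"))    # Supreme Court (no "de" variant declared)
```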

Suppress court name implicit in reporter

"courts": {
    "supreme.court": "%s Sup. Ct."
},
"jurisdictions": {
    "us": {
        "name": "U.S.",
        "courts": {
            "supreme.court": {
                "abbrev-select": "court"
            }
        },
        "container-title": {
            "L. Ed. 2d": "!authority>>>L. Ed. 2d"
        }
    }
}

Here, the name of the court will render as "Sup. Ct." unless the citation is to "L. Ed. 2d", in which case the court name (in the trailing parenthetical of a US-style cite) will be completely omitted.
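
As a rough illustration of how abbrev-select might be applied when building the institution abbreviation, here is a short Python sketch. The function name institution_abbrev is hypothetical, the jurisdiction abbreviation "U.S." is assumed for the example, and the whitespace clean-up after dropping the placeholder is an assumption rather than documented behaviour.

```python
from typing import Optional


def institution_abbrev(court_template: str,
                       jurisdiction_abbrev: str,
                       abbrev_select: Optional[str] = None) -> str:
    """Combine court and jurisdiction abbreviations, honouring abbrev-select."""
    if abbrev_select == "jurisdiction":
        # Jurisdiction abbreviation alone.
        return jurisdiction_abbrev
    if abbrev_select == "court":
        # Court abbreviation alone: drop the placeholder and tidy the leftover spaces.
        return " ".join(court_template.replace("%s", "").split())
    # Default: substitute the jurisdiction abbreviation at the placeholder.
    return court_template.replace("%s", jurisdiction_abbrev)


print(institution_abbrev("%s Sup. Ct.", "U.S.", "court"))  # Sup. Ct.
print(institution_abbrev("%s Sup. Ct.", "U.S."))           # U.S. Sup. Ct.
```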

Suppress jurisdiction string element if matching court short-code is used

"courts": {
    "court.appeals": {
        "name": "Court of Appeals",
        "abbrev": "%s Ct. App."
    }
},
"jurisdictions": {
    "us:oh:d1": {
        "name": "1st District",
        "abbrev": "Ohio Ct. App. 1st Dist.",
        "courts": {
            "court.appeals": {
                "ABBREV": "!authority:Ohio Ct. App.>>>Ohio"
            }
        }
    }
}

Alternative language for court abbreviation

"courts": {
    "chiho.saibansho": {
        "name": "地方裁判所",
        "abbrev": "%s地方裁判所",
        "variants": {
            "en": {
                "abbrev": "%s District Court"
            }
        }
    }
},
"jurisdictions": [
    {
        "path": "jp:nagoya:gifu",
        "name": "岐阜",
        "abbrev": "岐阜",
        "courts": {
            "chiho.saibansho": {
                "variants": {
                    "en": {
                        "abbrev": "Gifu"
                    }
                }
            }
        }
    }
]
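
A small Python sketch of the substitution this example is aiming at: the court template is chosen per language, and the jurisdiction abbreviation for the same language is dropped in at %s. The flat dictionaries below are a simplification for illustration and do not mirror the file layout; the names compose and render are hypothetical.

```python
def compose(template: str, jurisdiction_abbrev: str) -> str:
    """Insert the jurisdiction abbreviation at the %s placeholder (prefix, suffix or mid-string)."""
    return template.replace("%s", jurisdiction_abbrev, 1)


# Values taken from the example above, flattened by language for brevity.
court_templates = {"default": "%s地方裁判所", "en": "%s District Court"}
gifu_abbrevs = {"default": "岐阜", "en": "Gifu"}


def render(lang: str) -> str:
    """Pick the template and jurisdiction abbreviation for lang, falling back to the defaults."""
    template = court_templates.get(lang, court_templates["default"])
    jurisdiction = gifu_abbrevs.get(lang, gifu_abbrevs["default"])
    return compose(template, jurisdiction)


print(render("ja"))  # 岐阜地方裁判所 (no "ja" variant, so the defaults are used)
print(render("en"))  # Gifu District Court
```
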
fbennett commented 4 years ago

Two things to add to the above:

sam-gagnon commented 4 years ago

This is really great!

I'm gonna try and very roughly code the Canadian abbreviations using this redesign, and I'll post it here, so we can see if it covers every possible scenario we talked about.

sam-gagnon commented 4 years ago

EDIT: I re-read what I had written and I overcomplicated everything, so here it is again, much shorter:

How would I specify in abbrev and ABBREV that the jurisdiction has to go before the court in some cases, but after the court in others?

georgd commented 4 years ago

@fbennett thank you very much for your work. I think this covers almost everything I could think of (and even more, like allowing for an infixed jurisdiction abbreviation).

The only thing I’m not sure about: there’s a variants object in the top-level courts objects:

  • In the top-level courts objects, set an optional variants object with language codes as keys to objects carrying values for one or more of the keys name, abbrev, and ABBREV.

but in the jurisdictions objects, according to your description, the variants object will only be available as a child of every single court child object:

  • Replace the courts array of keyed objects under the top-level jurisdictions key with a courts object carrying court IDs as keys, with EITHER a boolean value of true, OR the same structure as the top-level courts objects, but with all keys optional, falling back to the default values. In addition to name, abbrev, ABBREV and variants, the top-level object for a court ID may have an abbrev-select key, with a string value of jurisdiction or court to signify that the jurisdiction or court abbreviation alone should be used when generating the institution-parts abbreviation from abbrev.

As a consequence, the localised jurisdiction name will have to be repeated for every single court in a jurisdiction, even if not used for the abbreviation but only for the UI.

E.g.:

  "courts": {
    "fcc": {
      "name": "Federal Criminal Court",
      "ABBREV": "FCC",
      "variants": {
        "fr": {
          "name": "Tribunal pénal fédéral",
          "ABBREV": "TPF"
        },
        "it": {
          "name": "Tribunale penale federale",
          "ABBREV": "TPF"
        },
        "rm": {
          "name": "Tribunal penal federal",
          "ABBREV": "TPF"
        },
        "de": {
          "name": "Bundesstrafgericht",
          "ABBREV": "BStGer"
        }
      }
    },
    "fsc": "..."
  },
  "jurisdictions": {
    "ch": {
      "name": "Switzerland",
      "courts": {
        "fcc": {
          "variants": {
            "fr": {
              "name": "Suisse"
            },
            "it": {
              "name": "Svizzera"
            },
            "rm": {
              "name": "Svizra"
            },
            "de": {
              "name": "Schweiz"
            }
          }
        },
        "fsc": {
          "variants": "repeat the variants object from fcc here?"
        }
      }
    }
  }

I think it would make sense to allow a variants object on the single jurisdictions objects as well. In the above example, this would reduce the jurisdictions object to:

"jurisdictions": {
  "ch": {
    "name": "Switzerland",
    "variants": {
      "fr": {
        "name": "Suisse"
      },
      "it": {
        "name": "Svizzera"
      },
      "rm": {
        "name": "Svizra"
      },
      "de": {
        "name": "Schweiz"
      }
    },
    "courts": {
      "fcc": true,
      "fsc": true     
    }
  }
}
  • In the values assigned for abbrev and ABBREV, replace the < prefix and > suffix with %s in the same location, and equally recognize %s as a mid-string placeholder.

Just a thought: could it become necessary anywhere to have an infixed court abbreviation between two parts of the jurisdiction? [probably rather rare, so the courts object within the jurisdictions object might offer enough solutions?]

  • To avoid accidentally spawning entire unintended language variant sets due to minor typos, the intended language variants should be declared in a top-level array of the file, say with key langs.

That’s a great idea. This could help with the proposed solution “Generate a localised file for the primary language as well.” from https://github.com/Juris-M/legal-resource-registry/issues/13: if the converter creates a file for each of the declared langs regardless of whether any abbrev is actually declared for it, then declaring the default language there will cause its localised file to be generated as suggested.
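
The typo guard from the langs proposal could look roughly like this in Python. It is only a sketch: the function name undeclared_langs is hypothetical, and it assumes nothing about the desc file beyond a top-level langs array and variants objects keyed by language code.

```python
def undeclared_langs(desc: dict) -> set:
    """Return language codes used as variants keys but missing from the top-level langs array."""
    declared = set(desc.get("langs", []))
    found = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "variants" and isinstance(value, dict):
                    found.update(value.keys())
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(desc)
    return found - declared


desc = {
    "langs": ["fr", "de"],
    "courts": {
        "fcc": {
            "name": "Federal Criminal Court",
            "variants": {
                "fr": {"name": "Tribunal pénal fédéral"},
                "dd": {"name": "Bundesstrafgericht"},  # typo for "de"
            },
        }
    },
}

print(undeclared_langs(desc))  # {'dd'}
```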

fbennett commented 4 years ago

Dear Georg,

Good catch, variants should be available on the outer jurisdiction object also.

For courts embedded in a jurisdiction phrase, I think literal overrides may be enough, if it's a rare case.

(Sent by email in reply to Georg Mayr-Duffner's comment of Saturday, August 29, 2020, which the message quoted in full.)

fbennett commented 4 years ago

Some progress. The checkin at https://github.com/Juris-M/legal-resource-registry/commit/a212a347edd082b5daac7786b9ecced19eac1788 implements conversion from the old to the new data format. The conversion (with option -c) just dumps to stdout for the moment; I will hook up write-to-file when the deployment code is done.
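
For readers who have not looked at the commit, the placeholder part of the old-to-new rewrite discussed earlier in the thread (the < prefix and > suffix becoming %s in the same location) can be sketched in Python as below. This is not the code in the commit: the function name to_placeholder is hypothetical, and the sample legacy strings, including their spacing, are invented for illustration.

```python
def to_placeholder(abbrev: str) -> str:
    """Rewrite a legacy '<' prefix or '>' suffix as a %s placeholder in the same position."""
    if abbrev.startswith("<"):
        return "%s" + abbrev[1:]
    if abbrev.endswith(">"):
        return abbrev[:-1] + "%s"
    return abbrev


# Hypothetical legacy values, not taken from any real desc file.
print(to_placeholder("< Ct. App."))   # %s Ct. App.
print(to_placeholder("Sup. Ct. >"))   # Sup. Ct. %s
print(to_placeholder("SC"))           # SC (no marker, left unchanged)
```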

With the more orderly structures in the desc files, deploying abbrevs will be easy. On the UI side, though (exporting to juris-maps in the client repo), things will take a little more work. The UI menus are driven by database tables populated (ultimately) from desc. To accommodate language selection in the UI, we need to extend the tables ... and when I looked at their schema, I was reminded that they are far more complicated than they need to be. I've simplified the schema, but the desc-to-SQL code in the conversion script will need to write to the new table layout. The coding shouldn't be too much trouble (b/c the data structures at both ends are now more orderly), but a side effect will be a complete reinstall of UI jurisdiction data upon installation of the next client upgrade.

Looking a bit further down the road, since the new kit will provide everything needed for support of a jurisdiction in a single desc file, we should be well positioned for on-demand extension of jurisdiction support. All that will be needed (ha) is (1) some means of controlling which jurisdiction bundle to grab on client side, (2) a protocol for acquiring the relevant desc file over the wire, and (3) moving a portion of the jurisupdate script into the client, so that things can be unpacked and installed automatically. For the present, we'll just bundle everything in the client, same as now, but this brings us one step closer to a more compact client for distribution.

More news as the situation develops ...

fbennett commented 4 years ago

Further progress. We now have the UI maps generating from the converted source with a further revision to the jurisupdate script, and the client is loading from the new-look map files. Language arbitration has been tested and works. All that remains is to generate abbreviations from the new desc files, which as I wrote above should be pretty straightforward.

Once this is all done, I'll get to the comments up-thread.

fbennett commented 4 years ago

> EDIT: I re-read what I had written and I overcomplicated everything, so here it is again, much shorter:
>
> How would I specify in abbrev and ABBREV that the jurisdiction has to go before the court in some cases, but after the court in others?

Is the difference driven by the language domain?

sam-gagnon commented 3 years ago

>> EDIT: I re-read what I had written and I overcomplicated everything, so here it is again, much shorter: How would I specify in abbrev and ABBREV that the jurisdiction has to go before the court in some cases, but after the court in others?
>
> Is the difference driven by the language domain?

Sorry about the radio-silence. I've been on vacation and just got back today.

I think I actually figured it out. Only the federal jurisdiction has the jurisdiction code changing places, so I can just overwrite for that specific jurisdiction and leave all of the other ones alone.
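
A minimal Python sketch of that kind of per-jurisdiction override, assuming the fallback rule described earlier in the thread (a court entry under a jurisdiction is either true or an object whose keys override the top-level defaults). The court ID, the abbreviation strings and the "ca:qc" entry are made up for illustration and are not real Canadian data; abbrev_template is a hypothetical name.

```python
def abbrev_template(desc: dict, jurisdiction_id: str, court_id: str) -> str:
    """Use the jurisdiction-level abbrev if one is set, else the top-level court default."""
    local = desc["jurisdictions"][jurisdiction_id]["courts"][court_id]
    if isinstance(local, dict) and "abbrev" in local:
        return local["abbrev"]
    return desc["courts"][court_id]["abbrev"]


desc = {
    "courts": {"example.court": {"abbrev": "%s Ex. Ct."}},   # default: jurisdiction first
    "jurisdictions": {
        # Federal level overrides the template so the jurisdiction comes last.
        "ca": {"courts": {"example.court": {"abbrev": "Ex. Ct. %s"}}},
        # Every other jurisdiction just opts in and inherits the default.
        "ca:qc": {"courts": {"example.court": True}},
    },
}

print(abbrev_template(desc, "ca", "example.court"))     # Ex. Ct. %s
print(abbrev_template(desc, "ca:qc", "example.court"))  # %s Ex. Ct.
```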

I'll let you know how it turns out.

fbennett commented 3 years ago

The README for the LRR has been updated to reflect the new file format.