Juris-M / legal-resource-registry

Jurisdiction ID and abbreviation data files for using with Jurism and other projects.
MIT License

Languages in multilingual jurisdictions #13

Open georgd opened 4 years ago

georgd commented 4 years ago

The current setup of jurism-abbreviations handles multiple languages quite flexibly. However, it’s possibly a bit too flexible.

The settings

Different situations are met in the wild:

Citation requirements

The issues

A possible solution

georgd commented 4 years ago

So, generating a localised file for the primary language can be solved with the language declarations proposed in https://github.com/Juris-M/legal-resource-registry/issues/19.

In addition to the cases listed above, it might be useful to declare an unofficial variant to be used in the UI. This is especially useful where the official names are in non-Latin scripts. Scholars in international law, in particular, might need to cite e.g. Chinese decisions that are available in translation in certain databases, without knowing any Chinese. For such users, identifying the correct court in the UI is currently impossible.

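Thus, the different use cases which should be discernible are:

  • name variants in official languages – usually to be used in styles and the UI
  • unofficial translations/transliterations to be used in certain styles on explicit request
  • unofficial transliterations/translations to be used in the UI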

fbennett commented 4 years ago

It will be nice to have transliterations or translations of court/jurisdiction strings for many countries in the UI, but while we can capture the necessary data in the LRR files, deployment in the client will have to wait for some changes. The jurisdiction data is getting pretty bulky already, and most of it is unneeded by most users. Translations/transliterations for China would more than double the size of its data, and it's already a big slice of jurisdiction data overall.

I haven't come up with a concrete plan for it yet, but at some point jurisdiction data should be stripped from the client core, and made available over the wire as requested or as required. When that eventually happens, we'll have capacity to add unofficial translations/transliterations.

(In many cases, of course, an author unable to read the original would likely be citing a translation in any case, or relying entirely on secondary works that interpret the original.)

On Monday, August 31, 2020, Georg Mayr-Duffner notifications@github.com wrote:

So, generating a localised file for the primary language can be solved with the language declarations proposed in #19 https://github.com/Juris-M/legal-resource-registry/issues/19.

In addition to the cases listed above, it might be useful to declare an unofficial variant to be used in the UI. This is especially useful where the official names are in non-Latin scripts. Scholars in international law, in particular, might need to cite e.g. Chinese decisions that are available in translation in certain databases, without knowing any Chinese. For such users, identifying the correct court in the UI is currently impossible.

Thus, the different use cases which should be discernible are:

  • name variants in official languages – usually to be used in styles and the UI
  • unofficial translations/transliterations to be used in certain styles on explicit request
  • unofficial transliterations/translations to be used in the UI


fbennett commented 4 years ago

@georgd Thanks for this very thorough summary of requirements and infelicities. I haven't fully gotten my head around the issues, but here are a couple thoughts for the present, as we wait for the work at https://github.com/Juris-M/legal-resource-registry/issues/19 to settle down.

georgd commented 4 years ago

@fbennett Thank you!

> (In many cases, of course, an author unable to read the original would likely be citing a translation in any case, or relying entirely on secondary works that interpret the original.)

I’m specifically referring to the international tax law community and the IBFD, which maintains a database that, among other things, collects case law from many countries and provides English translations. They even provide a list of courts with abbreviations and English translations of the court names, so I’m planning to add these to the relevant desc files with an appropriate label (like en-ibfd), if that’s OK.
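To make the labelling idea concrete, here is a minimal Python sketch of how a court entry might carry an `en-ibfd` variant next to its official name. Everything in it is an assumption for illustration: the `Court` class, the `Role` values, the court key `sc`, and the sample names are hypothetical, not the actual desc-file schema and not taken from the IBFD list. The three roles mirror the three use cases listed earlier in this thread.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    OFFICIAL = "official"        # official-language name: styles and UI
    STYLE_ON_REQUEST = "style"   # unofficial, used by styles only on explicit request
    UI_ONLY = "ui"               # unofficial, shown in the UI only


@dataclass
class Court:
    key: str                     # hypothetical court key
    name: str                    # native official name
    # label -> (text, Role); labels such as "en-ibfd" are illustrative
    variants: dict = field(default_factory=dict)

    def display(self, label=None):
        """Return the variant for `label`, falling back to the official name."""
        if label in self.variants:
            return self.variants[label][0]
        return self.name


# Illustrative sample entry (hypothetical data).
court = Court(
    key="sc",
    name="最高人民法院",
    variants={"en-ibfd": ("Supreme People's Court", Role.UI_ONLY)},
)
```

With an entry like this, `court.display("en-ibfd")` would give the English label for the UI, while `court.display()` keeps returning the official name for styles.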

fbennett commented 3 years ago

We'll need to deal with a scalability issue for this. As things are currently set up in the new desc format and its "compiler" script, adding an English equivalent for a subset of Chinese courts would generate a full alternative court/jurisdiction mapping and abbreviation list for the entire jurisdiction. Data for China is currently ~380k in the client download, and adding English equivalents would roughly double that. More serious than the (minor) additional download burden, the extension would impact the time required for first-install, as the client grinds away at building SQL database tables for the mappings. That's currently taking 3-5 minutes, and pushing that further may test user patience. If multiple jurisdictions are mapped to English, of course, the bloat will be larger.
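As a rough illustration of why a partial translation still produces a full-size output, consider the Python sketch below. It assumes that one complete mapping is emitted per language key and that entries without an equivalent fall back to the native name; that fallback, the function name `compile_mapping`, and the field names are hypothetical and do not describe the real compiler script.

```python
def compile_mapping(courts, lang):
    """Build a complete key -> name mapping for one language key.

    Assumption: entries without a variant for `lang` fall back to the
    native name, so the output always covers every court.
    """
    return {
        key: entry.get("variants", {}).get(lang, entry["name"])
        for key, entry in courts.items()
    }


# Hypothetical data: 1000 courts, only 10 of which have an English equivalent.
courts = {f"court{i}": {"name": f"native name {i}"} for i in range(1000)}
for i in range(10):
    courts[f"court{i}"]["variants"] = {"en": f"English name {i}"}

native = compile_mapping(courts, "zh")
english = compile_mapping(courts, "en")
translated = sum(1 for key in courts if english[key] != native[key])
```

Both mappings end up with 1000 entries even though only 10 names differ, which is the "roughly double" effect described above.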

To cover requirements like this, I would like to find some way to separate jurisdiction "bundles" from the main client distribution. Plugins would be one option, but that raises issues of programming burden, maintenance, data updates, and mutual compatibility that would attach to each plugin released. The cleaner solution would be to provide for over-the-wire extensions and updates in Jurism directly. That will have a one-time programming burden for designing a sync-down protocol, implementing it in client and server, and tying server-side updates to the GitHub repository. UI support of preference menus in the client is currently coded in Firefox XUL, but anything added to preferences right now will need to be reimplemented in React when Zotero makes the jump (the timing of which we don't know).
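One very simplified way to picture the sync-down idea, sketched in Python: the client compares the versions of its installed bundles against a manifest published by the server and fetches only the bundles the user asked for that are missing or outdated. No such protocol exists yet; the function name `plan_downloads`, the manifest shape, and the bundle IDs are all hypothetical.

```python
def plan_downloads(installed, manifest, wanted):
    """Return the bundle IDs the client would need to fetch.

    installed: bundle ID -> version currently on the client
    manifest:  bundle ID -> latest version published by the server
    wanted:    bundle IDs the user has asked for
    """
    return sorted(
        bundle
        for bundle in wanted
        if bundle in manifest and installed.get(bundle) != manifest[bundle]
    )


# Hypothetical state: "cn" is outdated, "cn-en" is not installed, "at" is current.
installed = {"cn": 3, "at": 7}
manifest = {"cn": 4, "cn-en": 1, "at": 7, "us": 12}
wanted = {"cn", "cn-en", "at"}

to_fetch = plan_downloads(installed, manifest, wanted)
print(to_fetch)  # ['cn', 'cn-en']
```

Bundles nobody asked for (here `us`) are never downloaded, which is what would keep the core client small.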

Although adding the language variants in the client distribution would be difficult at this stage, there is the option of adding them in the desc files, but omitting their language/variant key from the `langs` segment of the file. That way, they could be maintained together with the native-language tree, and map and abbreviation files could be generated for them *relatively* simply by ... cloning the LRR, installing `npm`, configuring the `jurisupdate` script to write into the user's `juris-maps` and `juris-abbrevs` directories, editing the `langs` array of the source file, and running the script.

Not ideal, but ... what do you think?
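The workaround above can be pictured with a small Python sketch. It assumes that output files are generated only for keys listed in `langs`, so a variant that is present in the data but not listed stays dormant until a user adds its key in a local copy. The dictionary layout, the helper names, and the sample values are hypothetical, not the real desc format.

```python
def output_langs(desc):
    """Language keys for which files would be generated (assumed: only those in `langs`)."""
    return list(desc["langs"])


def variant_labels(desc):
    """All variant labels present on court entries, listed in `langs` or not."""
    labels = set()
    for court in desc["courts"].values():
        labels.update(court.get("variants", {}))
    return labels


# Hypothetical desc data: the "en-ibfd" variant is stored but not listed in `langs`.
desc = {
    "langs": ["zh"],
    "courts": {
        "sc": {"name": "native name", "variants": {"en-ibfd": "English name"}},
    },
}

dormant = variant_labels(desc) - set(output_langs(desc))

# A user's local edit: add the dormant key to `langs` before running the script.
local = dict(desc, langs=desc["langs"] + ["en-ibfd"])
```

Here `dormant` contains only `en-ibfd`, and after the local edit `output_langs(local)` lists both `zh` and `en-ibfd`, while the shared `desc` stays unchanged.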

georgd commented 3 years ago

Thanks for your great work!

> We'll need to deal with a scalability issue for this. As things are currently set up in the new desc format and its "compiler" script, adding an English equivalent for a subset of Chinese courts would generate a full alternative court/jurisdiction mapping and abbreviation list for the entire jurisdiction. Data for China is currently ~380k in the client download, and adding English equivalents would roughly double that. More serious than the (minor) additional download burden, the extension would impact the time required for first-install, as the client grinds away at building SQL database tables for the mappings. That's currently taking 3-5 minutes, and pushing that further may test user patience. If multiple jurisdictions are mapped to English, of course, the bloat will be larger.

TBH, I don't really mind the longer installation time, as it's a one-time thing. But of course the user experience wouldn't be great.

> To cover requirements like this, I would like to find some way to separate jurisdiction "bundles" from the main client distribution. Plugins would be one option, but that raises issues of programming burden, maintenance, data updates, and mutual compatibility that would attach to each plugin released.

I think plugins might be OK, but I don't like that idea much. In addition to the concerns you list, it's not very user-friendly. I'd prefer a solution that doesn't make this the user's responsibility.

> The cleaner solution would be to provide for over-the-wire extensions and updates in Jurism directly. That will have a one-time programming burden for designing a sync-down protocol, implementing it in client and server, and tying server-side updates to the GitHub repository.

That's preferable in my view: a mechanism for automatically updating jurisdiction data, similar to updating style files, without having to reinstall the whole application.

> UI support of preference menus in the client is currently coded in Firefox XUL, but anything added to preferences right now will need to be reimplemented in React when Zotero makes the jump (the timing of which we don't know).

Having to write the same thing twice within a short time should be avoided, if possible. So, if there's an easier workaround for the time being, it could be worth looking at.

> Although adding the language variants in the client distribution would be difficult at this stage, there is the option of adding them in the desc files, but omitting their language/variant key from the langs segment of the file. That way, they could be maintained together with the native-language tree, and map and abbreviation files could be generated for them relatively simply by ... cloning the LRR, installing npm, configuring the jurisupdate script to write into the user's juris-maps and juris-abbrevs directories, editing the langs array of the source file, and running the script.

~I don't get that, I think.~ Are you suggesting adding the jurisupdate code to Jurism and having it generate the required files on request? Would that also work for variants that a certain citation style requires, upon request from a citeproc?

fbennett commented 3 years ago

(Sorry for the delayed response here...) What I had in mind there was that, without any changes to the client, a patient and technically skilled user could update the Jurism runtime files from a local copy of juris-desc/LRR. (Meanwhile, it seems that over-the-wire style updates are not yet working, so I'll need to polish my skills for protocol debugging before raising the sights to jurisdiction bundle downloads...)