citation-style-language / schema

Citation Style Language schema
https://citationstyles.org/
MIT License

Multilingual data and style structures #332

Open bwiernik opened 4 years ago

bwiernik commented 4 years ago

We started to discuss multilingual data structures here: https://github.com/citation-style-language/schema/issues/327

In terms of data structures, my inclination is that multilingual variants should be stored at the field level. So, any field might be an object with value, language, and translated elements. The translated element would be an array whose entries each hold value and language elements. Subordinate elements without a language would inherit the language from their parent. That would have 3 benefits:

  1. It would permit simple indication that a field is a different language than the item (e.g., an English article published in a German journal).
  2. It would jibe with https://juris-m.readthedocs.io/en/latest/dev-sync-simplification.html
  3. It would provide a consistent structure for providing translations of one or more fields for an item.
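A minimal sketch of the field-level structure described above, in Python. The key names `value`, `language`, and `translated` follow the comment; the helper `resolve_languages` and the sample item are hypothetical illustrations, not an agreed schema.

```python
from typing import Any


def resolve_languages(field: Any, item_language: str) -> dict:
    """Return a copy of a field with every language made explicit.

    A flat string is treated as a value in the item's language. Variants
    under "translated" that lack a language inherit it from their parent.
    """
    if isinstance(field, str):
        return {"value": field, "language": item_language, "translated": []}

    language = field.get("language", item_language)
    translated = [
        {"value": t["value"], "language": t.get("language", language)}
        for t in field.get("translated", [])
    ]
    return {"value": field["value"], "language": language, "translated": translated}


# Hypothetical item: an English article published in a German journal.
item = {
    "language": "de",
    "title": {
        "value": "Citation styles in practice",
        "language": "en",
        "translated": [{"value": "Zitierstile in der Praxis", "language": "de"}],
    },
    "container-title": "Zeitschrift für Bibliothekswesen",
}

title = resolve_languages(item["title"], item["language"])
journal = resolve_languages(item["container-title"], item["language"])
```

Here the title carries its own language (`en`), while the flat-string journal name falls back to the item's language (`de`).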

Originally posted by @bwiernik in https://github.com/citation-style-language/schema/issues/327#issuecomment-666385146

denismaier commented 4 years ago

That looks like a sound approach. Ideally, we would still allow a flat string approach as a simpler alternative if no multilingual data input is needed. So you could do:

publisher: Oxford University Press

Or:

publisher: 
  value: whatever
  language: ar
  language-alternate:
    value: Transliteration of the publisher's Arabic name
    language: ar-alc97

denismaier commented 4 years ago

Another question will then be how these language alternates will be accessible in styles. A simple approach could be something like testing for a language attribute, like <if variable="title" language-alternate="de">. (See https://github.com/citation-style-language/schema/issues/327#issuecomment-667008825)

The drawback of this is that it will make style coding more complicated than necessary. In the medium to long run (i.e. after 1.1) we should therefore consider adding (optional, modularized) features to simplify this. I could imagine three potential solutions:

  1. New attributes on cs:style, cs:bibliography and cs:citation

  2. A new element cs:multilingual next to cs:citation and cs:bibliography.

    <multilingual>
      <titles>
        <main/>
        <alternate language="en" prefix="[ " suffix="]"/>
      </titles>
    </multilingual>

  3. This new cs:multilingual element could even work a bit like locales. You'd have special multilingual configuration files that could be used together with regular styles.
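To make the idea behind option 2 concrete, here is a rough Python sketch of what a processor might do with such a configuration: render the main title, then append the alternate in a configured language, wrapped in a prefix and suffix. The function `render_title`, its parameters, and the `translated` key are assumptions for illustration only; no CSL processor is known to expose this.

```python
def render_title(field, alternate_language, prefix=" [", suffix="]"):
    """Render the main value, then the first alternate in the given language."""
    if isinstance(field, str):
        # Flat strings carry no alternates, so they render unchanged.
        return field

    output = field["value"]
    for variant in field.get("translated", []):
        if variant.get("language") == alternate_language:
            output += prefix + variant["value"] + suffix
            break
    return output


# Hypothetical multilingual title field.
title = {
    "value": "Der Zauberberg",
    "language": "de",
    "translated": [{"value": "The Magic Mountain", "language": "en"}],
}

print(render_title(title, "en"))  # Der Zauberberg [The Magic Mountain]
```

If no alternate exists in the requested language, the main value is rendered on its own, which keeps styles usable with monolingual data.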

bwiernik commented 4 years ago

> Ideally, we would still allow a flat string approach as a simpler alternative if no multilingual data input is needed.

Yes, I think we would have the version of the CSL JSON explicitly denote that it is multilingual. That way, we can allow normally-flat-string fields to be either flat strings or objects. That gets around the contortions @fbennett needed to do to make CSLm JSON type-compatible with CSL JSON.

If CSL-ML JSON needs to be converted to vanilla CSL JSON, it's a simple transformation: string-type variables have their value extracted and the multilingual elements dropped.
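That transformation could look roughly like the following Python sketch. It assumes that a multilingual field is an object with a `value` key, as proposed earlier in this thread; the function name `to_vanilla` is hypothetical, and only top-level fields are handled.

```python
def to_vanilla(item: dict) -> dict:
    """Flatten multilingual fields to plain strings.

    Objects with a "value" key are replaced by that value, which drops
    their language and translation data. Everything else (flat strings,
    name lists, date objects) is passed through unchanged.
    """
    vanilla = {}
    for name, field in item.items():
        if isinstance(field, dict) and "value" in field:
            vanilla[name] = field["value"]
        else:
            vanilla[name] = field
    return vanilla


# Hypothetical multilingual item.
ml_item = {
    "type": "book",
    "title": {
        "value": "Der Zauberberg",
        "language": "de",
        "translated": [{"value": "The Magic Mountain", "language": "en"}],
    },
    "publisher": "S. Fischer",
    "issued": {"date-parts": [[1924]]},
}

flat_item = to_vanilla(ml_item)
```

Date objects such as `issued` have no `value` key, so they are left alone.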

bwiernik commented 4 years ago

There are a few "levels of involvement" here:

"Simple" handling

  1. Rendering of individual translated fields.
  2. Rendering of transliterations instead of original-script fields style-wide. (These are what APA wants, for example). These we could consider adopting into vanilla CSL.

"Moderate" handling

  1. Consistent rendering of both/all of original script, transliterated, and translated fields style-wide. This could be handled using something like cs:multilingual above (which I think looks similar to the CSLm cs:alternative). I would suggest here that it always goes to the locale/writing system of the bibliography environment it's rendered in. Supporting multiple locales is a further step (below).

"Complex" handling

  1. Separate bibliography layouts by locale, à la CSLm.

denismaier commented 4 years ago

Just a quick note: biblatex is adding multiscript support: https://raw.githubusercontent.com/plk/biblatex/multiscript/doc/latex/biblatex/biblatex.tex

More information under \subsection{Multiscript Support}

(Don't know how exactly that will work. Need to digest that first...)

denismaier commented 1 year ago

I just ran into this: https://www.ctan.org/pkg/biblatex-ms

Apparently, the biblatex folks have recently published the multiscript variant of biblatex. As of now, both versions seem to exist in parallel. The multiscript version is said to be slower and still a bit experimental, but it should eventually replace the current version.

Should be worthwhile looking into this...