citation-style-language / schema

Citation Style Language schema
https://citationstyles.org/
MIT License

proposal: allow locale-specific layouts #63

Open fbennett opened 13 years ago

fbennett commented 13 years ago

In non-English publishing, foreign references are often provided in the native script, and follow the formatting conventions of the language realm from which they originate. To correctly handle such styles, it is sufficient to switch to the target locale when rendering references from the target language. The "language" variable on individual items can be used for this purpose, together with locale-specific layouts set with the following syntax:

<citation>
  <layout locale="en es">
    ...
  </layout>
  <layout>
    ...
  </layout>
</citation>

With a default locale of "ru", this would format items with no value or "ru" in the "language" field using the code for the default layout in the "ru" locale, while items with "en" in the "language" field would be formatted using code for the "en es" layout in the corresponding "en" locale.
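
The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the patch itself: the function names, the list-of-attributes representation, and the choice to compare primary language subtags only are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def primary_language(tag: str) -> str:
    """Return the lowercase primary language subtag, e.g. 'en-US' -> 'en'."""
    return tag.split("-")[0].lower()


def select_layout(layout_locales: List[Optional[str]],
                  item_language: Optional[str]) -> Tuple[int, Optional[str]]:
    """Pick a layout for one item (hypothetical helper).

    layout_locales holds the locale attribute of each cs:layout in document
    order, with None standing for the default layout (no locale attribute).
    Returns (index of the chosen layout, locale to switch to). A locale of
    None means "stay in the style's default locale".
    """
    if item_language:
        wanted = primary_language(item_language)
        for index, attr in enumerate(layout_locales):
            if attr is None:
                continue  # the default layout is only used as a fallback
            for candidate in attr.split():
                # Assumption: only the primary language subtag is compared.
                if primary_language(candidate) == wanted:
                    return index, candidate
    # No language on the item, or no layout lists it: use the default layout.
    return layout_locales.index(None), None
```

With the two layouts from the example, `select_layout(["en es", None], "en")` returns `(0, "en")`, while an item tagged "ru" or carrying no language falls through to `(1, None)`, the default layout in the default locale.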

fbennett commented 13 years ago

A patch covering this proposal is available for preview here [link updated].

avram commented 13 years ago

while items with "en" in the "language" field would be formatted using code for the "en" layout in the corresponding "en" locale

Should that be 'with "en" or "es"'? And at this point do we point to BCP 47 and say that BCP 47 language tags are what we're talking about?

fbennett commented 13 years ago

The full spec description will need to refer to BCP 47, for sure. In the example, to be more precise, that should read: 'formatted using the code for the "en es" layout in the corresponding "en" locale'. But it's right that the "en" locale would be applied.

fbennett commented 13 years ago

For this item, I'll treat the first entry as a document, and make revisions to the proposal in response to comments and discussion.

rmzelle commented 13 years ago

See also http://xbiblio-devel.2463403.n2.nabble.com/Proposal-test-condition-for-quot-language-quot-td5775633.html

As I mentioned to Frank off-list, this solution has a few constraints, in that attributes that are specific to cs:style, cs:citation and cs:bibliography cannot have separate values for the separate locales (initialize-with-hyphen, demote-non-dropping-particle, page-range-format, the disambiguation and collapsing options (on cs:citation), near-note-distance, subsequent-author-substitute and some white-space options for cs:bibliography). That might not be a deal-breaker, though.

fbennett commented 13 years ago

As I mentioned to Frank off-list, this solution has a few constraints, in that attributes that are specific to cs:style, cs:citation and cs:bibliography cannot have separate values for the separate locales (initialize-with-hyphen, demote-non-dropping-particle, page-range-format, the disambiguation and collapsing options (on cs:citation), near-note-distance, subsequent-author-substitute and some white-space options for cs:bibliography)

Those limitations shouldn't be a problem. If there is demand for any of those options, they can be allowed on the non-default cs:layout elements together with the locale.

avram commented 13 years ago

So we're saying that if locale is specified at all in cs:layout, then language-specific term usage will be automatically activated? And otherwise a single locale's terms will be applied, which is the current behavior.

And just to be 100% clear, you mean: [..] while items with "en" in the "language" field would be formatted using code for the "en es" layout using terms from the corresponding "en" locale, and items with "es" in the "language" field would be formatted using code for the "en es" layout, using terms from the corresponding "en" locale.

Correct? That is, the locale attribute specifies a list of affected language (tags)?

Once we start using tags like this, we also run into tag matching issues: do we allow subtags to be used? Probably. How do we handle matching for them? That is, if I've specified:

<citation>
  <layout locale="ja-alalc97 en">
    ...
  </layout>
  <layout locale="ja">
    ...
  </layout>
  <layout>
    ...
  </layout>
</citation>

(which is reasonable, since the ALA-LC romanized version probably uses English-style conventions)

.. then is that legal? What if we have only partial matches between the language tags in the incoming data (say, zh) and the specified layouts (say, zh-TW, zh-CN)? I'm eager to start exploring and working this out, but as we know well, language tags are a messy world.

fbennett commented 13 years ago

No, that wouldn't work in the current implementation, but I'm pretty sure you wouldn't set things up that way in any case.

The value of locale is a CSL locale, possibly with a region modifier. So zh, zh-TW, zh-CN, de, de-DE and de-AT would all be valid. The BCP 47 tag ja-alalc97 would not (well, it would validate with the proposed patch, but the processor would ignore everything but the "ja"). So when I agreed we are looking at BCP 47 tag values, I misspoke.
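
A rough Python sketch of that reduction, from a BCP 47 tag to a CSL locale with an optional region, might look like the following. The function name and the rule used to recognise a region subtag (two letters or three digits, per BCP 47) are assumptions for illustration; the actual processor may do this differently.

```python
import re

# BCP 47 region subtags are two letters or three digits.
REGION = re.compile(r"^(?:[A-Za-z]{2}|[0-9]{3})$")


def to_csl_locale(tag: str) -> str:
    """Reduce a language tag to 'xx' or 'xx-XX' (hypothetical helper)."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            break  # a singleton starts an extension; no region follows it
        if REGION.match(subtag):
            return f"{language}-{subtag.upper()}"
    # Script and variant subtags such as 'alalc97' are simply dropped.
    return language
```

Under these assumptions, `to_csl_locale("ja-alalc97")` gives `"ja"`, and `to_csl_locale("de-AT")` is left as `"de-AT"`.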

Where a style wants things romanized, it should do so everywhere. If the terms of the primary language of the document (English, say) are to be adopted, then in MLZ you would request romanization in the document, and that's all there is to it. If the terms used in foreign language cites are to be those of the item language, then you could set an alternative layout locale with this syntax, and provide romanized versions of the terms (if appropriate) in the style itself. So long as they are applied uniformly for all items in that language within the style (which they certainly should be), there shouldn't be a problem.

denismaier commented 4 years ago

I think we should seriously re-consider this. I know it's an old issue but it gets requested time and again. @bwiernik @bdarcus

bdarcus commented 4 years ago

I don't really have a problem with the basic idea.

But we need a fully fleshed out proposal to evaluate. For the schema, what are the datatypes? For the spec, what's the specific language?

Also, if we allow multiple layouts by language, are there other use cases for multiple layouts?

denismaier commented 4 years ago

This is what the CSL-M supplement has to say on this:

The MLZ extensions to CSL permit multiple cs:layout elements within cs:citation and cs:bibliography. Each cs:layout element but the last must include a locale attribute specifying one or more recognized CSL locales, and the final element must not carry a locale attribute. The locale applied to an item is determined by matching it against the locale set in the language variable of the item (this value is passed by Zotero). An example:

<citation>
  <layout locale="en es de">
      <text macro="layout-citation-roman"/>
  </layout>
  <layout locale="ru">
      <text macro="layout-citation-cyrillic"/>
  </layout>
  <layout>
      <text macro="layout-citation-ja"/>
  </layout>
</citation>

In the example above, an item with en, es or de (or de-AT) set in the language variable will be rendered by the layout-citation-roman macro, with locale terms set to the appropriate language.

See https://citeproc-js.readthedocs.io/en/latest/csl-m/#cs-layout-extension

Here's how it looks in csl-mlz.rnc: https://github.com/Juris-M/schema/blob/0d1c742ead025fedd08bafc317f223c56e0a9948/csl-mlz.rnc#L182
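
The structural rule quoted above (every cs:layout but the last carries a locale attribute, and the last carries none) could be checked along these lines. This is a minimal sketch: `check_layouts` and its input format are hypothetical, and the check is not taken from citeproc-js or the Juris-M schema.

```python
from typing import List, Optional


def check_layouts(layout_locales: List[Optional[str]]) -> List[str]:
    """Return the problems found in a sequence of cs:layout elements.

    Each entry is the raw locale attribute of one cs:layout, in document
    order, or None when the attribute is absent.
    """
    if not layout_locales:
        return ["at least one cs:layout is required"]
    problems = []
    # Every layout except the last must name at least one locale.
    for position, attr in enumerate(layout_locales[:-1], start=1):
        if attr is None or not attr.split():
            problems.append(f"layout {position} must carry a locale attribute")
    # The last layout is the default and must not name a locale.
    if layout_locales[-1] is not None:
        problems.append("the final layout must not carry a locale attribute")
    return problems
```

For the three layouts in the example, `check_layouts(["en es de", "ru", None])` returns an empty list.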

bdarcus commented 4 years ago

So on my datatype question, that's here (xsd:language):

layout.locale =
  attribute locale {
    list { xsd:language+ }
  }

I have no objection.
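
For readers less familiar with RELAX NG, the effect of that pattern can be approximated in Python as below. The regular expression is the lexical pattern XML Schema Part 2 gives for xsd:language; the function name is made up for the example, and the check is purely lexical.

```python
import re

# Lexical pattern for xsd:language from XML Schema Part 2: Datatypes.
XSD_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_valid_layout_locale(value: str) -> bool:
    """Approximate `list { xsd:language+ }`: one or more whitespace-separated tags."""
    tokens = value.split()
    return bool(tokens) and all(XSD_LANGUAGE.fullmatch(token) for token in tokens)
```

Because the pattern only constrains the shape of each token, a value such as "ja-alalc97 en" passes this check even though, as noted earlier in the thread, the processor would ignore everything but the "ja".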

denismaier commented 4 years ago

Also, if we allow multiple layouts by language, are there other use cases for multiple layouts?

Yes, there may be. Legal citations are one example, and Frank has solved this by isolating a style's legal components in distinct modules (sorry if I'm misrepresenting things here...). Another candidate might be classical citations, or references to the Bible or other religious works, where other citation norms typically apply. (And they can mix: you can use Chicago author-date for "regular" citations, but a shorthand-based system for the Talmud.) But I think that's outside the scope of this proposal.

bdarcus commented 4 years ago

To clarify, couldn't your example be handled currently using cs:choose?

denismaier commented 4 years ago

Sure, but you'll have to add that to every style. If we had a modular solution you could just hook that into any style.

denismaier commented 4 years ago

But concerning the proposal at stake here: You think we should add that as per CSL-M?

bdarcus commented 4 years ago

Sure, but you'll have to add that to every style. If we had a modular solution you could just hook that into any style.

This seems an important point, but I'm not understanding. Can you explain a bit more?

On your other question, curious what @bwiernik says.

denismaier commented 4 years ago

This seems an important point, but I'm not understanding. Can you explain a bit more?

A Jewish studies scholar might use different citation styles, e.g. Chicago author-date, Chicago note, MHRA, SBL. These styles don't necessarily support traditional sources. A journal might just say: "For these sources please use the system as described in the Encyclopedia Judaica." So you'll end up with this situation: You have a general style and another more specific style for certain items (usually using a lot of abbreviations, really hostile to writers and readers).

In Juris-M you can use the abbreviation filter to completely change the rendering of certain items.

(But again, I think this leads us a bit astray here. This goes way further than the original proposal and should perhaps be discussed on its own.)

bdarcus commented 4 years ago

What I'm asking, is what's the difference between this:

<layout language="es">
  ...
</layout>

... and this:

<layout>
  <choose>
    <if language="es">
      ...
    </if>
  </choose>
</layout>

denismaier commented 4 years ago

Oh, sorry, must have misunderstood your question...

The difference is that this

<layout>
  <choose>
    <if language="es">
      ...
    </if>
  </choose>
</layout>

won't localize terms. It just applies some conditional logic.

This

<layout language="es">
  ...
</layout>

re-runs the whole layout and applies a different locale.

bdarcus commented 4 years ago

So @language on cs:choose is about the content of the reference data, and @language as proposed here is about the target language of the output?

If that's the distinction, it makes sense to me, but then perhaps we should make that as explicit as possible by calling that attribute on layout something different; @target-language or @output-language?

Seems we need an equivalent on csl-citation.json?

denismaier commented 4 years ago

No. The language always comes from item metadata.

I was wrong concerning the distinction between

<layout>
  <choose>
    <if locale="es">
      ...
    </if>
  </choose>
</layout>

and

<layout locale="es">
  ...
</layout>

I think it's undocumented, but the variant with choose will also localize terms. I have just found it in a style I created a while ago. Perhaps @fbennett can provide us with some hints?

bwiernik commented 4 years ago

The point of the different locale layouts is to localize the terms by item language (e.g., German items are formatted in German, English items in English) and, with the full CSLm multilingual support, to control which alphabets/transliterations/translations are used (e.g., a style might require that non-latin alphabets be shown in their original and transliterated forms).

Besides locales, the other major grouping is by item category (e.g., primary vs secondary sources). We've discussed that issue here. There, we settled on that being a calling-application API issue. I don't know that there is any need for specifying different layouts that couldn't be accommodated by <choose> (e.g., maybe to format modern versus historical newspaper articles?). Even if there were, I don't really see how that would be handled from CSL's end.

denismaier commented 4 years ago

The point of the different locale layouts is to localize the terms by item language (e.g., German items are formatted in German, English items in English) and, with the full CSLm multilingual support, to control which alphabets/transliterations/translations are used (e.g., a style might require that non-latin alphabets be shown in their original and transliterated forms).

Besides locales, the other major grouping is by item category (e.g., primary vs secondary sources). We've discussed that issue here. There, we settled on that being a calling-application API issue. I don't know that there is any need for specifying different layouts that couldn't be accommodated by <choose> (e.g., maybe to format modern versus historical newspaper articles?). Even if there were, I don't really see how that would be handled from CSL's end.

Thanks for the clarifications.

So what is your assessment of that proposal? Should we add this? I think yes. (Even if I despise these kinds of requirements...)

odomanov commented 1 year ago

What is the current status of this? The feature is really long awaited.

denismaier commented 1 year ago

@odomanov The status is the same as before. Still under discussion. Sorry ...

odomanov commented 1 year ago

The point of the different locale layouts is to localize the terms by item language (e.g., German items are formatted in German, English items in English)

Do we really need multiple layouts for that? Presumably, each item has the language field which should be enough to localize terms even in the case of a single layout. Why do we need many layouts for terms localization?

odomanov commented 1 year ago

I mean there are localization files for every language. Perhaps they should contain all the information necessary for localization -- first of all translations but maybe also format for dates and similar things. Do we really need layouts for that?

denismaier commented 1 year ago

As per @jgm's comment, the purpose of multiple layouts is to make this configurable.

The problem is that this is style-dependent. Some styles would want this, others not. And some might want it only for specific languages or with specific limitations.

So, no. Unfortunately, it wouldn't be helpful to just localize all the items on the basis of the language field. You'll need some way or another to make this configurable.

odomanov commented 1 year ago

I agree that this should be configurable. What about a configuration like this:

<layout locale="per-item"> 
...
</layout>

In this way the style's author can choose how the terms are localized.

It's maybe even better to have a special option in "Citation-specific Options" and "Bibliography-specific Options". It's a matter of discussion.

(I'm sorry for pushing this. I like CSL and I want it to be multilingual. The CSL-M solution is too complicated in my view. Multiple layouts is a very useful feature, for example, in Classics. But the localization of terms is a different thing. It should work even with a single layout. Why not?)

bdarcus commented 1 year ago

... the localization of terms is a different thing. It should work even with a single layout. Why not?

I think you're right, @odomanov, but just wanted to confirm my thoughts.

In case anyone is still following this thread, there's a linked issue here where we're discussing localization a bit for an experimental project and I wanted to confirm something:

Let's imagine we don't have any current restrictions of CSL 1.0. In my experiment, for example, the conditional is allowed to structure citation or bibliography formatting by locale.

Is it not the case that, per what @odomanov is saying above, what term set to use is a separate matter from how to lay out content from a particular locale?

E.g. there's actually no need for multiple cs:layout elements, and that locale="per-item" attribute he suggests could either go on cs:citation, cs:bibliography, or even be global (cs:style)?

And the default value for the behavior would, in fact, be "global" (all entries use the same term locale)?

I don't do multilingual, so I struggle with the details sometimes, so would be good if someone can confirm that!

denismaier commented 1 year ago

I don't do multilingual, so I struggle with the details sometimes, so would be good if someone can confirm that!

The problem with multilingual formatting is basically this: you can approach it from a variety of different angles.

  1. Assume the layout stays the same across locales, but you still want to localize terms differently. For this, an attribute like locale="per-item" might work. (However, you might also want to indicate which locales to take into account; say, English, German, and French get localized according to item language, everything else falls back to the locale of the document or the citation style. In this case, a single attribute might not be enough, but some sort of configuration element might do.)
  2. Don't localize terms, but take the locale of an item into account for things like title-casing or hyphenation. The title-casing issue is already covered by CSL 1, but we currently only have one locale per item, so it's impossible to have a title in English and the container-title in French.
  3. Take variable variants into account. How should those be rendered? For which variables should they be taken into account? How should this play together with styles, i.e., do you need special styles that handle this, or can you hook some sort of special configuration into existing styles?
  4. Then, there's a full-blown multi-layout solution. This is probably most useful (only really necessary?) when items in different scripts should have their distinct layout, and where using field variants for single variables wouldn't be enough.
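
To make the first approach concrete, here is a small Python sketch of per-item term localization restricted to a configured set of languages. Everything in it is hypothetical: neither the function nor the idea of passing an allow-list exists in CSL or CSL-M, and it is only one way the suggested configuration could behave.

```python
from typing import Iterable, Optional


def term_locale(item_language: Optional[str],
                default_locale: str,
                per_item_languages: Iterable[str]) -> str:
    """Choose the locale whose terms are used for one item.

    Items whose primary language is in per_item_languages keep their own
    locale; everything else falls back to the style or document default.
    """
    if not item_language:
        return default_locale
    primary = item_language.split("-")[0].lower()
    allowed = {language.lower() for language in per_item_languages}
    return item_language if primary in allowed else default_locale
```

For a style that localizes only English, German and French, `term_locale("de-AT", "en-US", ["en", "de", "fr"])` keeps `"de-AT"`, whereas a Russian item falls back to `"en-US"`.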

bdarcus commented 1 year ago

That's really useful @denismaier!

Is it fair to say that it's still useful to distinguish features which could be configured with simple, probably global, parameters?

Clearly 4 is not that.

denismaier commented 1 year ago

Is it fair to say that it's still useful to distinguish features which could be configured with simple, probably global, parameters?

Clearly 4 is not that.

Yes, and yes.