Open fbennett opened 13 years ago
A patch covering this proposal is available for preview here [link updated].
while items with "en" in the "language" field would be formatted using code for the "en" layout in the corresponding "en" locale
Should that be 'with "en" or "es"'? And at this point do we point to BCP 47 and say that BCP 47 language tags are what we're talking about?
The full spec description will need to refer to BCP 47, for sure. In the example, that should read 'formatted using the code for the "en es" layout in the corresponding "en" locale', to be more precise. But it's right that the "en" locale would be applied.
For this item, I'll treat the first entry as a document, and make revisions to the proposal in response to comments and discussion.
As I mentioned to Frank off-list, this solution has a few constraints, in that attributes that are specific to cs:style, cs:citation and cs:bibliography cannot have separate values for the separate locales (initialize-with-hyphen, demote-non-dropping-particle, page-range-format, the disambiguation and collapsing options (on cs:citation), near-note-distance, subsequent-author-substitute and some white-space options for cs:bibliography). That might not be a deal-breaker, though.
Those limitations shouldn't be a problem. If there is demand for any of those options, they can be allowed on the non-default cs:layout elements together with the locale.
So we're saying that if locale is specified at all in cs:layout, then language-specific term usage will be automatically activated? And otherwise a single locale's terms will be applied, which is the current behavior.
And just to be 100% clear, you mean: [..] while items with "en" in the "language" field would be formatted using code for the "en es" layout using terms from the corresponding "en" locale, and items with "es" in the "language" field would be formatted using code for the "en es" layout, using terms from the corresponding "en" locale.
Correct? That is, the locale attribute specifies a list of affected language (tags)?
Once we start using tags like this, we also run into tag-matching issues: do we allow subtags to be used? Probably. How do we handle matching for them? That is, if I've specified:
<citation>
  <layout locale="ja-alalc97 en">
    ...
  </layout>
  <layout locale="ja">
    ...
  </layout>
  <layout>
    ...
  </layout>
</citation>
(which is reasonable, since the ALA-LC romanized version probably uses English-style conventions)
.. then is that legal? What if we have only partial matches between the language tags in the incoming data (say, zh) and the specified layouts (say, zh-TW, zh-CN)? I'm eager to start exploring and working this out, but as we know well, language tags are a messy world.
No, that wouldn't work in the current implementation, but I'm pretty sure you wouldn't set things up that way in any case.
The value of locale is a CSL locale, possibly with a region modifier. So zh, zh-TW, zh-CN, de, de-DE and de-AT would all be valid. The BCP 47 tag ja-alalc97 would not (well, it would validate with the proposed patch, but the processor would ignore everything but the "ja"). So when I agreed we are looking at BCP 47 tag values, I misspoke.
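To make the point about ignored subtags concrete, here is a minimal Python sketch of how a processor might reduce an incoming tag to a CSL-style locale (language plus optional region). The function name `to_csl_locale` and the rule "a region is a two-letter subtag" are assumptions for illustration, not the behavior of the patch or of any particular processor.

```python
def to_csl_locale(tag: str) -> str:
    """Reduce a language tag to language[-REGION], dropping all other subtags.

    Assumption: a region is any two-letter alphabetic subtag that appears
    before an extension or private-use singleton (a one-character subtag).
    """
    subtags = tag.strip().split("-")
    language = subtags[0].lower()
    for sub in subtags[1:]:
        if len(sub) == 1:
            # Singleton: everything after it is extension/private-use data.
            break
        if len(sub) == 2 and sub.isalpha():
            return f"{language}-{sub.upper()}"
    return language


if __name__ == "__main__":
    for tag in ("ja-alalc97", "zh-TW", "de-at", "de"):
        print(tag, "->", to_csl_locale(tag))
```

Under these assumptions the variant subtag in ja-alalc97 is simply discarded, which matches the description above that only the "ja" part would be used.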
Where a style wants things romanized, it should do so everywhere. If the terms of the primary language of the document (English, say) are to be adopted, then in MLZ you would request romanization in the document, and that's all there is to it. If the terms used in foreign language cites are to be those of the item language, then you could set an alternative layout locale with this syntax, and provide romanized versions of the terms (if appropriate) in the style itself. So long as they are applied uniformly for all items in that language within the style (which they certainly should be), there shouldn't be a problem.
I think we should seriously re-consider this. I know it's an old issue but it gets requested time and again. @bwiernik @bdarcus
I don't really have a problem with the basic idea.
But we need a fully fleshed out proposal to evaluate. For the schema, what are the datatypes? For the spec, what's the specific language?
Also, if we allow multiple layouts by language, are there other use cases for multiple layouts?
This is what the CSL-M supplement has to say on this:
The MLZ extensions to CSL permit multiple cs:layout elements within cs:citation and cs:bibliography. Each cs:layout element but the last must include a locale attribute specifying one or more recognized CSL locales, and the final element must not carry a locale attribute. The locale applied to an item is determined by matching it against the locale set in the language variable of the item (this value is passed by Zotero). An example:
<citation>
  <layout locale="en es de">
    <text macro="layout-citation-roman"/>
  </layout>
  <layout locale="ru">
    <text macro="layout-citation-cyrillic"/>
  </layout>
  <layout>
    <text macro="layout-citation-ja"/>
  </layout>
</citation>
In the example above, an item with en, es or de (or de-AT) set in the language variable will be rendered by the layout-citation-roman macro, with locale terms set to the appropriate language.
See https://citeproc-js.readthedocs.io/en/latest/csl-m/#cs-layout-extension
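The selection rule quoted from the CSL-M supplement can be sketched roughly as follows. This is an illustrative Python model, not citeproc-js code: the `LAYOUTS` structure and the `select_layout` function are hypothetical, and the matching rule (exact match, or layout locale equal to the item's primary language subtag) is an assumption chosen so that de-AT selects the layout listing de.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory form of the three cs:layout elements in the example:
# (locale list, macro name). None marks the final layout without a locale attribute.
LAYOUTS: List[Tuple[Optional[List[str]], str]] = [
    (["en", "es", "de"], "layout-citation-roman"),
    (["ru"], "layout-citation-cyrillic"),
    (None, "layout-citation-ja"),
]


def select_layout(item_language: Optional[str],
                  layouts=LAYOUTS) -> Tuple[str, Optional[str]]:
    """Return (macro, term_locale); term_locale None means the style default."""
    wanted = (item_language or "").lower()
    primary = wanted.split("-")[0]
    for locales, macro in layouts:
        if locales is None:
            # Final layout: used when nothing else matched.
            return macro, None
        for loc in locales:
            # Assumed rule: exact match, or layout locale equals the primary subtag.
            if wanted and loc.lower() in (wanted, primary):
                return macro, item_language
    raise ValueError("style has no default layout")


if __name__ == "__main__":
    for lang in ("en", "de-AT", "ru", "ja", None):
        print(lang, "->", select_layout(lang))
```

How a partial match in the other direction should behave (an item tagged zh against layouts for zh-TW and zh-CN) is left open here, as it is in the discussion above.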
Here's how it looks in csl-mlz.rnc: https://github.com/Juris-M/schema/blob/0d1c742ead025fedd08bafc317f223c56e0a9948/csl-mlz.rnc#L182
So on my datatype question, that's here (xsd:language):
layout.locale =
  attribute locale {
    list { xsd:language+ }
  }
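As a rough illustration of what that datatype accepts, the following Python sketch checks a locale attribute value as a whitespace-separated list of one or more xsd:language tokens, using the lexical pattern given for xsd:language in XML Schema Part 2. The function name `parse_locale_attribute` is hypothetical; a real validator would work from the RELAX NG schema itself.

```python
import re
from typing import List

# Lexical pattern of xsd:language as given in XML Schema Part 2 (Datatypes).
XSD_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def parse_locale_attribute(value: str) -> List[str]:
    """Split a locale attribute into tokens, mirroring list { xsd:language+ }."""
    tokens = value.split()
    if not tokens:
        # The "+" in the schema requires at least one token.
        raise ValueError("locale must list at least one language tag")
    for token in tokens:
        if not XSD_LANGUAGE.fullmatch(token):
            raise ValueError(f"not a valid xsd:language value: {token!r}")
    return tokens


if __name__ == "__main__":
    print(parse_locale_attribute("en es de"))
    print(parse_locale_attribute("ja-alalc97 en"))
```

Note that ja-alalc97 passes this check, consistent with the earlier remark that such a tag validates even though a processor may ignore everything but the primary language.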
I have no objection.
Also, if we allow multiple layouts by language, are there other use cases for multiple layouts?
Yes, there may be. Legal citations are one example, and Frank has solved this by isolating the legal components of the style in distinct modules (sorry if I'm misrepresenting things here...). Another candidate might be classical citations, references to the Bible or other religious works, where typically other citation norms apply. (And they can mix. You can use Chicago author-date for "regular" citations, but a shorthand-based system for the Talmud.) But I think that's outside the scope of this proposal.
To clarify, couldn't your example be handled currently using cs:choose?
Sure, but you'll have to add that to every style. If we had a modular solution you could just hook that into any style.
But concerning the proposal at stake here: You think we should add that as per CSL-M?
This seems an important point, but I'm not understanding. Can you explain a bit more?
On your other question, curious what @bwiernik says.
A Jewish studies scholar might use different citation styles, e.g. Chicago author-date, Chicago note, MHRA, SBL. These styles don't necessarily support traditional sources. A journal might just say: "For these sources please use the system as described in the Encyclopedia Judaica." So you'll end up with this situation: You have a general style and another more specific style for certain items (usually using a lot of abbreviations, really hostile to writers and readers).
In Juris-M you can use the abbreviation filter to completely change the rendering of certain items.
(But again, I think this leads us a bit astray here. This goes way further than the original proposal and should perhaps be discussed on its own.)
What I'm asking, is what's the difference between this:
<layout language="es">
  ...
</layout>
... and this:
<layout>
  <choose>
    <if language="es">
      ...
    </if>
  </choose>
</layout>
Oh, sorry, must have misunderstood your question...
The difference is that this
<layout>
  <choose>
    <if language="es">
      ...
    </if>
  </choose>
</layout>
won't localize terms. It just applies some conditional logic.
This
<layout language="es">
  ...
</layout>
re-runs the whole layout and applies a different locale.
So @language on cs:choose is about the content of the reference data, and @language as proposed here is about the target language of the output?
If that's the distinction, it makes sense to me, but then perhaps we should make that as explicit as possible by calling that attribute on layout something different; @target-language or @output-language?
Seems we need an equivalent on csl-citation.json?
No. The language always comes from item metadata.
I was wrong concerning the distinction between
<layout>
  <choose>
    <if locale="es">
      ...
    </if>
  </choose>
</layout>
and
<layout locale="es">
  ...
</layout>
I think it's undocumented, but the variant with choose will also localize terms. I have just found it in a style I created a while ago. Perhaps @fbennett can provide us with some hints?
The point of the different locale layouts is to localize the terms by item language (e.g., German items are formatted in German, English items in English) and, with the full CSL-M multilingual support, to control which alphabets/transliterations/translations are used (e.g., a style might require that non-Latin alphabets be shown in their original and transliterated forms).
Besides locales, the other major grouping is by item category (e.g., primary vs secondary sources). We've discussed that issue here. There, we settled on that being a calling-application API issue. I don't know that there is any need for specifying different layouts that couldn't be accommodated by <choose> (e.g., maybe to format modern versus historical newspaper articles?). Even if there were, I don't really see how that would be handled from CSL's end.
Thanks for the clarifications.
So what is your assessment of that proposal? Should we add this? I think yes. (Even if I despise these kinds of requirements...)
What is the current status of this? The feature is really long awaited.
@odomanov The status is the same as before. Still under discussion. Sorry ...
The point of the different locale layouts is to localize the terms by item language (e.g., German items are formatted in German, English items in English)
Do we really need multiple layouts for that? Presumably, each item has the language field which should be enough to localize terms even in the case of a single layout. Why do we need many layouts for terms localization?
I mean there are localization files for every language. Perhaps they should contain all the information necessary for localization -- first of all translations but maybe also format for dates and similar things. Do we really need layouts for that?
As per @jgm's comment, the purpose of multiple layouts is to make this configurable.
The problem is that this is style-dependent. Some styles would want this, others not. And some might want it only for specific languages or with specific limitations.
So, no. Unfortunately, it wouldn't be helpful to just localize all the items on the basis of the language field. You'll need some way or another to make this configurable.
I agree that this should be configurable. What about a configuration like that:
<layout locale="per-item">
  ...
</layout>
In this way the style's author can choose how the terms are localized.
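A minimal sketch of what such a setting could mean for term lookup, assuming a single layout: with locale="per-item" the term locale follows the item's language field, otherwise every item uses the style's default locale. The `TERMS` table is a hypothetical, heavily abbreviated stand-in for the CSL locale files, and the function names are illustrative only.

```python
from typing import Optional

# Hypothetical, heavily abbreviated term tables standing in for CSL locale files.
TERMS = {
    "en": {"and": "and"},
    "de": {"and": "und"},
    "ru": {"and": "и"},
}


def term_locale(layout_locale: Optional[str],
                item_language: Optional[str],
                default_locale: str) -> str:
    """Pick the locale whose terms are used for one item."""
    if layout_locale == "per-item" and item_language:
        primary = item_language.split("-")[0].lower()
        if primary in TERMS:
            return primary
    # No per-item setting, no language on the item, or no matching locale data.
    return default_locale


def term(name: str, layout_locale: Optional[str],
         item_language: Optional[str], default_locale: str = "en") -> str:
    return TERMS[term_locale(layout_locale, item_language, default_locale)][name]


if __name__ == "__main__":
    print(term("and", "per-item", "de-AT"))  # item language decides
    print(term("and", None, "de"))           # style default decides
```

In this sketch an item whose language has no locale data falls back to the default locale; whether that fallback is the right behavior is one of the things a real proposal would need to specify.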
It's maybe even better to have a special option in "Citation-specific Options" and "Bibliography-specific Options". It's a matter of discussion.
(I'm sorry for pushing this. I like CSL and I want it to be multilingual. The CSL-M solution is too complicated in my view. Multiple layouts is a very useful feature, for example, in Classics. But the localization of terms is a different thing. It should work even with a single layout. Why not?)
... the localization of terms is a different thing. It should work even with a single layout. Why not?
I think you're right, @odomanov, but just wanted to confirm my thoughts.
In case anyone is still following this thread, there's a linked issue here where we're discussing localization a bit for an experimental project and I wanted to confirm something:
Let's imagine we don't have any current restrictions of CSL 1.0. In my experiment, for example, the conditional is allowed to structure citation or bibliography formatting by locale.
Is it not the case that, per what @odomanov is saying above, what term set to use is a separate matter from how to layout content from a particular locale?
E.g. there's actually no need for multiple cs:layout elements, and that locale="per-item" attribute he suggests could either go on cs:citation, cs:bibliography, or even be global (cs:style)?
And the default value for the behavior would, in fact, be "global" (all entries use the same term locale)?
I don't do multilingual, so I struggle with the details sometimes, so would be good if someone can confirm that!
The problem with multilingual formatting is basically this: you can approach it from a variety of different angles.
locale="per-item" might work. (However, you might also want to indicate which locales to take into account; say, English, German, and French get localized according to item language, everything else falls back to the locale of the document or the citation style. In this case, a single attribute might not be enough, but some sort of configuration element might do.)
[..] title in English, and the container-title in French.
That's really useful @denismaier!
Is it fair to say that it's still useful to distinguish features which could be configured with simple, probably global, parameters?
Clearly 4 is not that.
Yes, and yes.
In non-English publishing, foreign references are often provided in the native script, and follow the formatting conventions of the language realm from which they originate. To correctly handle such styles, it is sufficient to switch to the target locale when rendering references from the target language. The "language" variable on individual items can be used for this purpose, together with locale-specific layouts set with the following syntax:
With a default locale of "ru", this would format items with no value or "ru" in the "language" field using the code for the default layout in the "ru" locale, while items with "en" in the "language" field would be formatted using code for the "en es" layout in the corresponding "en" locale.