Closed denismaier closed 4 years ago
What are the current citeproc behaviors? I've not heard any complaints about this, so whatever the current behavior is would be correct, I'd think.
Current behaviour in citeproc-js seems to be that with "uppercase_subtitles": false, dashes are not touched at all. With "uppercase_subtitles": true, a couple of combinations get normalized, but all to em dashes. En dashes aren't recognized as relevant punctuation, i.e., there is no conversion to em dashes and no uppercasing afterwards. (Note that two hyphens are converted to em dashes, which is a bit against most plain text conventions, I think.) Locales don't seem to have any effect.
Here are some test cases:
>>===== MODE =====>>
citation
<<===== MODE =====<<
>>===== RESULT =====>>
Title input with em dash—Should start with uppercase
Title input with space-em-dash-space—Should normalize and start with uppercase
Title input with space-hyphen-space—Should normalize and start with uppercase
Title input with space-double-hyphen-space—Should normalize and start with uppercase
Title input with space-triple-hyphen-space—Should normalize and start with uppercase
Title input with double-hyphen—Should normalize and start with uppercase
Title input with triple-hyphen—Should normalize and start with uppercase
Title input with space-endash-space – Should keep endash and start with uppercase
<<===== RESULT =====<<
>>===== CITATION-ITEMS =====>>
[
[
{
"id": "ITEM-1"
}
],
[
{
"id": "ITEM-2"
}
],
[
{
"id": "ITEM-3"
}
],
[
{
"id": "ITEM-4"
}
],
[
{
"id": "ITEM-5"
}
],
[
{
"id": "ITEM-6"
}
],
[
{
"id": "ITEM-7"
}
],
[
{
"id": "ITEM-8"
}
]
]
<<===== CITATION-ITEMS =====<<
>>===== OPTIONS =====>>
{
"uppercase_subtitles": false
}
<<===== OPTIONS =====<<
>>===== CSL =====>>
<style
xmlns="http://purl.org/net/xbiblio/csl"
class="note"
version="1.0"
default-locale="en">
<info>
<id />
<title />
<updated>2009-08-10T04:49:00+09:00</updated>
</info>
<citation>
<layout delimiter="; ">
<text variable="container-title"/>
</layout>
</citation>
</style>
<<===== CSL =====<<
>>===== INPUT =====>>
[
{
"id": "ITEM-1",
"container-title": "Title input with em dash—should start with uppercase",
"type": "article-journal"
},
{
"id": "ITEM-2",
"container-title": "Title input with space-em-dash-space — should normalize and start with uppercase",
"type": "article-journal"
},
{
"id": "ITEM-3",
"container-title": "Title input with space-hyphen-space - should normalize and start with uppercase",
"type": "article-journal"
},
{
"id": "ITEM-4",
"container-title": "Title input with space-double-hyphen-space -- should normalize and start with uppercase",
"type": "article-journal"
},
{
"id": "ITEM-5",
"container-title": "Title input with space-triple-hyphen-space --- should normalize and start with uppercase",
"type": "article-journal"
},
{
"id": "ITEM-6",
"container-title": "Title input with double-hyphen--should normalize and start with uppercase",
"type": "article-journal"
},
{
"id": "ITEM-7",
"container-title": "Title input with triple-hyphen---should normalize and start with uppercase",
"type": "article-journal"
},
{
"id": "ITEM-8",
"container-title": "Title input with space-endash-space – should keep endash and start with uppercase",
"type": "article-journal"
}
]
<<===== INPUT =====<<
After thinking a bit more about this, I tend to think that dashes shouldn't be normalized unless they are delimiters between title and subtitle or between multiple subtitles. (Converting -- to en dash and --- to em dash is a different thing. We should certainly do this. Not sure about single hyphens...)
Thinking through this more, this is a locale-dependent thing (e.g., British English and German generally prefer space-en dash-space instead of em dash in text). I'm not sure whether that's something we would need to bother with?
I think that a single space-hyphen-space should probably be normalized to em dash or space-en dash-space.
So that would mean:
space-hyphen-space => em dash or space-en dash-space (depending on locale?)
hyphen-hyphen => en dash
hyphen-hyphen-hyphen => em dash
As said above, I don't think we should normalize dashes unless a dash is a delimiter between title and subtitle, and normalize-title-delimiters is set to "full"; but then we will most likely normalize to a colon or a period (not to a dash).
Yeah, let's leave it to publishers to normalize dashes if they want that.
Looking at my library of items, almost all occurrences of " - " are low-quality metadata imports that should properly be colons separating subtitles. Just a few are German publications that should be en dashes.
For simplicity, I think let's just leave single hyphens alone.
Ok, good. So that gives us:
hyphen-hyphen => en dash
hyphen-hyphen-hyphen => em dash
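For processor implementors, the rule agreed on above could be sketched roughly as follows. This is only an illustration, not code from citeproc-js or any other processor: the function name `normalize_dashes` is hypothetical, and two details are assumptions the thread does not settle, namely that runs of four or more hyphens are left untouched and that surrounding whitespace is preserved as entered.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Match any run of two or more consecutive hyphens.
# A single hyphen never matches, so it is left alone.
_HYPHEN_RUN = re.compile(r"-{2,}")


def normalize_dashes(text: str) -> str:
    """Convert '--' to an en dash and '---' to an em dash (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        run = match.group(0)
        if len(run) == 2:
            return EN_DASH
        if len(run) == 3:
            return EM_DASH
        # Assumption: runs of four or more hyphens are not covered by the
        # rule above, so they are returned unchanged.
        return run

    return _HYPHEN_RUN.sub(replace, text)


if __name__ == "__main__":
    for title in ("Title--subtitle", "Title---subtitle", "Title - subtitle"):
        print(repr(normalize_dashes(title)))
```

Matching the whole hyphen run first and then deciding by its length avoids a three-hyphen run being partly consumed as a two-hyphen one.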
So we don't need a schema change. A paragraph in the specs (aimed at processor implementors) will be enough. I'll close here and open a new issue there.
In the issue about splitting title-main and title-sub, one question was about em vs en dashes. Should we add an option to normalize dashes in a textual context, as we already change hyphens to en dashes in a numerical context (we do that, right?)? I guess that should, most likely, be locale-dependent, e.g., em dashes for US English, en dashes for most other locales?