Open georgd opened 3 years ago
Ah, good point. I can get the strings from the locale files, unless there are variants in use that are not included there. For "to" ... yes, a list of those will be very useful! As for hyphens ... yeah, their ambiguity is a problem. In the latest release, I completely disabled parsing of hyphens for that very reason (after a lawyer in the U.S. complained about the use of backslash escapes), so they are passed through literally and won't trigger a plural "number" label. Unless they are set off with spaces or something, records that join multiple docket numbers with a hyphen just won't work correctly, I'm afraid.
Ok, I looked through the languages on curia.eu and copied the strings:
"and" (deduplicated):
a
and
e
en
et
i
in
ir
ja
och
og
oraz
u
un
und
y
és
și
και
и
"to" (deduplicated):
-
a
až
bis
do
līdz
sa
to
tot en met
à
έως
‑
–
—
As you can see, four hyphen symbols are used: HYPHEN-MINUS (U+002D), HYPHEN (U+2010), EN DASH (U+2013) and EM DASH (U+2014). But I don’t think that would make a difference.
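To make the two lists concrete, here is a minimal Python sketch that looks up a joiner found between two case numbers. It is not citeproc-js code: the names `AND_STRINGS`, `TO_WORDS`, `DASHES` and `classify_joiner` are made up for illustration, and the sketch assumes the joiner has already been isolated from the surrounding text. U+2011 (NON-BREAKING HYPHEN) is included as an extra assumption, since it is easily confused with U+2010.

```python
import unicodedata


def _norm(text):
    """Normalize to NFC and lowercase so lookups ignore case and composition."""
    return unicodedata.normalize("NFC", text).strip().lower()


# Strings copied from the lists above (hypothetical constant names).
AND_STRINGS = {_norm(s) for s in [
    "a", "and", "e", "en", "et", "i", "in", "ir", "ja", "och", "og",
    "oraz", "u", "un", "und", "y", "és", "și", "και", "и",
]}
TO_WORDS = {_norm(s) for s in [
    "a", "až", "bis", "do", "līdz", "sa", "to", "tot en met", "à", "έως",
]}
# U+2011 is an assumption added here; the other four are named in the thread.
DASHES = {"\u002d", "\u2010", "\u2011", "\u2013", "\u2014"}


def classify_joiner(joiner):
    """Return the set of possible meanings ("and", "to") of a joiner string."""
    token = _norm(joiner)
    kinds = set()
    if token in AND_STRINGS:
        kinds.add("and")
    if token in TO_WORDS or token in DASHES:
        kinds.add("to")
    return kinds


print(classify_joiner("und"))
print(classify_joiner("a"))
```

Note that `a` sits in both lists, so a lookup that does not know the language cannot tell whether it means "and" or "to".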
Regarding the hyphens and dashes: you should be able to escape them. They may appear without denoting a range.
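One possible escape convention, sketched in Python under the assumption that a backslash marks a literal dash (backslash escapes are mentioned elsewhere in this thread). This is not citeproc-js code, and `split_on_unescaped_dash` is a hypothetical helper:

```python
import re

# The four dashes named in the thread, plus U+2011 as an added assumption.
DASHES = "\u002d\u2010\u2011\u2013\u2014"

# A dash that is NOT preceded by a backslash acts as a range separator.
_UNESCAPED_DASH = re.compile(r"(?<!\\)[" + re.escape(DASHES) + "]")
# A backslash followed by a dash is a literal dash.
_ESCAPED_DASH = re.compile(r"\\([" + re.escape(DASHES) + "])")


def split_on_unescaped_dash(value):
    """Split on unescaped dashes, then drop the escaping backslashes."""
    parts = _UNESCAPED_DASH.split(value)
    return [_ESCAPED_DASH.sub(r"\1", part).strip() for part in parts]


print(split_on_unescaped_dash(r"C\-457/11 - C\-460/11"))
print(split_on_unescaped_dash(r"12\-34"))
```

The obvious cost is that every EU case number would need an escape, which is a lot to ask of people entering the data.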
https://github.com/Juris-M/citeproc-js/commit/3bd36ed3229b6743ace8b1622da2098bbaa2371a made `is-multiple` smarter, but also language dependent (it depends on English, of course). So, should this be expanded to all the other 23 languages? I could compile a list of `and` strings.

One more case is still neglected: if more than two consecutive case numbers are joined, you’ll find them cited like `C‑457/11 to C‑460/11`. So, the `to` strings should be considered as well; I can provide those too. But this might not work for languages that use a hyphen instead of a `to` string (judging from curia.eu, some seem to do so): testing for a hyphen will always return true, as all EU case numbers contain a hyphen.
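The hyphen problem can be illustrated with a small Python sketch. It assumes that a range joiner, whether a word or a dash, is always set off by whitespace, while the hyphen inside a case number is not. The names `RANGE_JOINER` and `is_range` are hypothetical, and this is not how citeproc-js implements `is-multiple`.

```python
import re

# "to" words as found on curia.eu, longest first so multi-word forms win.
TO_WORDS = ["tot en met", "līdz", "bis", "έως", "až", "do", "sa", "to", "à", "a"]
# Dash characters; U+2011 (NON-BREAKING HYPHEN) is an added assumption.
DASHES = "\u002d\u2010\u2011\u2013\u2014"

_alternatives = [re.escape(word) for word in TO_WORDS]
_alternatives.append("[" + re.escape(DASHES) + "]")

# A joiner only counts when it has whitespace on both sides.
RANGE_JOINER = re.compile(
    r"\s+(?:" + "|".join(_alternatives) + r")\s+",
    re.IGNORECASE,
)


def is_range(number_field):
    """Return True if the field contains a whitespace-delimited range joiner."""
    return RANGE_JOINER.search(number_field) is not None


single = "C\u2011457/11"
print("\u2011" in single)   # naive hyphen test fires on a single case number
print(is_range(single))     # spaced-joiner test does not
print(is_range("C\u2011457/11 to C\u2011460/11"))
```

The trade-off is that a range written with an unspaced hyphen, such as `C-457/11-C-460/11`, goes undetected by this test.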