is-multiple is still not smart enough

Juris-M / citeproc-js

A JavaScript implementation of the Citation Style Language (CSL) https://citeproc-js.readthedocs.io

is-multiple is still not smart enough #174

Open georgd opened 3 years ago

georgd commented 3 years ago

https://github.com/Juris-M/citeproc-js/commit/3bd36ed3229b6743ace8b1622da2098bbaa2371a made is-multiple smarter, but also language-dependent (it depends on English, of course). So, should this be expanded to all the other 23 languages? I could compile a list of the "and" strings.

One more case is still neglected: if more than two consecutive case numbers are joined, you’ll find them cited like C‑457/11 to C‑460/11. So the "to" strings should be considered as well; I can provide those, too. But this might not work for languages that use a hyphen instead of a "to" string (judging from curia.eu, some seem to do so): testing for a hyphen will always return true, as all EU case numbers contain a hyphen.
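
To make the ambiguity concrete, here is a minimal sketch. The helper name hasHyphen is hypothetical and is not part of citeproc-js; it only shows that a bare hyphen test cannot tell a single EU case number from a joined range.

```javascript
// Hypothetical helper, not citeproc-js code: a naive "contains any dash" test.
// The class covers HYPHEN-MINUS, U+2010, U+2011, U+2013 and U+2014.
const hasHyphen = (s) => /[-\u2010\u2011\u2013\u2014]/.test(s);

const single = "C-457/11";            // one case number
const range = "C-457/11 to C-460/11"; // several consecutive case numbers

// Both values contain a hyphen, so the test cannot distinguish them.
console.log(hasHyphen(single)); // true
console.log(hasHyphen(range));  // true
```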

fbennett commented 3 years ago

Ah, good point. I can get the "and" strings from the locale files, unless there are variants in use that are not included there. For "to" ... yes, a list of those will be very useful! As for hyphens ... yeah, their ambiguity is a problem. For that very reason, and after a lawyer in the U.S. complained about the use of backslash escapes, I completely disabled parsing of hyphens in the latest release, so they will be passed through literally and won't trigger a plural "number" label. Unless they are set off with spaces or something, records that join multiple docket numbers with a hyphen just won't work correctly, I'm afraid.

georgd commented 3 years ago

Ok, I looked through the languages on curia.eu and copied the strings:

"and" (deduplicated):

a
and
e
en
et
i
in
ir
ja
och
og
oraz
u
un
und
y
és
și
και
и

"to" (deduplicated):

-
a
až
bis
do
līdz
sa
to
tot en met
à
έως
‑
–
—

As you can see, four hyphen symbols are used: HYPHEN-MINUS (U+002D), HYPHEN (U+2010), EN DASH (U+2013) and EM DASH (U+2014). But I don’t think that would make a difference.
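
A rough sketch of how the two lists above could feed a language-independent check follows. It is an illustration under stated assumptions, not the citeproc-js implementation: the names AND_STRINGS, TO_STRINGS, escapeRegExp and looksMultiple are made up here, and a joiner is only counted when it has whitespace on both sides, so the hyphen inside a single case number is ignored. The dash code points are the ones named in the comment above.

```javascript
// Joiner strings copied from the lists in this thread (curia.eu language versions).
const AND_STRINGS = ["a", "and", "e", "en", "et", "i", "in", "ir", "ja", "och",
  "og", "oraz", "u", "un", "und", "y", "és", "și", "και", "и"];
const TO_STRINGS = ["a", "až", "bis", "do", "līdz", "sa", "to", "tot en met",
  "à", "έως", "\u002D", "\u2010", "\u2013", "\u2014"];

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Deduplicate, normalize to NFC, and put longer joiners first in the alternation.
const joiners = [...new Set([...AND_STRINGS, ...TO_STRINGS])]
  .map((s) => escapeRegExp(s.normalize("NFC")))
  .sort((a, b) => b.length - a.length);

// Assumption: a joiner only counts when it is set off by whitespace on both sides.
const JOINER_RE = new RegExp("\\s(?:" + joiners.join("|") + ")\\s", "iu");

// Hypothetical check: does the value look like several joined case numbers?
function looksMultiple(value) {
  return JOINER_RE.test(value.normalize("NFC"));
}

console.log(looksMultiple("C-457/11"));                  // false
console.log(looksMultiple("C-457/11 and C-458/11"));     // true
console.log(looksMultiple("C-457/11 bis C-460/11"));     // true
console.log(looksMultiple("C-457/11 \u2013 C-460/11"));  // true (spaced en dash)
```

Because several joiners are single letters ("a", "e", "i", "u", "y"), a check like this would only be reasonable on fields known to hold case or docket numbers; on free text it would produce false positives.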

denismaier commented 3 years ago

Regarding the hyphens and dashes: you should be able to escape them. They may appear without denoting a range.
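
One way such escaping could work is sketched below. This is a hypothetical convention for illustration only, not what citeproc-js currently does (as noted earlier in the thread, hyphen parsing is disabled in the latest release): a dash preceded by a backslash is treated as literal, and any other dash as a range delimiter. The names hasRangeDash and unescapeDashes are invented for this example, and doubled backslashes are not handled.

```javascript
// Hypothetical convention: "\-" is a literal dash, a bare dash marks a range.
const DASHES = "\\-\\u2010\\u2013\\u2014";

// A dash that is NOT preceded by a backslash.
const UNESCAPED_DASH = new RegExp("(?<!\\\\)[" + DASHES + "]");

// A backslash followed by a dash; the dash is captured for unescaping.
const ESCAPED_DASH = new RegExp("\\\\([" + DASHES + "])", "g");

// True when the value contains at least one unescaped (range) dash.
function hasRangeDash(value) {
  return UNESCAPED_DASH.test(value);
}

// Remove the escaping backslashes for display.
function unescapeDashes(value) {
  return value.replace(ESCAPED_DASH, "$1");
}

console.log(hasRangeDash("C\\-457/11"));             // false: only escaped dashes
console.log(hasRangeDash("C\\-457/11-C\\-460/11"));  // true: the middle dash is bare
console.log(unescapeDashes("C\\-457/11"));           // C-457/11
```

The sketch also shows the cost of this approach for EU case numbers: every number would need an escape in the stored record, which is the kind of burden the backslash complaint mentioned above was about.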