Open GoogleCodeExporter opened 9 years ago
There is a recommended best practice in the DCMI specification:
"""Comment: Recommended best practice is to use a controlled vocabulary such as RFC 4646 [RFC4646]."""
[http://dublincore.org/documents/dcmi-terms/#terms-language]
It might be more natural to follow this recommendation.
Original comment by zeta....@gmail.com
on 26 May 2010 at 10:19
Implementors working in this area need to be aware of a long-standing bug in libxml2, which affects validation of many minority languages:
https://bugzilla.gnome.org/show_bug.cgi?id=606592
Original comment by syea...@gmail.com
on 26 May 2010 at 10:32
"Recommended best practice is to use a controlled vocabulary such as RFC 4646" means that people *may* use *some* controlled vocabulary, *for example* RFC 4646, or its variations, or something else that they believe to be a controlled vocabulary (if one catalog always used the same numbers to identify languages, that would also be a controlled vocabulary). This does not help at all: either you define something more specific, or clients must use heuristics to guess the language(s). Furthermore, it is not clear whether multiple languages in one element are allowed or whether you must repeat the dc:language element. I bet you can do both, and that is how it will end up: everyone uses his own little variation.
I propose to require that the value of dc:language conform to RFC 3066, which can be expressed as a regular expression:
* LanguageID ::= Langcode ('-' Subcode)*
* Langcode ::= ISO639Code | IanaCode | UserCode
* ISO639Code ::= ([a-z] | [A-Z]) ([a-z] | [A-Z])
* IanaCode ::= ('i' | 'I') '-' ([a-z] | [A-Z])+
* UserCode ::= ('x' | 'X') '-' ([a-z] | [A-Z])+
* Subcode ::= ([a-z] | [A-Z])+
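For illustration, a minimal Python sketch that transcribes the grammar above into a single regular expression. The names LANGUAGE_ID and is_language_id are hypothetical, introduced only for this example. Note that, as written, the grammar only admits two-letter primary codes (plus the i- and x- forms).

```python
import re

# Direct transcription of the LanguageID grammar quoted above:
#   Langcode = ISO639Code (exactly two letters)
#            | IanaCode   ("i-" followed by letters)
#            | UserCode   ("x-" followed by letters)
#   followed by zero or more "-" Subcode, where Subcode is one or more letters.
LANGUAGE_ID = re.compile(
    r"(?:[A-Za-z]{2}"         # ISO639Code
    r"|[iI]-[A-Za-z]+"        # IanaCode
    r"|[xX]-[A-Za-z]+)"       # UserCode
    r"(?:-[A-Za-z]+)*"        # ('-' Subcode)*
)


def is_language_id(value: str) -> bool:
    """Return True if the whole string matches the LanguageID grammar."""
    return LANGUAGE_ID.fullmatch(value) is not None


if __name__ == "__main__":
    for tag in ["en", "en-GB", "i-klingon", "x-private", "tkl", "en-"]:
        print(f"{tag!r}: {is_language_id(tag)}")
```

Run as a script, this reports True for en, en-GB, i-klingon and x-private, and False for the three-letter code tkl and for the malformed en-.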
Original comment by siehea...@googlemail.com
on 27 May 2010 at 7:28
Sounds reasonable. We could offer similar advice for atom:category/dc:subject (recommend using BISAC, LOC, etc.), although it won't be mandatory in that case.
Original comment by hadrien....@gmail.com
on 27 May 2010 at 7:31
I agree with you on having a required controlled vocabulary.
As for the RFC number, why are you advising RFC 3066? The IETF website seems to say that RFC 4646 obsoletes RFC 3066 (look at the header here: http://www.ietf.org/rfc/rfc4646.txt).
You seem to be much more knowledgeable than me on the subject, so is there any issue with going with RFC 4646?
Original comment by zeta....@gmail.com
on 27 May 2010 at 7:48
sieheauch's regexp appears to reproduce the libxml2 bug by excluding three-letter language codes.
Several of the ePubs from http://www.nzetc.org/ contain fragments in languages which have only three-letter codes (mainly Pacifika languages). These are languages that people actually speak and actually care about.
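To make the difference concrete, a small Python sketch comparing the two-letter grammar proposed earlier with the generic tag syntax of RFC 3066 itself, which as we read its section 2.1 is a primary subtag of 1 to 8 letters followed by optional subtags of 1 to 8 letters or digits. The sample codes niu (Niuean), tkl (Tokelauan) and tvl (Tuvaluan) are our own illustrative picks of Pacific languages that have three-letter ISO 639-2 codes but no two-letter ISO 639-1 code; they are not taken from the NZETC files. The function names are hypothetical.

```python
import re

# Generic RFC 3066 syntax (our reading of section 2.1):
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
# This checks well-formedness only; it does not look codes up in any registry.
RFC3066_SYNTAX = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# The two-letter-only grammar proposed earlier in this thread, for comparison.
TWO_LETTER_GRAMMAR = re.compile(r"(?:[A-Za-z]{2}|[iIxX]-[A-Za-z]+)(?:-[A-Za-z]+)*")


def matches_rfc3066_syntax(tag: str) -> bool:
    return RFC3066_SYNTAX.fullmatch(tag) is not None


def matches_two_letter_grammar(tag: str) -> bool:
    return TWO_LETTER_GRAMMAR.fullmatch(tag) is not None


if __name__ == "__main__":
    # Illustrative Pacific languages with three-letter ISO 639-2 codes only.
    for tag in ["niu", "tkl", "tvl"]:
        print(tag, matches_rfc3066_syntax(tag), matches_two_letter_grammar(tag))
```

Each of the three codes passes the RFC 3066 syntax check and fails the two-letter grammar, which is exactly the exclusion described above.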
Original comment by syea...@gmail.com
on 27 May 2010 at 9:24
What you really need is a way to advertise WHICH controlled vocabulary you are using.
For language, the OpenPub standard could insist upon RFC 4646, and that would likely satisfy nearly everyone. For subjects, however, there is unlikely to be one universal category scheme meeting all needs, so it is really going to be necessary to provide a machine-understandable way to advertise exactly which vocabulary you are using. This doesn't hurt for language either, although OpenPub could "strongly recommend" one particular vocabulary such as RFC 4646.
Unfortunately, dc:language and dc:subject don't give you an obvious, easy way to do this; they might need to be 'extended', perhaps with a 'vocabulary' attribute that must contain a URI identifying a vocabulary. To guard against the danger that people will choose different URIs for the same vocabulary, OpenPub could provide recommended URIs for certain common vocabularies like RFC 4646.
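A minimal Python sketch of how a client might consume such an attribute. Everything here is an assumption: the 'vocabulary' attribute is only the proposal in this comment, not an existing Dublin Core or OpenPub feature; VALIDATORS and vocabulary_check are hypothetical names; http://purl.org/dc/terms/RFC4646 is, we believe, DCMI's identifier for the RFC 4646 encoding scheme; and the pattern used for it is a simplified approximation rather than the full RFC 4646 grammar.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional

# Hypothetical table of recommended vocabulary URIs and their validators.
# The RFC 4646 check is a simplification: a 2-8 letter primary subtag,
# then subtags of 1-8 letters or digits. It is not the full RFC 4646 grammar.
RFC4646_URI = "http://purl.org/dc/terms/RFC4646"
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    RFC4646_URI: lambda v: re.fullmatch(
        r"[A-Za-z]{2,8}(?:-[A-Za-z0-9]{1,8})*", v) is not None,
}


def vocabulary_check(xml_text: str) -> Optional[bool]:
    """Validate an element's text against the vocabulary named in its
    (hypothetical) 'vocabulary' attribute.

    Returns None when the attribute is missing or names an unknown vocabulary,
    i.e. when a client would be back to guessing heuristically.
    """
    elem = ET.fromstring(xml_text)
    validator = VALIDATORS.get(elem.get("vocabulary", ""))
    if validator is None:
        return None
    return validator((elem.text or "").strip())


if __name__ == "__main__":
    doc = ('<dc:language xmlns:dc="http://purl.org/dc/elements/1.1/" '
           f'vocabulary="{RFC4646_URI}">tkl</dc:language>')
    print(vocabulary_check(doc))
```

The three-way result (valid, invalid, unknown vocabulary) is the point of the design: only when the vocabulary is advertised can a client say anything definite about the value.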
I'm not sure exactly what it takes to legally extend a dc:* element with an attribute; this has always confused me.
Original comment by rochk...@jhu.edu
on 27 May 2010 at 10:59
For atom:category we can rely on the scheme attribute. For dc:language, perhaps xsi:type?
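A sketch, in Python with ElementTree, of what the two mechanisms mentioned here could look like in an entry: atom:category carrying a scheme URI, and dc:language typed with xsi:type="dcterms:RFC4646", which we believe matches the convention used with DCMI's XML schemas for encoding schemes. The scheme URI http://example.org/vocabularies/bisac is a placeholder (we are not aware of an official BISAC URI), FIC000000 is only a sample BISAC code, and build_entry is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

ET.register_namespace("atom", ATOM)
ET.register_namespace("dc", DC)
ET.register_namespace("xsi", XSI)


def build_entry(language: str) -> str:
    entry = ET.Element(f"{{{ATOM}}}entry")
    # The value of xsi:type is a QName, so the dcterms prefix must be declared
    # even though no element uses it. ElementTree only declares prefixes that
    # appear in tag or attribute names, so the declaration is added by hand.
    entry.set("xmlns:dcterms", DCTERMS)

    # atom:category already has a scheme attribute naming the vocabulary.
    ET.SubElement(entry, f"{{{ATOM}}}category", {
        "scheme": "http://example.org/vocabularies/bisac",  # placeholder URI
        "term": "FIC000000",
        "label": "FICTION / General",
    })

    # dc:language has no such attribute, so the vocabulary goes in xsi:type.
    lang = ET.SubElement(entry, f"{{{DC}}}language",
                         {f"{{{XSI}}}type": "dcterms:RFC4646"})
    lang.text = language
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(build_entry("en-GB"))
```

The asymmetry is the point raised in this comment: Atom gives categories a built-in place for the vocabulary, while dc:language has to borrow one from XML Schema.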
Original comment by hadrien....@gmail.com
on 27 May 2010 at 11:16
Original comment by hadrien....@gmail.com
on 15 Jul 2010 at 4:26
Original issue reported on code.google.com by siehea...@googlemail.com
on 26 May 2010 at 8:57