SEMICeu / Core-Person-Vocabulary

This is the issue tracker for the maintenance of Core Person Vocabulary

Character set (syntactical interoperability) #26

Closed dimi-schepers closed 1 year ago

dimi-schepers commented 3 years ago

During the Core Vocabularies webinar on 2021-04-23, it was proposed to also specify the character set that should be used. In Germany, the Latin script is used – with a subset of the 6400 characters – but they would also like to be able to indicate other character sets (e.g. Bulgarian) so that syntactical interoperability is also covered. "What about restricting the possible characters to the legally binding transliteration agreements?"

It was argued that defining the text as UTF-8 might be easier than defining the character set. It was clarified that the question is not about character encoding: UTF-8 is taken as given. The proposal is to specify, in addition to the language, the set of letters as well (e.g. Traditional Chinese vs Simplified Chinese: both are Chinese, but they use different character sets, and both can be encoded in UTF-8).
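To illustrate the distinction between encoding and set of letters, here is a minimal Python sketch. It is not part of the proposal. The helper names `rough_scripts` and `uses_only` are hypothetical, and the first word of each character's Unicode name is used only as a rough stand-in for its script, since the standard library does not expose the Unicode Script property.

```python
import unicodedata


def rough_scripts(text: str) -> set[str]:
    """Return a rough script label for every letter in text.

    Assumption: the first word of the Unicode character name
    (e.g. LATIN, CYRILLIC, CJK) approximates the script.
    """
    labels = set()
    for ch in text:
        if ch.isalpha():
            labels.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return labels


def uses_only(text: str, allowed: set[str]) -> bool:
    """True if every letter in text belongs to one of the allowed labels."""
    return rough_scripts(text) <= allowed


latin_name = "Müller"
cyrillic_name = "Иванов"

# Both names are valid UTF-8, so the encoding alone says nothing
# about which letters are in use.
latin_name.encode("utf-8")
cyrillic_name.encode("utf-8")

print(rough_scripts(latin_name))            # {'LATIN'}
print(rough_scripts(cyrillic_name))         # {'CYRILLIC'}
print(uses_only(cyrillic_name, {"LATIN"}))  # False
```

This kind of check cannot separate Traditional from Simplified Chinese, because many characters are shared between the two. A restriction such as the DIN SPEC 91379 subset would need an explicit list of permitted characters rather than a script label.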

A German standardisation initiative called DIN SPEC 91379 could serve as inspiration: https://www.din.de/de/wdc-beuth:din21:301228458.

It was added that such discussions and decisions should reflect how the world actually is, and should therefore be held beyond the IT and semantic communities. It was mentioned that it could be important to look at ongoing social changes before hard-coding these aspects in a specification. This is also valid for the gender discussion. A link from W3C regarding different naming conventions around the world was shared: https://www.w3.org/International/questions/qa-personal-names.en

frank-steimke commented 2 years ago

I'd like to add some remarks about the DIN 91379 character set

EmidioStani commented 2 years ago

This issue relates to the character set of a document. In the XML world, the charset is declared at the top level of the XML document: <?xml version="1.0" encoding="UTF-8"?>. In HTML, the charset can be defined inside the HTML document itself.

Thus all the content of such a document follows the declared character encoding.
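As a hedged sketch of how a consumer could read the encoding declared for an XML document, the Python snippet below extracts it from the XML declaration. The function name `declared_xml_encoding` is hypothetical, the regular expression is a simplification rather than a full XML parser, and falling back to UTF-8 assumes there is no byte order mark indicating UTF-16.

```python
import re

# Simplified pattern for the encoding pseudo-attribute of an XML declaration.
_XML_DECL = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def declared_xml_encoding(data: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else default


doc = b'<?xml version="1.0" encoding="UTF-8"?><person/>'
encoding = declared_xml_encoding(doc)
text = doc.decode(encoding)  # the whole document follows this encoding
```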

There could be 2 ways:
1) adding a property to the foaf:Document class indicating the charset
2) including a relation between the foaf:Document and cnt:Content classes (the cnt:Content class is defined in https://www.w3.org/TR/Content-in-RDF10/#ContentClass)
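The two options can be sketched as plain triples in Python. This is only an illustration under stated assumptions: the `http://example.org/` namespace and the property names `charset` and `hasContent` are hypothetical, since the comment does not name them. `cnt:characterEncoding` is taken from the Content in RDF vocabulary.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
CNT = "http://www.w3.org/2011/content#"
EX = "http://example.org/"  # hypothetical namespace, for illustration only
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = tuple[str, str, str]


def option_1(doc: str, charset: str) -> list[Triple]:
    """Option 1: a (hypothetical) charset property directly on the document."""
    return [
        (doc, RDF_TYPE, FOAF + "Document"),
        (doc, EX + "charset", charset),
    ]


def option_2(doc: str, content: str, charset: str) -> list[Triple]:
    """Option 2: link the document to a cnt:Content node that carries the encoding."""
    return [
        (doc, RDF_TYPE, FOAF + "Document"),
        (doc, EX + "hasContent", content),  # hypothetical linking property
        (content, RDF_TYPE, CNT + "Content"),
        (content, CNT + "characterEncoding", charset),
    ]


for triple in option_2(EX + "doc/1", EX + "doc/1/content", "UTF-8"):
    print(triple)
```

Option 1 keeps the model flat, while option 2 reuses an existing W3C vocabulary at the cost of an extra node per document.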

EmidioStani commented 1 year ago

As there is no document associated with Core Person, this issue could be closed. However, implementers of CCCEV, when implementing an Evidence as a document, could consider implementing it as suggested above.