Open VladimirAlexiev opened 9 years ago
This is critical, because we want to fix 10-15 lang-specific props to foaf:name with lang tag: http://mappings.dbpedia.org/index.php/What%27s_in_a_Name#Language-specific_Names
Another interesting lang tag is "qqq-DZ" (meaning "language used in specific region: Algeria") in http://mappings.dbpedia.org/index.php?title=Mapping_fr:Infobox_Commune_d'Algérie&action=edit
I now see http://mappings.dbpedia.org/index.php/Template:PropertyMapping says: "we can define the language tag using the wikipedia language code".
But you should accept IANA lang tags, not Wikipedia codes, since the language of a Wikipedia does not limit the language of the strings it can contain. E.g. frwiki talks about names in Serbian Cyrillic (sr-Cyrl), Gagauz (gag), Algerian (which is not a single lang, ergo qqq-DZ) etc.
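To illustrate the point, here is a minimal Python sketch of a well-formedness check that accepts tags by their shape rather than by membership in a list of Wikipedia codes. It covers only a simplified subset of the lang tag syntax (language, optional script, optional region); the function name `is_well_formed_langtag` is hypothetical and is not part of the extraction framework.

```python
import re

# Simplified subset of the lang tag syntax: language[-Script][-REGION].
# Variants, extensions, and private-use subtags are deliberately left out.
LANGTAG = re.compile(
    r"[A-Za-z]{2,3}"                # primary language subtag, e.g. "sr", "gag", "qqq"
    r"(-[A-Za-z]{4})?"              # optional script subtag, e.g. "Cyrl"
    r"(-([A-Za-z]{2}|[0-9]{3}))?"   # optional region subtag, e.g. "DZ" or "419"
)


def is_well_formed_langtag(tag: str) -> bool:
    """Return True if the whole string has the language[-script][-region] shape."""
    return LANGTAG.fullmatch(tag) is not None


if __name__ == "__main__":
    # Tags mentioned in this issue, plus one malformed variant.
    for tag in ["sr-Cyrl", "gag", "qqq-DZ", "sr_Cyrl"]:
        print(tag, is_well_formed_langtag(tag))
```

This only checks the shape of a tag; it does not look subtags up in the IANA registry, so it cannot tell a registered code from an unregistered one.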
This is a nice addition but not sure what it might break in the framework. @jcsahnwaldt any ideas? There are some comments in the file [1] probably by you
@jimregan: at first glance, we need to add to nonIsoCodes at https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/util/Language.scala#L100 each of the language codes we dealt with at https://github.com/dbpedia/mappings-tracker/issues/15
But I'm not sure what these codes are used for:
Ok, well that mapping needs to go. And never be mentioned again!
There are at least two problems with the current system:
So, even if you set language="gag" in the mapping, it will end up in the triples with an @tr language tag, which may not be what you expected...
template: http://mappings.dbpedia.org/index.php/Template:PropertyMapping says:
property: http://mappings.dbpedia.org/index.php/OntologyProperty:Foaf:name
mapping: http://mappings.dbpedia.org/index.php?title=Mapping_fr:Infobox_Ville_de_Serbie&action=edit has
wiki page: https://fr.wikipedia.org/w/index.php?title=Požega_(Serbie)&action=edit has
result: http://mappings.dbpedia.org/server/extraction/fr/extract?title=Požega_(Serbie)&revid=&format=turtle-triples&extractors=custom
Maybe the dataprop extractor has the wrong idea of what a lang tag can be? The one above is a valid lang tag meaning "lang=Serbian, script=Cyrillic".
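For reference, a small Python sketch of how a tag such as sr-Cyrl breaks down into subtags. It assumes the simplified form language[-script][-region] and applies the conventional casing (lowercase language, title-case script, uppercase region). The names `LangTag` and `parse_langtag` are hypothetical and do not reflect how the dataprop extractor actually works.

```python
import re
from typing import NamedTuple, Optional


class LangTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


# Simplified form only: language[-script][-region]; no variants or extensions.
_PATTERN = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def parse_langtag(tag: str) -> LangTag:
    """Split a tag into subtags and normalize their case."""
    m = _PATTERN.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a language[-script][-region] tag: {tag!r}")
    script = m.group("script")
    region = m.group("region")
    return LangTag(
        language=m.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
    )


if __name__ == "__main__":
    print(parse_langtag("sr-Cyrl"))
    print(parse_langtag("qqq-DZ"))
```

The four-letter second subtag is what distinguishes a script (Cyrl) from a two-letter region (DZ), so a parser that only expects a bare language code would not recognize either form.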