dbpedia / extraction-framework

The software used to extract structured data from Wikipedia

handle multilingual strings with templates like {{en|...}} #311

Open VladimirAlexiev opened 9 years ago

VladimirAlexiev commented 9 years ago

https://commons.wikimedia.org/w/index.php?title=File:Hristogdanov.jpg&action=edit has description in two languages:

| description        = {{bg|Христо Г. Данов}} {{en|Hristo G. Danov}}

which would map perfectly to

  dct:description "Христо Г. Данов"@bg, "Hristo G. Danov"@en.

Does the literal extractor handle this? Does the property range have to be rdf:langString, or (preferably) not?
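To make the intended mapping concrete, here is a minimal Python sketch of turning such a field value into language-tagged literals. It is not the extraction framework's code (the framework itself is written in Scala); the names `LANG_TEMPLATE`, `extract_lang_literals` and `to_turtle` are hypothetical. It assumes that any template whose name looks like a two- or three-letter language code is a language template, it accepts an optional `1=` prefix on the text, and it does not handle nested templates.

```python
import re

# Assumption: a template named with a 2-3 letter code (optionally with subtags)
# is a language template. The optional "1=" covers the numbered-parameter form.
# Nested templates inside the text are not handled.
LANG_TEMPLATE = re.compile(
    r"\{\{\s*([a-z]{2,3}(?:-[a-z0-9]+)*)\s*\|\s*(?:1\s*=\s*)?([^{}]*?)\s*\}\}"
)


def extract_lang_literals(value):
    """Return (text, language) pairs in order of appearance."""
    return [(m.group(2), m.group(1)) for m in LANG_TEMPLATE.finditer(value)]


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(prop, literals):
    """Render one predicate with a comma-separated object list."""
    objects = ", ".join(
        '"{}"@{}'.format(escape_literal(text), lang) for text, lang in literals
    )
    return "{} {}.".format(prop, objects)


if __name__ == "__main__":
    field = "{{bg|Христо Г. Данов}} {{en|Hristo G. Danov}}"
    print(to_turtle("dct:description", extract_lang_literals(field)))
```

A real implementation would presumably check the template name against a list of known language codes rather than relying on its length, since other short template names would match this pattern too.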

VladimirAlexiev commented 9 years ago

I don't think it does: http://mappings.dbpedia.org/index.php/Mapping_commons:BASA-image has description mapped, but there's no dct:description in the output: http://mappings.dbpedia.org/server/extraction/commons/extract?revid=&format=turtle-triples&extractors=custom&title=File:Hristogdanov.jpg

VladimirAlexiev commented 9 years ago

Another example is https://commons.wikimedia.org/w/index.php?title=File:Христо_Ботев.jpg&action=edit&section=1, which also uses numbered parameters of the form "N=...":

|Description={{en|1=Hristo Botev}}
{{ru|1=Ботев, Христо}}

Nono314 commented 9 years ago

Templates like {{en|...}} are pretty specific to Commons.

On the language wikis, the ubiquitous template is {{lang|xx|...}}, with its variant {{rtl-lang|xx|...}} and its language-specific forms {{lang-xx|...}}.
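As a rough illustration of how these forms differ from the Commons ones, the following Python sketch recognises the three shapes named above and reduces them to the same (text, language) pairs. The names `WIKI_LANG_TEMPLATE` and `extract_wiki_lang_literals` are hypothetical, and the pattern assumes a single unnamed text parameter with no nested templates or extra arguments.

```python
import re

# Assumption: only {{lang|xx|text}}, {{rtl-lang|xx|text}} and {{lang-xx|text}}
# are handled, each with exactly one text argument and no nested templates.
WIKI_LANG_TEMPLATE = re.compile(
    r"\{\{\s*(?:(?:rtl-)?lang\s*\|\s*([A-Za-z-]+)|lang-([A-Za-z-]+))"
    r"\s*\|\s*([^{}|]*?)\s*\}\}"
)


def extract_wiki_lang_literals(wikitext):
    """Return (text, language) pairs in order of appearance."""
    pairs = []
    for m in WIKI_LANG_TEMPLATE.finditer(wikitext):
        # The code is either the first argument or a suffix of the template name.
        lang = m.group(1) or m.group(2)
        pairs.append((m.group(3), lang))
    return pairs


if __name__ == "__main__":
    sample = "{{lang|fr|Bonjour}} and {{lang-de|Guten Tag}}"
    print(extract_wiki_lang_literals(sample))
```

The main difference from the Commons style is where the language code lives: as the first argument or as a suffix of the template name, rather than as the template name itself.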

See Multilingual support templates.