Nono314 opened this issue 9 years ago
At the moment, the enabled languages are limited to the available Wikipedia languages [1]. This is a feature request with limited impact, but we welcome any contributions.
Sure, I may contribute; that's the idea :-)
Right after you introduced langstrings, #311 and this one were my first two thoughts (in fact, #311 alone would be enough if language templates were used wherever possible). I think both may have a larger impact than the current constant language value added to some property mappings.
Using the same class in the framework to represent both a full wiki instance and a simple language tag is a bit odd, but not really relevant here. For alternate titles of works, as exemplified above, we are quite likely to deal mostly with common languages. Meanwhile, many uses of the current constant language parameter involve regional variants, which have to be added to the internal language map only to be turned back into their main variants before the triples are emitted...
Would you object to an additional parameter in SimplePropertyMapping, a "template language property"? I think that's better than recycling `language` with a brand-new dereferencing mechanism (something like language="{property_name}").
Do you mean #303? I don't see #331 related to this.
I also cannot see the connection with language="{property_name}" here; can you provide a more detailed example?
Thanks!
Sorry, I see what you mean, but it is hard to configure this behavior correctly. Any suggestions?
I actually meant #311, sorry (fixed my previous comment).
What do you mean by configuring?
I think that once you have the parameter naming the "language" template property, you retrieve its value. When it's a supported language code, you just use it as you currently do with the constant value passed in; otherwise, you need a localized map similar to the one used by the country flag parser, but in reverse (label => code).
For English, this map could even be generated automatically from the ISO codes in the language templates. Leveraging them, we can get something like this out of the Book template. This can also be extended to other languages by using interwiki links; see here for frwiki (there is no federated query on the DBpedia endpoint, so I can't link this with book titles).
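A minimal sketch of the lookup described above, for illustration only: `resolve_language`, `SUPPORTED_CODES` and `LABEL_TO_CODE` are hypothetical names, and the maps are small hand-written stand-ins for the generated label => code maps and for the framework's real set of supported languages.

```python
from typing import Optional

# Hypothetical subset of the language codes the framework already supports.
SUPPORTED_CODES = {"en", "fr", "de", "el", "mk", "la"}

# Hypothetical localized reverse maps (label => code), one per wiki language,
# i.e. the flag parser's kind of map, inverted. Entries are illustrative only.
LABEL_TO_CODE = {
    "en": {"english": "en", "french": "fr", "greek": "el"},
    "fr": {"anglais": "en", "français": "fr", "grec": "el"},
}


def resolve_language(value: str, wiki_lang: str) -> Optional[str]:
    """Map the value of the "language" template property to a language code."""
    candidate = value.strip().lower()
    # A supported code is used as-is, like the constant language parameter today.
    if candidate in SUPPORTED_CODES:
        return candidate
    # Otherwise, look the label up in the reverse map for this wiki's language.
    return LABEL_TO_CODE.get(wiki_lang, {}).get(candidate)
```

For example, `resolve_language("Anglais", "fr")` returns `"en"`, while an unknown label returns `None`, in which case the mapping could keep falling back to the wiki language as it does now.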
I guess I can start something and see what you think about it.
This is already a lot of configuration :)
The problem here is that, most of the time, these templates are embedded in other templates, and there is no value in mapping them directly (unless I am missing something). Related to #341.
My approach would be to unwrap them in [1] as text nodes with a newline ('\n') suffix. This would already provide the data to the mappings extractor and other extractors as simple text.
The problem is then getting the language. For this, we'd need to refactor TextNode and add a language property that we can use when extracting data.
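A rough sketch of that idea, under loud assumptions: the `TextNode` below is a simplified stand-in with the proposed optional `language` field (not the framework's actual class), a regex over raw wikitext replaces the real parser, only the `{{lang|code|text}}` form is handled, and the suffix mentioned above is assumed to be a newline.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextNode:
    """Simplified stand-in for a parsed text node (hypothetical)."""
    text: str
    language: Optional[str] = None  # proposed addition: language of the text


# Matches {{lang|<code>|<text>}} with no nested templates or extra parameters.
LANG_TEMPLATE = re.compile(
    r"\{\{\s*lang\s*\|\s*([A-Za-z-]+)\s*\|([^{}|]*)\}\}", re.IGNORECASE
)


def unwrap_lang_templates(wikitext: str) -> List[TextNode]:
    """Split wikitext into text nodes, tagging the content of unwrapped lang templates."""
    nodes: List[TextNode] = []
    pos = 0
    for match in LANG_TEMPLATE.finditer(wikitext):
        if match.start() > pos:
            nodes.append(TextNode(wikitext[pos:match.start()]))
        code, content = match.group(1), match.group(2)
        # Unwrapped content gets the (assumed newline) suffix and keeps its language.
        nodes.append(TextNode(content.strip() + "\n", language=code.lower()))
        pos = match.end()
    if pos < len(wikitext):
        nodes.append(TextNode(wikitext[pos:]))
    return nodes
```

Extractors that ignore `language` would still see plain text, while a language-aware one could emit a tagged literal from the same node.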
I never said there was not! :) I was just wondering what you were expecting exactly, so I tried to be quite exhaustive.
When you're talking about templates, you mean those from #311, right? Of course they're embedded and do not need to be mapped (on the wiki); they just need to be handled by the parser, just as the DateTimeParser handles date templates or the UnitValueParser handles conversion templates.
I also first thought of simply unwrapping them in TemplateTransform so that their content would be extracted, but then turned to composing with StringParser and returning an optional Language. I had not envisioned adding language info to TextNode, which sounds interesting though, and would allow for finer-grained handling in the future.
I'm not a big fan of TemplateTransformConfig, which is a kind of catch-all. Almost every template could be handled there by expanding it: for example, https://github.com/dbpedia/mappings-tracker/issues/46 could be solved by "unwrapping" the template into several standard date templates along with their location text, separated by <br/> text nodes, which would then parse gracefully. I'd rather limit that to a few basic cases such as {{nowrap|}}, {{nobr|}} or {{small|}}, which would otherwise shield their content from parsing.
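A rough sketch of restricting unwrapping to such a whitelist. Only the three template names come from the comment above; working on raw wikitext with a regex is an assumption for brevity (the real TemplateTransform operates on parsed nodes), and `unwrap_whitelisted` is a hypothetical name.

```python
import re

# Presentation-only templates whose content should be exposed to the parsers.
UNWRAP_WHITELIST = ("nowrap", "nobr", "small")

# {{name|content}} where name is whitelisted and content holds no braces.
_PATTERN = re.compile(
    r"\{\{\s*(" + "|".join(UNWRAP_WHITELIST) + r")\s*\|([^{}]*)\}\}",
    re.IGNORECASE,
)


def unwrap_whitelisted(wikitext: str) -> str:
    """Replace {{nowrap|x}}, {{nobr|x}} and {{small|x}} with x; leave other templates untouched."""
    previous = None
    # Repeat until stable so nested cases such as {{small|{{nowrap|x}}}}
    # are unwrapped from the inside out.
    while previous != wikitext:
        previous = wikitext
        wikitext = _PATTERN.sub(lambda m: m.group(2), wikitext)
    return wikitext
```

Anything not on the list, such as a date template, passes through unchanged and is left to its dedicated parser.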
I'd choose the unwrapping approach, but I don't mind; we could also do it like the FlagTemplateParser if you're willing to work on that.
@Nono314 > the country flag parser
What's that? It sounds awfully interesting; I guess this way one can get the nationality of sportsmen and many similar goodies. Can you document it in the mapping wiki?
@VladimirAlexiev it's actually called the FlagTemplateParser. Yes, it aims at leveraging the habit Wikipedians have of using flags wherever there's a notion of nationality. But there are so many ways of doing that that it's not an easy task. Looking at it more closely, I've found it to be even more flawed than I thought (see #360).
@jimkont I'll consider the different options. I still have a few other things ongoing that I want to complete first. I thought I had some time this year before the mapping sprint, but obviously I was wrong :(
https://bg.wikipedia.org/wiki/Шаблон:Геообект uses complicated rules to determine the language. Examples given there (Bulgarian: език = "language", връзка = "link", не = "no"):
language param | link param | resulting language link | language tag |
---|---|---|---|
език1 = англосаксонски | език1-връзка = английски език | [[английски език]] | en |
език2 = [[шопски диалект]] | език2-връзка = не | [[шопски диалект]] | bg-x-shoppe |
език3 = гръцки | | [[гръцки език]] | el |
език4 = македонски | | [[македонски език]] | mk |
език5 = [[латински език\|лат.]] | език5-връзка = не | [[латински език]] | la |
From this, we can derive the following requirements, which I've sorted by (my understanding of) decreasing popularity (importance):
македонски -> [[македонски език]] ("Macedonian" -> "Macedonian language")
(I guess the last few are esoteric...)
I've read Nono's comment above, and it's a great idea to use the ISO codes from language pages. But:
[[шопски диалект]] -> bg-x-shoppe (bg-x-shoppe is a custom tag not found in any standard)
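A minimal sketch of the resolution rules as I read them from the table above: an explicit link parameter wins unless it is "не"; otherwise a wiki link in the value gives the page; otherwise "език" ("language") is appended to the bare adjective. The function names are hypothetical, and the page => tag map is hand-written for the examples (per Nono's suggestion it could instead be derived from the ISO codes on language pages, with custom tags such as bg-x-shoppe added manually).

```python
import re
from typing import Optional

# Hand-written page => tag map for the table's examples (assumption).
PAGE_TO_TAG = {
    "английски език": "en",
    "шопски диалект": "bg-x-shoppe",  # custom private-use tag, not in any standard
    "гръцки език": "el",
    "македонски език": "mk",
    "латински език": "la",
}

# [[target]] or [[target|label]]; group 1 is the link target.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")
NO = "не"  # Bulgarian for "no": the link parameter explicitly disables linking
LANGUAGE = " език"  # Bulgarian for " language", appended to bare adjectives


def language_page(value: str, link: Optional[str] = None) -> str:
    """Return the language page title for an езикN / езикN-връзка pair."""
    if link is not None and link.strip() != NO:
        return link.strip()  # an explicit link parameter wins
    match = WIKILINK.search(value)
    if match:
        return match.group(1).strip()  # the value is already a wiki link
    name = value.strip()
    return name if name.endswith(LANGUAGE) else name + LANGUAGE


def language_tag(value: str, link: Optional[str] = None) -> Optional[str]:
    """Return the language tag for the pair, or None if the page is unknown."""
    return PAGE_TO_TAG.get(language_page(value, link))
```

Each row of the table maps to one call, e.g. `language_tag("[[латински език|лат.]]", "не")` resolves through the link target to `"la"`.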
Modèle:Infobox_Territoire uses pairs of name/language properties to give local names and their respective languages; the language values are plain ISO language codes.
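A sketch of how such pairs could be turned into language-tagged literals. The property names `nomN`/`langueN` are hypothetical placeholders (the actual Infobox_Territoire parameter names may differ), and the literal formatting follows Turtle/N-Triples syntax.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical numbered property names; the real infobox parameters may differ.
NAME_KEY = re.compile(r"^nom(\d+)$")


def paired_names(properties: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (name, ISO code) pairs for every nomN that has a non-empty langueN."""
    pairs = []
    for key, value in sorted(properties.items()):
        match = NAME_KEY.match(key)
        if not match:
            continue
        lang = properties.get("langue" + match.group(1), "").strip()
        if value.strip() and lang:
            pairs.append((value.strip(), lang))
    return pairs


def as_literal(name: str, lang: str) -> str:
    """Format a language-tagged literal, escaping backslashes and double quotes."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'
```

Since the language values are already ISO codes, no label => code map is needed here; a name without a matching language property is simply skipped rather than tagged with the wiki language.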
@jplu tried to use the language properties in Mapping_fr:Infobox_Territoire to resolve dbpedia/mappings-tracker/issues/41, but this obviously causes a mapping error.
A more typical use case may be an original_title/original_language pair, where original_language contains an internal link to the page about a language (or just its name?), and we want the original title to be tagged with the appropriate language code, whereas it is currently tagged with the wiki language code by default. For example, L'Héritier de l'Empire yields `dbfr:L'Héritier_de_l'Empire dbo:originalTitle "Heir of the Empire"@fr ; dbo:language dbfr:Anglais_américain .`, whereas the title should be tagged @en or @en-US based on the language.
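A minimal sketch of the expected behaviour for this example. `FR_LANGUAGE_PAGES` and `tag_original_title` are hypothetical; the map holds hand-written entries standing in for one derived from the ISO codes on frwiki language pages. Unresolvable languages fall back to the wiki language code, which is the current default, and `main_variant_only` illustrates the @en vs @en-US choice.

```python
from typing import Optional

# Hypothetical map from frwiki language pages to BCP 47 language tags.
FR_LANGUAGE_PAGES = {
    "Anglais_américain": "en-US",
    "Anglais": "en",
}


def tag_original_title(
    title: str,
    language_page: Optional[str],
    wiki_lang: str = "fr",
    main_variant_only: bool = False,
) -> str:
    """Return a Turtle literal for dbo:originalTitle, tagged with the resolved language."""
    # Fall back to the wiki language when the language page is missing or unknown.
    tag = FR_LANGUAGE_PAGES.get(language_page or "", wiki_lang)
    if main_variant_only:
        tag = tag.split("-")[0]  # e.g. en-US -> en
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{tag}'
```

With `language_page="Anglais_américain"` this yields `"Heir of the Empire"@en-US`, or `@en` when only main variants are wanted.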