3000+ machine-readable open source dictionaries distributed by the Applied Computational Linguistics lab at the University of Augsburg, Germany, and by the research group Linked Open Dictionaries (LiODi, funded 2015-2020 by BMBF at Goethe University Frankfurt, Germany). All data provided in OntoLex-Lemon and TIAD-TSV.
Apache License 2.0
Apertium RDF - Tags embedded in complex <par> tags in source data lost during extraction #13
This goes back to an issue reported for the mapping (https://github.com/sid-unizar/apertium-lexinfo-mapping/issues/2), but it turns out to affect other tags as well. Discussed in message exchange with Max Ionov.
In the source data we have chunks such as
And later in that same lexicon:
In the intermediate RDF, this leads to the entry `tres` being described with `lexinfo:morphosyntacticProperty apertium:mil`. However, `mil` seems to be a defined shorthand for a bundle of information in this lexicon (including, for example, the Apertium tag `num`), and not a tag belonging to the set of Apertium tags that we are mapping to LexInfo (in contrast to `num`, which is a `lexinfo:numeral`).

I have checked more dictionaries, and this happens often. Some examples to give a better idea:
Since the "embedded" tags are not accessed during extraction, the final RDF keeps the complex/shorthand ones (e.g. `apertium: 4-3pl__adj`, `apertium:mil`, etc.) without a mapping.
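One possible way to recover the embedded tags is to resolve each `<par>` reference against its `<pardef>` before mapping to LexInfo. The following is only a minimal sketch, not the extraction code used in this repository. The XML snippet is invented for illustration (it is not copied from any of the affected dictionaries), and the helper names `embedded_tags` and `resolve_entry_tags` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical Apertium-style snippet, invented for illustration only.
DIX = """
<dictionary>
  <pardefs>
    <pardef n="mil">
      <e><p><l/><r><s n="num"/><s n="mf"/><s n="pl"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e><p><l>tres</l><r>three</r></p><par n="mil"/></e>
  </section>
</dictionary>
"""


def embedded_tags(root):
    """Map each pardef name to the ordered, de-duplicated <s> tags inside it."""
    table = {}
    for pardef in root.iter("pardef"):
        names = [s.get("n") for s in pardef.iter("s")]
        table[pardef.get("n")] = list(dict.fromkeys(names))
    return table


def resolve_entry_tags(entry, table):
    """Collect the entry's own <s> tags plus those of every <par> it references."""
    tags = [s.get("n") for s in entry.iter("s")]
    for par in entry.iter("par"):
        # Unknown paradigm names contribute nothing (an assumption of this sketch).
        tags.extend(table.get(par.get("n"), []))
    return list(dict.fromkeys(tags))


root = ET.fromstring(DIX)
table = embedded_tags(root)

resolved = {}
for section in root.iter("section"):
    for entry in section.findall("e"):
        lemma = entry.find("p/l").text
        resolved[lemma] = resolve_entry_tags(entry, table)

print(resolved)
```

With the tags resolved this way, an entry such as `tres` would carry `num` rather than the shorthand `mil`, so the existing Apertium-to-LexInfo mapping could apply. The sketch does not follow paradigms that themselves reference other paradigms, which a real fix would likely need to handle.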