Add support for wildcard (*) syntax in single word lexicon files. This would be useful for rules such as labelling all punctuation tokens with the semantic category PUNCT.
The wildcard symbol means that zero or more characters may appear at its position in the word token and/or Part Of Speech (POS) tag. The syntax would therefore have the same meaning in single word and Multi Word Expression files.
Example
Assuming the single word lexicon file:
lemma pos semantic_tags
*kg num N3.5
* punc PUNCT
The first rule would tag any token that ends with kg, e.g. 15kg, as a measurement (the N3.5 semantic tag). The second rule would label every token whose POS tag is punc with the punctuation semantic tag, PUNCT.
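To make the intended matching behaviour concrete, here is a minimal Python sketch of how the two rules above could be evaluated. It is only an illustration under assumptions, not the existing implementation: the names `wildcard_to_regex`, `LEXICON` and `lookup` are hypothetical, the lexicon is held in memory rather than read from a file, and rules are tried in file order with the first match winning.

```python
import re
from typing import List, Tuple


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a lexicon pattern where * stands for zero or more characters.

    Every other character is escaped, so it is matched literally.
    """
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts), re.DOTALL)


# Hypothetical in-memory copy of the example lexicon: (lemma, pos, semantic_tags).
LEXICON: List[Tuple[str, str, List[str]]] = [
    ("*kg", "num", ["N3.5"]),
    ("*", "punc", ["PUNCT"]),
]


def lookup(token: str, pos: str) -> List[str]:
    """Return the semantic tags of the first rule matching both token and POS tag."""
    for lemma_pattern, pos_pattern, tags in LEXICON:
        token_matches = wildcard_to_regex(lemma_pattern).fullmatch(token)
        pos_matches = wildcard_to_regex(pos_pattern).fullmatch(pos)
        if token_matches and pos_matches:
            return tags
    return []  # no rule applies


print(lookup("15kg", "num"))  # matched by the *kg rule
print(lookup("!", "punc"))    # matched by the * rule
```

Because * may match zero characters, the token kg on its own would also match the first rule. Whether that is desirable, and how wildcard rules should rank against exact entries, is left open here.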