Closed: daxenberger closed this issue 8 years ago
Wouldn't that be something like a DictionaryFeatureExtractor?
Reported by richard.eckart
on 2013-11-10 13:26:10
There's the de.tudarmstadt.ukp.dkpro.tc.features.content.TopicWordsFeatureExtractor
which is basically a simple dictionary feature extractor (and also should be renamed,
accordingly, btw)
It adds a hard coded prefix to each feature - I suggest we make this prefix configurable,
so you could set it to "Modal_" and can then identify Modal features later on easily.
The mechanics of the extractor are pretty much the same for these cases.
Reported by oliver.ferschke
on 2013-11-10 13:30:31
It also depends on what aspects of modal verbs you want to capture:
count in document?
text to modal verb ratio?
simple presence of any modal verbs in the text?
other?
These scenarios could also be implemented in a generic way for all use cases based
on dictionaries...
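
A minimal sketch of how these three scenarios could share one generic, dictionary-based implementation. The class and method names are hypothetical and not part of DKPro TC. It assumes the dictionary holds lower-cased word forms and that "ratio" means dictionary hits divided by the total number of tokens.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not an existing DKPro TC class.
final class DictionaryAggregates {

    // Scenario 1: number of tokens that occur in the dictionary.
    static long count(List<String> tokens, Set<String> dictionary) {
        return tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .filter(dictionary::contains)
                .count();
    }

    // Scenario 2: dictionary hits relative to text length (assumed definition).
    static double ratio(List<String> tokens, Set<String> dictionary) {
        if (tokens.isEmpty()) {
            return 0.0; // avoid division by zero on empty documents
        }
        return (double) count(tokens, dictionary) / tokens.size();
    }

    // Scenario 3: does any dictionary entry occur at all?
    static boolean present(List<String> tokens, Set<String> dictionary) {
        return tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .anyMatch(dictionary::contains);
    }
}
```

For the tokens "We should go now" and the dictionary {must, should, can}, this gives a count of 1, a ratio of 0.25, and presence true.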
Reported by oliver.ferschke
on 2013-11-10 13:34:49
yes, I think it would be something *similar* to that:
the generalized version - a DictionaryFeatureExtractor - would count occurrences of
items from a Dictionary (e.g. wordlist) AND it would aggregate these counts:
so in the modal verbs example, not only the individual modal verbs (e.g. must, should)
are counted, but also "modals"
so it's more like a "DictionaryWordClassFeatureExtractor"
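
Roughly, as a sketch only: the class name, the feature naming scheme (class name, underscore, word) and the lower-casing are assumptions for illustration, not existing DKPro TC code.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a "DictionaryWordClassFeatureExtractor".
final class WordClassCounts {

    // wordClasses maps a class name (e.g. "modals") to its lower-cased word list.
    static Map<String, Integer> extract(List<String> tokens,
                                        Map<String, Set<String>> wordClasses) {
        Map<String, Integer> features = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : wordClasses.entrySet()) {
            String className = entry.getKey();
            features.put(className, 0); // aggregated count for the whole class
            for (String token : tokens) {
                String word = token.toLowerCase(Locale.ROOT);
                if (entry.getValue().contains(word)) {
                    // count for the individual item, e.g. "modals_must"
                    features.merge(className + "_" + word, 1, Integer::sum);
                    // and the same hit again for the class as a whole
                    features.merge(className, 1, Integer::sum);
                }
            }
        }
        return features;
    }
}
```

With the class "modals" = {must, should, can} and the tokens "You must try and you should but must not", this yields modals=3, modals_must=2 and modals_should=1.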
Reported by eckle.kohler
on 2013-11-10 13:35:24
Another option would be to rely on POS tagging instead of a dictionary.
STTS has specific categories for modals.
I see two advantages: (i) you don't need to add all forms, and (ii) you don't wrongly
count surface forms that are not used as a modal in a certain context.
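
A minimal sketch of the POS-based variant, assuming the text has already been tagged with STTS and the tags arrive as a plain list of strings. The class name is hypothetical. VMFIN, VMINF and VMPP are the STTS tags for modal verbs.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper; expects one STTS tag per token, in document order.
final class SttsModalCounter {

    // STTS modal verb tags: finite, infinitive, past participle.
    private static final Set<String> STTS_MODAL_TAGS =
            Set.of("VMFIN", "VMINF", "VMPP");

    // Counts tokens whose POS tag marks them as a modal verb.
    static long countModals(List<String> posTags) {
        return posTags.stream()
                .filter(STTS_MODAL_TAGS::contains)
                .count();
    }
}
```

For "Er kann schwimmen ." tagged as PPER VMFIN VVINF $., this returns 1, without a word list of all inflected forms.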
Reported by torsten.zesch
on 2013-11-10 17:19:57
>>Another option would be to rely on POS tagging instead of a dictionary.
>>STTS has specific categories for modals.
I agree that this would in theory be preferable over word forms - however, only if
the POS tagger is able to tag modal verbs accurately. This would have to be looked
into. From my past experience with the STTS tagset / TreeTagger, I recall that some
of these smaller word classes are tagged wrongly and therefore counting the lexical
items was less noisy.
Reported by eckle.kohler
on 2013-11-11 06:35:35
@Judith: do you still have plans to solve this issue?
Reported by daxenberger.j
on 2014-06-04 11:55:05
yes, but not now - I need to first get an overview of the current state of TC which
I will do after the upcoming release
can you move it to milestone after the upcoming release, please
Reported by eckle.kohler
on 2014-06-04 12:08:15
Reported by daxenberger.j
on 2014-06-04 12:32:36
Reported by daxenberger.j
on 2014-08-29 10:50:13
In order to determine word difficulty, I added some functions to determine adjective
endings, helping verbs, modal verbs and auxiliary words for English, German and French
to de.tudarmstadt.ukp.dkpro.tc.features.readability.util. I then noticed the AdjectiveEndingFeatureExtractor
and the ModalVerbsFeatureExtractor for English and this discussion.
I also added WordListExtractors that check if a word occurs in a list.
In both cases, I am not yet very happy with the solution, but maybe they can revive
this discussion to generalize FEs to other languages.
Reported by lisa.beinborn
on 2015-03-12 11:29:53
This issue was reported in 2013 and no one seems to care about it anymore - I am closing it.
Originally reported on Google Code with ID 58
Reported by eckle.kohler
on 2013-11-10 13:17:35