Open apmoore1 opened 2 years ago
I've updated the comment to explain things further. It'd be good to find some evaluation of how accurate the auxiliary verb detection is in spaCy. We described our original approach in this UCREL technical paper: https://ucrel.lancs.ac.uk/papers/techpaper/vol3.pdf
To incorporate auxiliary verb rules into the USAS Rule Based Tagger.
Definition of auxiliary verb rules
All POS tags used here are from the CLAWS C7 tagset.
In English (at least in the C version of the semantic tagger) we use auxiliary verb rules for POS tags `VB*` (be), `VD*` (do), `VH*` (have), to determine the main and auxiliary verbs and therefore alter the semantic tag.

An auxiliary verb would normally be given the USAS semantic tag `Z5` (grammatical bin), whereas the main verb would be given a non-`Z5` tag. For example in the sentence (format is `token_USAS semantic tag`) below the auxiliary verb is `have` and the main verb is `finished`:

We have approximately 35 rules in place for amending the semantic tags on `be`, `do`, and `have` after the initial set of potential semantic tags are applied. An example rule for `have` is as follows:

If the sequence of POS tags matches a given context, `VH*` (POS tag for `have`) followed by `V*N` (POS tag for the word `finished`) with optional intervening adverbs (`R*` POS tags) or negation (`XX` POS tag), then the rule instructs the tagger to change the semantic tag on the auxiliary verb `have` to be `Z5`.

For semantic taggers in other languages (the Java versions), we do not have auxiliary/main verb rules in place.
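The example rule for have described above could be sketched in Python roughly as follows. This is only an illustration of the matching logic, not the C tagger's implementation: the token dictionaries, the function names `is_intervening` and `apply_have_rule`, and the candidate semantic tags in the sample sentence are all assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical token representation (not the tagger's real data structure):
# a dict holding the word, its CLAWS C7 POS tag, and the candidate USAS tags.


def is_intervening(pos):
    """Adverbs (R*) and negation (XX) may sit between auxiliary and main verb."""
    return fnmatchcase(pos, "R*") or pos == "XX"


def apply_have_rule(tokens):
    """Set the semantic tags of an auxiliary 'have' to Z5, in place."""
    for i, token in enumerate(tokens):
        if not fnmatchcase(token["pos"], "VH*"):
            continue
        # Skip over any optional adverbs or negation after 'have'.
        j = i + 1
        while j < len(tokens) and is_intervening(tokens[j]["pos"]):
            j += 1
        # The rule fires only if a V*N tag (past participle) comes next.
        if j < len(tokens) and fnmatchcase(tokens[j]["pos"], "V*N"):
            token["sem"] = ["Z5"]
    return tokens


# Illustrative input; the candidate semantic tags are placeholders.
sentence = [
    {"text": "have", "pos": "VH0", "sem": ["A9+", "Z5"]},
    {"text": "not", "pos": "XX", "sem": ["Z6"]},
    {"text": "finished", "pos": "VVN", "sem": ["T2-"]},
]
for token in apply_have_rule(sentence):
    print(token["text"], token["pos"], token["sem"])
```

Using shell-style wildcards via `fnmatchcase` keeps the patterns in the same `VH*` / `V*N` notation as the rule description; the remaining rules for be and do would need their own contexts.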
How this rule maps to the spaCy pipeline through the UPOS tagset
In the UPOS tagset, and therefore in spaCy POS models, we can use the `AUX` POS tag instead of `VB*` (be), `VD*` (do), `VH*` (have). Below is the code and output of running the small English spaCy model on the sentence `I have finished my lunch.`:

Output:
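As a rough sketch of how the `AUX` tag could stand in for the C7-based rules, the snippet below assigns `Z5` to every token whose UPOS tag is `AUX`. The helper names `mark_auxiliaries` and `tag_sentence` are hypothetical, `en_core_web_sm` is assumed to be the small English model referred to above, and whether that model actually labels have as `AUX` in this sentence should be checked by running it.

```python
def mark_auxiliaries(tagged):
    """Map (text, upos) pairs to (text, upos, semantic_tag) triples.

    Tokens tagged AUX receive Z5; all other tokens get None, meaning
    their semantic tag is left to the rest of the tagger.
    """
    return [(text, upos, "Z5" if upos == "AUX" else None) for text, upos in tagged]


def tag_sentence(text, model="en_core_web_sm"):
    """Return (text, upos) pairs from a spaCy pipeline (assumed to be installed)."""
    import spacy  # imported lazily so mark_auxiliaries works without spaCy

    nlp = spacy.load(model)
    return [(token.text, token.pos_) for token in nlp(text)]
```

With spaCy and the model installed, `mark_auxiliaries(tag_sentence("I have finished my lunch."))` would give one triple per token. Note that this is a simplification: Universal Dependencies also tags copular be as `AUX`, so a blanket `AUX` to `Z5` mapping may not match the behaviour of the original C rules in every case.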