Open GoogleCodeExporter opened 9 years ago
>> I think as a user I would expect that the parser would not add another set
of POS tags
I agree - without being fully familiar with the original (non-wrapped) parsers
I would expect exactly that
>> However, mind that not all parsers support using pre-existing POS tags, e.g.
BerkeleyParser afaik doesn't support that
What does "support using pre-existing POS tags" mean?
I guess it means whether or not a parser uses pre-existing POS tags at all.
I think the most important question for a user is whether a parser _requires_
pre-existing POS tags or not.
A user with a strong linguistic background might also be interested to know
whether a parser _is able_ to produce POS tags. Then the user can choose which
component to use for the annotation of POS tags: a POS tagger or the parser
(might depend on the task) - how can a user be informed of that capability?
>> BerkeleyParser: no longer produce POS tags by default
I do not get this - you wrote the BerkeleyParser doesn't support pre-existing
POS tags?
Original comment by eckle.kohler
on 3 Jul 2014 at 7:39
I just thought about what I wrote and am not sure whether it hits the point or
really makes sense:
In principle, parsers that produce POS tags do not behave as expected in a
pipeline world where the annotations added by components are neatly assigned to
different levels - and that is what is currently done:
the list of components available in Core is aligned with the different analysis
levels.
So it would be important for a user to know whether a particular component spans
several analysis levels (as e.g. the Stanford parser does).
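One way to make this visible could be for every component to declare which analysis levels it reads and writes. A rough sketch in Python, purely for illustration: the `Capabilities` descriptor, the helper `spans_multiple_levels`, and the two example components are assumptions made up here, not existing DKPro Core metadata.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capabilities:
    """Analysis levels a component reads and writes (hypothetical descriptor)."""
    name: str
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def spans_multiple_levels(component: Capabilities) -> bool:
    """Return True if the component writes annotations on more than one level."""
    return len(component.writes) > 1


# Illustrative entries only; not taken from any real component descriptor.
tagger = Capabilities("ExampleTagger",
                      reads=frozenset({"token"}),
                      writes=frozenset({"pos"}))
parser = Capabilities("ExampleParser",
                      reads=frozenset({"token"}),
                      writes=frozenset({"pos", "constituent"}))

for component in (tagger, parser):
    print(component.name, "spans several levels:", spans_multiple_levels(component))
```

With such a declaration, a component list could flag every entry that writes to more than one level.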
Original comment by eckle.kohler
on 3 Jul 2014 at 7:51
>> However, mind that not all parsers support using pre-existing POS tags, e.g.
BerkeleyParser afaik doesn't support that
>what does "support using pre-existing POS tags" mean?
>I guess it means, whether or not a parser uses pre-existing POS tags at all.
It means whether a parser *can* use pre-existing POS tags. E.g. the Stanford
parser can be configured to operate on pre-existing POS tags and to build its
parse trees on them. It can also be configured to ignore pre-existing POS tags
and to generate them as part of the parsing process.
>> BerkeleyParser: no longer produce POS tags by default
> I do not get this - you wrote the BerkeleyParser doesn't support pre-existing
POS tags?
BerkeleyParser, however, afaik does not allow using pre-existing POS tags and
will always generate them as part of the parsing process. But we can configure
it not to write these POS tags to the CAS and instead leave pre-existing POS
tags from a POS tagger in there. This might result in situations where the
constituency tree and the POS tags are not properly in sync, e.g. a noun phrase
might consist of a single verb token because the POS tagger assigned the tag
"verb" to the token while the parser thought it was a "noun" and built its
constituency structure accordingly.
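Such mismatches would be easy to detect if both tag sequences were available. A minimal sketch in Python, not DKPro Core code: the function `find_pos_mismatches`, the sample tokens, and the tag values are made up for illustration.

```python
def find_pos_mismatches(tokens, tagger_pos, parser_pos):
    """Return (index, token, tagger_tag, parser_tag) for every disagreeing token."""
    if not (len(tokens) == len(tagger_pos) == len(parser_pos)):
        raise ValueError("all three sequences must have one entry per token")
    return [
        (i, tok, t, p)
        for i, (tok, t, p) in enumerate(zip(tokens, tagger_pos, parser_pos))
        if t != p
    ]


tokens = ["the", "run"]
tagger_pos = ["DT", "VB"]  # tags left in the CAS by the POS tagger
parser_pos = ["DT", "NN"]  # tags the parser used internally for its tree

mismatches = find_pos_mismatches(tokens, tagger_pos, parser_pos)
for index, token, tagger_tag, parser_tag in mismatches:
    print(f"token {index} ({token!r}): tagger={tagger_tag}, parser={parser_tag}")
```

Here the second token is a verb according to the tagger but sits under a noun phrase in the parser's tree, which is exactly the out-of-sync case described above.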
>I think the most important question for a user is, if a parser _requires_
pre-existing POS tags or not.
As far as I can see, most dependency parsers require pre-existing POS tags,
whereas constituency parsers usually do not.
>A user with a strong linguistic background might also be interested to know
whether a parser _is able_ to produce POS tags. Then the user can choose which
component to use for the annotation of POS tags: a POS tagger or the parser
(might depend on the task) - how can a user be informed of that capability?
The user might notice that there is a parameter PARAM_WRITE_POS on the
component which can be set to "true".
Original comment by richard.eckart
on 3 Jul 2014 at 7:57
It would be sensible that, if a component has the options PARAM_READ_POS and
PARAM_WRITE_POS, PARAM_WRITE_POS is automatically disabled when
PARAM_READ_POS is enabled. E.g. a parser that consumes POS tags from a
POS tagger should not add them a second time just because they happen to be
integrated into the parse trees generated by the parser.
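The proposed rule can be written down as a small truth table. This is only a sketch of the logic in Python: `resolve_write_pos` is a hypothetical helper, and its two arguments stand in for the values of PARAM_READ_POS and PARAM_WRITE_POS.

```python
def resolve_write_pos(read_pos: bool, write_pos: bool) -> bool:
    """Effective write-POS setting: reading existing tags disables writing new ones."""
    return write_pos and not read_pos


# Print the effective setting for every parameter combination.
for read_pos in (False, True):
    for write_pos in (False, True):
        effective = resolve_write_pos(read_pos, write_pos)
        print(f"read_pos={read_pos}, write_pos={write_pos} -> writes POS: {effective}")
```

Only the combination "do not read, do write" results in the parser adding its own POS tags.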
Original comment by richard.eckart
on 17 Aug 2014 at 4:05
Issue 444 has been merged into this issue.
Original comment by richard.eckart
on 22 Jan 2015 at 10:55
Original issue reported on code.google.com by
richard.eckart
on 3 Jul 2014 at 6:44