codeaudit / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

Multi-word named entities as a single token #85

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
There should be a way to turn a multi-word named entity into a single token. 
I see two options:

1) change every NER component so that it can optionally merge all tokens covered 
by a named entity into a single token

2) implement a new component that takes an annotation type (e.g. NamedEntity) and 
merges all tokens covered by each annotation of that type into a single token.

I think 2) would be the better solution.
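
To make option 2) concrete, here is a minimal sketch of the merging logic in plain Java. It deliberately does not use the UIMA or DKPro Core APIs: `TokenMerger`, `Span`, `merge` and `findCovering` are hypothetical names made up for illustration, and annotations are reduced to character offsets. It also assumes that tokens are sorted by offset and that the covering annotations do not overlap each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the component discussed in this issue.
class TokenMerger {
    /** Character span [begin, end) in the document text. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        /** True if the other span lies completely inside this one. */
        boolean covers(Span other) {
            return begin <= other.begin && other.end <= end;
        }

        @Override
        public String toString() {
            return "[" + begin + "," + end + ")";
        }
    }

    /**
     * Returns a new token list in which every run of tokens covered by the
     * same covering span (e.g. a named entity) is replaced by one token.
     * Assumes tokens are sorted by begin offset and covering spans do not overlap.
     */
    static List<Span> merge(List<Span> tokens, List<Span> covering) {
        List<Span> result = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Span token = tokens.get(i);
            Span cover = findCovering(covering, token);
            if (cover == null) {
                // Token is not inside any covering span: keep it unchanged.
                result.add(token);
                i++;
                continue;
            }
            // Extend the merged token over all following tokens in the same span.
            int begin = token.begin;
            int end = token.end;
            i++;
            while (i < tokens.size() && cover.covers(tokens.get(i))) {
                end = tokens.get(i).end;
                i++;
            }
            result.add(new Span(begin, end));
        }
        return result;
    }

    /** Returns the first covering span that fully contains the token, or null. */
    private static Span findCovering(List<Span> covering, Span token) {
        for (Span candidate : covering) {
            if (candidate.covers(token)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Offsets for the text "Barack Obama visited New York ."
        List<Span> tokens = Arrays.asList(
                new Span(0, 6), new Span(7, 12), new Span(13, 20),
                new Span(21, 24), new Span(25, 29), new Span(30, 31));
        List<Span> entities = Arrays.asList(new Span(0, 12), new Span(21, 29));
        System.out.println(merge(tokens, entities));
    }
}
```

For the example sentence, the six input tokens become four: "Barack Obama" and "New York" each end up as a single token, while "visited" and "." are left alone. Because the covering spans are passed in as a plain list, the same logic works for any annotation type, which is the main argument for 2) over 1).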

Original issue reported on code.google.com by richard.eckart on 1 Jul 2012 at 6:16

GoogleCodeExporter commented 9 years ago
I like option 2)

Original comment by torsten....@gmail.com on 1 Jul 2012 at 6:59

GoogleCodeExporter commented 9 years ago
Started on 2). What I thought would be a very simple component turned out to be 
considerably more complicated; I am also adding handling of POS tags and 
lemmata.

Original comment by richard.eckart on 3 Jul 2012 at 7:51

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 13 Oct 2012 at 6:31

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 13 Oct 2012 at 6:33

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 16 Feb 2013 at 11:02

GoogleCodeExporter commented 9 years ago
I think this can be called done for the moment. A basic implementation with 
some tests is there. It is better to open new issues for any problems with or 
extensions of this component.

Original comment by richard.eckart on 10 Apr 2013 at 4:19