Open mshahriarinia opened 11 years ago
This chunking does need to be done. @SunPHM Is this on your radar?
I think Morteza refers to the noun phrase chunking. What kind of problem does the noun phrase chunking deal with?
If we only need noun phrases, it should be very simple to use Stanford POS tagging. For more complex chunking, we can use the OpenNLP chunking, http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.chunker
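A minimal sketch of the POS-tag-based idea above, assuming the tagger (Stanford or otherwise) has already produced (word, tag) pairs with Penn Treebank tags. The function name `noun_phrases` and the pattern (optional determiners and adjectives followed by one or more nouns) are illustrative assumptions, not part of any library:

```python
# Tags treated as noun-phrase modifiers (determiners and adjectives).
MODIFIER_TAGS = {"DT", "JJ", "JJR", "JJS"}


def noun_phrases(tagged):
    """Group (word, tag) pairs into simple noun phrases.

    A phrase is a run of determiners/adjectives followed by one or more
    nouns (any tag starting with "NN"). Everything else ends the phrase.
    """
    phrases = []
    current = []       # words collected for the candidate phrase
    has_noun = False   # True once the candidate contains a noun

    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append(word)
            has_noun = True
        elif tag in MODIFIER_TAGS and not has_noun:
            current.append(word)
        else:
            # The candidate ends here; keep it only if it contains a noun.
            if has_noun:
                phrases.append(" ".join(current))
            # A modifier right after a noun starts a new candidate phrase.
            current = [word] if tag in MODIFIER_TAGS else []
            has_noun = False

    if has_noun:
        phrases.append(" ".join(current))
    return phrases


tagged_sentence = [
    ("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumped", "VBD"),
    ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]
print(noun_phrases(tagged_sentence))
```

This only covers flat noun phrases; anything nested or involving prepositional attachment would need a real chunker such as the OpenNLP one linked above.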
Let's have a simple chunking format for now. We need the triple results to look more sensible. Also check out http://www.eecis.udel.edu/~trnka/CISC889-11S/lectures/dongqing-chunking.pdf
@SunPHM I just checked out the link you posted. First, use whichever method is faster to implement.
It seems that LingPipe has a phrase chunker too: http://alias-i.com/lingpipe/demos/tutorial/posTags/read-me.html NLTK is a Python library, while OpenNLP and LingPipe are both Java libraries.
In triple generation we get terms such as @, the, a, +, of, ...
and lots of other results that are neither noun phrases nor noun chunks.
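As a stopgap until proper chunking is in place, terms like these could be filtered out of the triples. The sketch below assumes each triple is a (subject, predicate, object) tuple of strings, which is a guess at the pipeline's format; the stopword list and the names `is_meaningful` and `filter_triples` are hypothetical:

```python
# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to"}


def is_meaningful(term):
    """Return False for empty terms, stopwords, and symbol-only terms."""
    term = term.strip().lower()
    if not term or term in STOPWORDS:
        return False
    # Reject terms such as "@" or "+" that contain no letter or digit.
    return any(ch.isalnum() for ch in term)


def filter_triples(triples):
    """Keep triples whose subject and object both look meaningful."""
    return [t for t in triples if is_meaningful(t[0]) and is_meaningful(t[2])]


triples = [
    ("@", "mentions", "the"),
    ("Stanford NLP", "provides", "POS tagging"),
    ("+", "of", "a"),
]
print(filter_triples(triples))
```

This only hides the symptom; it does not turn unchunked tokens into noun phrases.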
Looks like Stanford NLP does not provide chunking: here. That is why we got non-chunked noun phrases as triples in the pipeline output. The only Java library I found was "Mark Greenwood's Noun Phrase Chunker", downloadable from here, but it doesn't seem to be maintained. I have tested NLTK and it does chunking.
I checked several resources regarding this, like here, etc., but none of them refers to a Stanford NLP chunking feature.
Any thoughts? @cegme @SunPHM
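For reference, a minimal sketch of the kind of NLTK chunking mentioned above, using `nltk.RegexpParser` on tokens that are already POS-tagged (so no tagger model download is needed). The grammar is an illustrative assumption, not necessarily what was tested:

```python
import nltk

# Pre-tagged tokens with Penn Treebank tags (hand-written for illustration).
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumped", "VBD"),
    ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]

# NP = optional determiner, any number of adjectives, one or more nouns.
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
parser = nltk.RegexpParser(grammar)
tree = parser.parse(tagged)

# Collect the words of every NP subtree in the chunked parse.
chunks = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees()
    if subtree.label() == "NP"
]
print(chunks)
```

Since the pipeline is Java, this would have to run as a separate step, or the same pattern could be reproduced on top of the existing POS tags.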