BLLIP / bllip-parser

BLLIP reranking parser (also known as the Charniak-Johnson parser, Charniak parser, or Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.
http://bllip.cs.brown.edu/

Problem with conjoined NP's #10

Closed syeedibnfaiz closed 12 years ago

syeedibnfaiz commented 12 years ago

I have found an issue with your biomedical model. It seems that the biomedical model tends to deepen NPs and ADJPs unnecessarily. Consider the following sentence:

Xa and Yb proteins were found .

Using the biomedical model, I get a syntax tree with an extra layer of NPs in the subject subtree: (NP (NP (NN Xa)) (CC and) (NP (NN Yb) (NNS proteins)))

As a result, when I feed this parse to the Stanford dependency parser, I do not get the correct dependency graph: the 'nn' dependency relation between Xa and proteins is missing. However, if I remove the extra layer of NPs, the dependency graph becomes correct.

The WSJ model does not have this issue. It comes up with the correct parse tree.

Please let me know if I am mistaken or am doing something wrong that is causing the problem.

dmcc commented 12 years ago

I don't think that you're doing anything wrong. The parsers have different models, each with its own quirks, since they're trained on different data. I'd recommend a postprocessing step to remove the unnecessary NPs (if you haven't done this already).

Hope this helps, David
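
For readers who land here with the same problem, the postprocessing step suggested above might look roughly like the sketch below. It is not part of bllip-parser and not the maintainers' method. The function names (`parse`, `flatten_np_coordination`, `to_string`) and the `NOUN_TAGS` set are made up for illustration, and the rule itself is an assumption: an NP is flattened only when every child is either a CC or an inner NP made up solely of noun preterminals. That restriction is meant to leave genuine NP coordinations (for example, ones with their own determiners) untouched; it will probably need tuning for real biomedical text. The small reader also assumes every bracket has a label, so trees with an empty root label would need extra handling.

```python
import re

# Assumption: only inner NPs built purely from these tags are treated as "unnecessary".
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def parse(s):
    """Read a bracketed tree into nested lists: [label, child, ...]; words are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # a word
        node = [tokens[pos]]  # the label that follows "("
        pos += 1
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # consume ")"
        return node

    return read()


def is_preterminal(node):
    """True for nodes of the form [tag, word]."""
    return isinstance(node, list) and len(node) == 2 and isinstance(node[1], str)


def is_bare_noun_np(node):
    """True for an NP whose children are all noun preterminals."""
    return (isinstance(node, list) and node[0] == "NP" and len(node) > 1
            and all(is_preterminal(c) and c[0] in NOUN_TAGS for c in node[1:]))


def flatten_np_coordination(node):
    """Return a copy of the tree with bare-noun NPs spliced into a coordinating NP."""
    if not isinstance(node, list):
        return node
    children = [flatten_np_coordination(c) for c in node[1:]]
    labels = [c[0] if isinstance(c, list) else None for c in children]
    if (node[0] == "NP" and "CC" in labels
            and all(l == "CC" or is_bare_noun_np(c)
                    for l, c in zip(labels, children))):
        spliced = []
        for c in children:
            # Replace each inner NP by its children; keep the CC as it is.
            spliced.extend(c[1:] if c[0] == "NP" else [c])
        children = spliced
    return [node[0]] + children


def to_string(node):
    """Write nested lists back out in bracketed form."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(to_string(c) for c in node) + ")"


# The subtree quoted in the issue, with its parentheses balanced.
subtree = "(NP (NP (NN Xa)) (CC and) (NP (NN Yb) (NNS proteins)))"
print(to_string(flatten_np_coordination(parse(subtree))))
# prints: (NP (NN Xa) (CC and) (NN Yb) (NNS proteins))
```

The flattened string can then be passed on to whatever converts trees to dependencies. A coordination such as (NP (NP (DT the) (NN cat)) (CC and) (NP (DT the) (NN dog))) is left unchanged by this rule, because its inner NPs contain determiners.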