BLLIP / bllip-parser

BLLIP reranking parser (also known as the Charniak-Johnson parser, Charniak parser, or Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.
http://bllip.cs.brown.edu/

Can I ask how to train this parser? #27

Closed by Yeom 9 years ago

Yeom commented 9 years ago

First, I'm sorry that my English is not very good. I'm a student in South Korea, and I'm wondering how to train the parser on new data. Can I train it on a new corpus? I can't find any instructions for training. Should I do it in the first-stage TRAIN directory? I'm very curious.

dmcc commented 9 years ago

Thanks for the question. Yes, training should be run from the first-stage/TRAIN directory with the trainParser script. There's some information about what the script expects as arguments here:

https://github.com/BLLIP/bllip-parser/blob/master/first-stage/TRAIN/README.rst

For training data, it depends on what type of text you'd like to parse. Ideally, the training data would be similar in style to the text you'd like to parse. There are various treebanks available -- some are available for free, others can be licensed from the LDC.
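
As a rough sketch of preparing data before training: assuming your treebank is stored as one Penn Treebank-style bracketed tree per line, and assuming you want separate training and development portions (check the README linked above for the exact inputs trainParser expects), something like the following could filter out malformed trees and split the rest. The function names and the 10% development fraction are illustrative only and are not part of bllip-parser.

```python
import random


def is_balanced(tree: str) -> bool:
    """Return True if a bracketed tree string has balanced parentheses."""
    depth = 0
    for ch in tree:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A closing bracket appeared before its opening bracket.
                return False
    # Require at least one bracket so empty lines are rejected.
    return depth == 0 and "(" in tree


def split_trees(trees, dev_fraction=0.1, seed=0):
    """Shuffle well-formed trees and split them into (train, dev) lists.

    dev_fraction is a hypothetical default, not a value taken from the
    parser's documentation.
    """
    good = [t.strip() for t in trees if is_balanced(t.strip())]
    rng = random.Random(seed)
    rng.shuffle(good)
    # Keep at least one dev tree when there is more than one tree overall.
    n_dev = max(1, int(len(good) * dev_fraction)) if len(good) > 1 else 0
    return good[n_dev:], good[:n_dev]
```

You could then write the two lists out to separate files, one tree per line, and pass those files to the training script in whatever form its README describes.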