BLLIP / bllip-parser

BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.
http://bllip.cs.brown.edu/

About Training Data #28

Closed: Yeom closed this issue 9 years ago.

Yeom commented 9 years ago

Can I ask a few more questions?

1) I want to keep the existing training data and add my own new training data. Is that possible? For example, I want to train on the original data plus my new data. Does that make sense? If I train on my data, will the original data be overwritten, or will it remain?

2) I saw the DATA directory in first-stage, and its data files are full of numbers. Do I need to run some program to produce data like that? Can you tell me what format the training data should be in? Thank you for reading!

dmcc commented 9 years ago

1) One possible point of confusion here is that the DATA directories are actually parsing models (the numbers are counts, probabilities, etc.), not the actual training data (treebanks). For the included DATA directories, the actual training data is the Penn Treebank (for the EN and LM models) and the Chinese Treebank (for CH). If you have these treebanks, you can add other treebanks to them and then train a combined model.
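As a rough illustration of adding other treebanks to a base treebank, the Python sketch below pools several bracketed treebank files into one combined file with one tree per line. It assumes Penn Treebank-style bracketing (trees may span several lines and may have the unlabeled outer bracket `( (S ...) )`). The function names are hypothetical, and whether trainParser wants this one-tree-per-line layout or the original treebank layout is an assumption to check against the READMEs mentioned in this thread. The input files are only read, never modified, so the original treebank stays intact.

```python
from pathlib import Path
from typing import Iterable, List


def split_trees(text: str) -> List[str]:
    """Split bracketed (Penn Treebank-style) text into individual trees.

    A tree ends when its parentheses balance; text outside any tree is ignored.
    """
    trees: List[str] = []
    depth = 0
    current: List[str] = []
    for ch in text:
        if ch == "(":
            depth += 1
        if depth > 0:
            current.append(ch)
        if ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in treebank text")
            if depth == 0:
                # Collapse internal whitespace so each tree fits on one line.
                trees.append(" ".join("".join(current).split()))
                current = []
    if depth != 0:
        raise ValueError("unterminated tree at end of input")
    return trees


def combine_treebanks(paths: Iterable[Path], out_path: Path) -> int:
    """Write every tree from every input file to out_path, one per line.

    Returns the number of trees written. Input files are read-only here.
    """
    total = 0
    with out_path.open("w", encoding="utf-8") as out:
        for path in paths:
            for tree in split_trees(path.read_text(encoding="utf-8")):
                out.write(tree + "\n")
                total += 1
    return total
```

Usage, with hypothetical file names: `combine_treebanks([Path("ptb_train.mrg"), Path("my_trees.mrg")], Path("combined.mrg"))`. Splitting on balanced parentheses rather than on blank lines keeps the sketch independent of how each source treebank happens to lay out its trees.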

2) The training script (trainParser) builds the parsing model directories: it converts the actual training trees into the various files inside the model directory. See the READMEs in the first-stage and TRAIN directories for more information. See my answer in #27 for where you can download or license some treebanks.
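To make the idea of turning trees into counts and probabilities concrete, here is a deliberately simplified Python sketch: it reads context-free rules off each training tree and estimates P(rule | left-hand side) by relative frequency. This plain PCFG estimate is swapped in purely for illustration and is not what trainParser computes; the Charniak first-stage parser uses a richer lexicalized model, and the files in DATA have their own formats. All function names here are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple, Union

# A node is (label, children); a leaf word is a plain string.
Node = Tuple[str, list]


def parse_tree(text: str) -> Node:
    """Parse one bracketed tree such as '(S (NP (DT the) (NN cat)) (VP (VBD sat)))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse() -> Union[Node, str]:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # a leaf word
        # PTB files wrap trees in an unlabeled outer bracket: "( (S ...) )".
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(parse())
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse()


def count_rules(tree: Node, counts: Counter) -> None:
    """Add one count per rule (lhs -> child labels or words) found in tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    if label:  # skip the unlabeled outer bracket
        counts[(label, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            count_rules(c, counts)


def rule_probabilities(counts: Counter) -> Dict[Tuple[str, tuple], float]:
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    lhs_totals: Counter = Counter()
    for (lhs, _rhs), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}
```

For example, counting rules over `(S (NP (DT the) (NN cat)) (VP (VBD sat)))` and `( (S (NP (PRP it)) (VP (VBD sat))) )` gives NP -> DT NN and NP -> PRP one count each, so each gets probability 0.5, while S -> NP VP gets probability 1.0.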

Hope this helps -- please let me know if I can clarify anything.