Closed by Yeom 9 years ago
1) One possible point of confusion here is that the DATA directories are actually parsing models (the numbers are counts, probabilities, etc.), not the actual training data (treebanks). For the included DATA directories, the actual training data is the Penn Treebank (EN and LM) and the Chinese Treebank (CH). If you have these treebanks, you can add other treebanks to them and then train a combined model.
2) The training script (trainParser) helps construct the parsing model directories: it converts the real training trees into the various files inside the model directory. See the READMEs in the first-stage and TRAIN directories for more information. See my answer in #27 for where you can download or license some treebanks.
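To illustrate the "combined" part of point 1, here is a minimal sketch of merging several treebank files into one new training file. It assumes the treebanks are plain-text files of Penn Treebank style bracketed trees, with literal parentheses in the text already escaped (for example as -LRB- and -RRB-). The function names and file names are hypothetical and are not part of the parser; check the READMEs for the exact input that trainParser expects.

```python
from pathlib import Path


def count_trees(text: str) -> int:
    """Count top-level bracketed trees; raise ValueError on unbalanced brackets."""
    depth = 0
    trees = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in treebank text")
            if depth == 0:
                # Back at depth 0 means one complete tree has just closed.
                trees += 1
    if depth != 0:
        raise ValueError("unbalanced '(' in treebank text")
    return trees


def combine_treebanks(paths, out_path) -> int:
    """Concatenate treebank files into out_path and return the total tree count.

    The input files are only read, so the original treebanks stay unchanged.
    """
    total = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            text = Path(path).read_text(encoding="utf-8")
            total += count_trees(text)  # sanity check before writing
            out.write(text.rstrip("\n") + "\n")
    return total


# Hypothetical usage: the file names are placeholders.
# n = combine_treebanks(["original_trees.mrg", "my_new_trees.mrg"], "combined.mrg")
```

Because the merged trees go to a new file, the original data is left as it was, and a model trained on the combined file would reflect both sources.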
Hope this helps -- please let me know if I can clarify anything.
Can I ask more? 1) I want to keep the existing data and add my new training data. Is that possible? For example, I want training data that combines the original data and my new data. Does that make sense? If I train on my data, is the original data lost, or does it remain?
2) I saw the DATA directory in first-stage, and the data files are just numbers. Should I run some program to get data like that? Can you tell me how to get training data in that form? Thank you for reading!