danielhers / tupa

Transition-based UCCA Parser
https://danielhers.github.io/tupa
GNU General Public License v3.0

How can I train the parser on another corpus? #71

Closed CarolLi closed 5 years ago

CarolLi commented 5 years ago

I think the pre-trained model is too large for my task, so I want to train the model on another corpus. Are there any format requirements for the training data?

danielhers commented 5 years ago

Do you want to train it for UCCA parsing, or for parsing text into another representation? For UCCA parsing, you can use any of the UCCA-annotated corpora at https://github.com/UniversalConceptualCognitiveAnnotation. It's best to use the sentence-split files, which are under the master-sentences-xml branch in each of these repositories:
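
For readers who want to fetch only the sentence-split files, here is a minimal sketch that builds a `git clone` command for the `master-sentences-xml` branch of one corpus repository. The organization URL and branch name come from the comment above. The helper name `clone_command` and the `<corpus-repo>` placeholder are illustrative assumptions, not part of TUPA; substitute the name of an actual repository from the organization page.

```python
import subprocess

# Organization URL and branch name as given in the comment above.
ORG_URL = "https://github.com/UniversalConceptualCognitiveAnnotation"
SENTENCE_BRANCH = "master-sentences-xml"


def clone_command(repo_name, branch=SENTENCE_BRANCH, dest=None):
    """Build the argument list for cloning a single branch of a corpus repo.

    `repo_name` is whatever repository you pick from the organization page;
    this helper is hypothetical and only assembles a standard git command.
    """
    return [
        "git", "clone",
        "--branch", branch,      # check out the sentence-split branch
        "--single-branch",       # skip the other branches
        f"{ORG_URL}/{repo_name}.git",
        dest or repo_name,       # local directory to clone into
    ]


if __name__ == "__main__":
    # "<corpus-repo>" is a placeholder, not a real repository name.
    cmd = clone_command("<corpus-repo>")
    print(" ".join(cmd))
    # Uncomment to run the clone once a real repository name is filled in:
    # subprocess.run(cmd, check=True)
```

The cloned directory can then be given to the parser as training data; `python -m tupa --help` lists the exact training options.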

CarolLi commented 5 years ago

Got it! Thank you~