NLP-Discourse-SoochowU / sota_end2end_parser

According to my recent four years' research in DRS parsing, I will publish a pre-trained end-to-end DRS parser here for other researchers to directly apply it to specific NLP downstream tasks. Researchers using this project please cite our paper at xxx (coming soon).
Apache License 2.0
5 stars 3 forks source link

Introduction

In this project, we provide an end2end DRS parser where the EDU segmenter in "Zhang et al., Syntax-Guided Sequence to Sequence Modeling for Discourse Segmentation" and the DRS parser in "Zhang et al., Adversarial Learning for Discourse Rhetorical Structure Parsing" are applied.

Any questions, just send e-mails to zzlynx@outlook.com (Longyin Zhang).

Usage

  1. Before running this end2end parser, you need to download the following packages in your software environment:
  1. Due to the file size limitation, we put the pre-trained model at https://pan.baidu.com/s/1u0FLgPydISKR-MVcs2SFdg, and the password is lynx. After obtaining the pre-trained models, you need to put the two files at the following two places. It should be noted that since many real-life data usually do not contain standard paragraph boundaries, we only provide the pre-trained model with sentence boundaries considered, which is closer to practical application.

    (parser) data/models_saved/model.pth
    (LM) data/models_saved/xl_model.pth
  2. Prepare your own data with reference to the examples in "data/e2e", where "raw.txt" refers to the article with several sentences, "edu.txt" refers to the texts after EDU segmentation, and "trees.pkl" refers to the generated DRS trees. When everything is ready, run the following command: (Highlight: you only need to prepare the raw.txt, others will be formed after you run pipeline.py)

    python pipeline.py

-- License

   Copyright (c) 2019, Soochow University NLP research group. All rights reserved.
   Redistribution and use in source and binary forms, with or without modification,
   are permitted provided that the following conditions are met:
   1. Redistributions of source code must retain the above copyright notice, this
      list of conditions and the following disclaimer.
   2. Redistributions in binary form must reproduce the above copyright notice, this
      list of conditions and the following disclaimer in the documentation and/or other
      materials provided with the distribution.