EducationalTestingService / rstfinder

Fast Discourse Parser to find latent Rhetorical STructure (RST) in text.
MIT License
121 stars · 24 forks

tune_segmentation_model fails because of missing segmentation_model file #42

Closed arne-cl closed 9 years ago

arne-cl commented 9 years ago

After running extract_segmentation_features, I couldn't get tune_segmentation_model to work.

/opt/discourse-parsing# tune_segmentation_model rst_discourse_tb_edus_features_TRAINING_TRAIN.tsv rst_discourse_tb_edus_features_TRAINING_DEV.tsv segmentation_model
CRF++: Yet Another CRF Tool Kit
Copyright (C) 2005-2013 Taku Kudo, All rights reserved.

reading training data: tagger.cpp(393) [feature_index_->buildFeatures(this)] 
0.00 s

feature_index.cpp(193) [mmap_.open(model_filename)] mmap.h(153) [(fd = ::open(filename, flag | O_BINARY)) >= 0] open failed: segmentation_model.C0.015625
Traceback (most recent call last):
  File "/usr/local/bin/tune_segmentation_model", line 9, in <module>
    load_entry_point('discourseparsing==0.2.1', 'console_scripts', 'tune_segmentation_model')()
  File "/usr/local/lib/python3.3/dist-packages/discourseparsing-0.2.1-py3.3.egg/discourseparsing/tune_segmentation_model.py", line 92, in main
  File "/usr/lib/python3.3/subprocess.py", line 589, in check_output
    raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command '['crf_test', '-m', 'segmentation_model.C0.015625', 'rst_discourse_tb_edus_features_TRAINING_DEV.tsv']' returned non-zero exit status 255

The problem seems to be that crf_learn doesn't create any segmentation_model files in the first place (but doesn't give any error message either):

/opt/discourse-parsing# crf_learn -v
CRF++ of 0.58

/opt/discourse-parsing# crf_learn segmentation_crfpp_template.txt rst_discourse_tb_edus_features_TRAINING_TRAIN.tsv segmentation_model.C0.015625 -c 0.015625
CRF++: Yet Another CRF Tool Kit
Copyright (C) 2005-2013 Taku Kudo, All rights reserved.

reading training data: tagger.cpp(393) [feature_index_->buildFeatures(this)] 
0.00 s
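Since crf_learn can exit without writing a model and without printing an error, a defensive wrapper that checks for the output file makes the failure visible immediately (a sketch; the helper name and error text are mine, not part of discourseparsing):

```python
import os
import subprocess


def train_crf_model(cmd, model_path):
    """Run a CRF training command and verify it actually wrote the model.

    crf_learn can stop during buildFeatures without producing a model file
    and without a clear error message, so checking the exit status alone
    is not enough: also confirm the model file exists afterwards.
    """
    subprocess.check_call(cmd)
    if not os.path.exists(model_path):
        raise RuntimeError(
            "{} exited but {} was not created; check the CRF++ template "
            "against the feature file".format(cmd[0], model_path))
    return model_path


# With CRF++ installed, the call from this issue would look like:
# train_crf_model(["crf_learn", "segmentation_crfpp_template.txt",
#                  "rst_discourse_tb_edus_features_TRAINING_TRAIN.tsv",
#                  "segmentation_model.C0.015625", "-c", "0.015625"],
#                 "segmentation_model.C0.015625")
```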
aoifecahill commented 9 years ago

I think there is an incorrect default value in the make_segmentation_crfpp_template.py script (it's fixed in the pending PR).

Can you try changing the default number to 12 instead of 13?
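For context: CRF++ templates refer to feature columns as %x[row,col], columns are 0-indexed, and the last tab-separated column of the training file is the label. A template that names one more feature column than the data has can make crf_learn stop in buildFeatures without writing a model. A minimal sketch of the kind of generator involved (the function and numbers are illustrative, not the actual make_segmentation_crfpp_template.py):

```python
def make_crfpp_template(num_features):
    """Emit one CRF++ unigram feature macro per feature column.

    With 13 tab-separated fields per row and the last field being the
    label, the feature columns are 0..11, so num_features should be 12:
    passing 13 would emit %x[0,12], which references a column the
    trainer cannot find.
    """
    lines = ["U{:02d}:%x[0,{}]".format(i, i) for i in range(num_features)]
    return "\n".join(lines) + "\n"
```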

arne-cl commented 9 years ago

Thank you, Aoife. tune_segmentation_model works now, after I applied your change like this:

git cherry-pick 3c9aad35446a1d7b6a49b977cff95b10458e8cdc
python3 setup.py install
ankur220693 commented 6 years ago

Help me, I am getting the same error.

ankur220693 commented 6 years ago

Where is the tune_segmentation_model file, and how do I change it?

arne-cl commented 6 years ago

Dear @ankur220693,

The problem should be solved in my fork: https://github.com/arne-cl/discourse-parsing. I also provide a Docker container for this: https://github.com/NLPbox/heilman-sagae-2015-docker

If you have Docker installed on your machine (and you trust random people on the internet), then you can simply install and run the parser from Docker Hub with one line:

$ cat /tmp/input.txt
Although they didn't like it, they accepted the offer.

$ docker run -v /tmp:/tmp -ti nlpbox/heilman-sagae-2015:2018-05-12-1 /tmp/input.txt
Loading tagger from /opt/zpar-0.7/models/english/tagger
Loading model... done.
Loading constituency parser from /opt/zpar-0.7/models/english/conparser
Loading scores... done. (21.7049s)
{"scored_rst_trees": [{"score": -0.9662282971887425, "tree": "(ROOT (satellite:contrast (text 0)) (nucleus:span (text 1)))"}], "edu_tokens": [["Although", "they", "did", "n't", "like", "it", ","], ["they", "accepted", "the", "offer", "."]]}

(On the first run, this will take a long time and download half of the internet, but on subsequent runs it works like a normal local installation.)
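The container prints one JSON object per input, so downstream code can consume it directly. A small sketch using the output shown above:

```python
import json

# The line printed by the container for the example input above.
output = ('{"scored_rst_trees": [{"score": -0.9662282971887425, '
          '"tree": "(ROOT (satellite:contrast (text 0)) '
          '(nucleus:span (text 1)))"}], '
          '"edu_tokens": [["Although", "they", "did", "n\'t", "like", '
          '"it", ","], ["they", "accepted", "the", "offer", "."]]}')

result = json.loads(output)
best = result["scored_rst_trees"][0]
print(best["tree"])               # the bracketed RST tree
print(len(result["edu_tokens"]))  # number of EDUs found → 2
```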

Best regards, Arne

ankur220693 commented 6 years ago

Thanks for your reply, I will try this approach. crf_test shows an error, while crf_learn works for me. I will get back to you if I get stuck further. Basically, I am running a POS tagger for Indian languages; I have 20K tagged test and train data.


ankur220693 commented 6 years ago

The system gets stuck at "Loading scores...". Is there any problem with Indian languages? My input text is in another language.


ankur220693 commented 6 years ago

(screenshot attached)

ankur220693 commented 6 years ago

crf_test failure! (screenshot attached)

ankur220693 commented 6 years ago

SOLVED ...THANKS A LOT
