arne-cl / feng-hirst-rst-parser

fork of Vanessa Wei Feng's RST-style discourse parser
http://www.cs.toronto.edu/~weifeng/software.html
BSD 2-Clause "Simplified" License

Unable to run segmenter by itself #6

Closed: serenayj closed this issue 3 years ago

serenayj commented 3 years ago

Hi, I ran the Docker image and the parser successfully, but when I try to only segment a sentence using parse.py, I get the following error:

*** Segmentation failed ***
Traceback (most recent call last):
  File "parse.py", line 136, in parse
    self.segmenter.segment(doc)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/segmenters/crf_segmenter.py", line 277, in segment
    self.segment_sentence(sentence)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/segmenters/crf_segmenter.py", line 175, in segment_sentence
    seq_prob, predictions = self.classifier.classify(features)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/classifiers/crf_classifier.py", line 45, in classify
    self.classifier.stdin.write('\n'.join(vectors) + "\n\n")
IOError: [Errno 32] Broken pipe
None
Some error occurred, skipping the file
Successfully unloaded syntax parser
Traceback (most recent call last):
  File "parse.py", line 317, in main
    raise e
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "parse.py", line 367, in <module>
    main(options, args)
  File "parse.py", line 326, in main
    raise Exception, traceback.print_exc()
Exception

Could you tell me what might be wrong? I am running parse.py as: python parse.py -v -s ../texts/input_short.txt
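For context: errno 32 (EPIPE, "Broken pipe") on self.classifier.stdin.write(...) means the process at the other end of the pipe had already exited when the parser tried to send it feature vectors, so the underlying problem is whatever made that child process die, not the write itself. The Python 3 sketch below does not use the parser's code. It reproduces the same error with a hypothetical stand-in child (a one-line Python script that exits immediately) and shows how to recover the child's exit status and stderr, which is where the real failure message would normally appear. It assumes a POSIX system (Linux or macOS); write_to_exited_child is an illustrative name, not part of the parser.

```python
import errno
import subprocess
import sys


def write_to_exited_child(payload):
    """Write payload to the stdin of a child process that has already exited.

    The child is a stand-in Python one-liner, NOT the parser's CRF classifier.
    Returns (errno, child exit status, child stderr) when the write fails.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.exit('stand-in child exited')"],
        stdin=subprocess.PIPE,
        stderr=subprocess.PIPE,
        bufsize=0,  # unbuffered: the write below reaches the pipe immediately
    )
    child.wait()  # the child is gone, so nothing reads from the pipe any more
    try:
        child.stdin.write(payload.encode("utf-8"))
    except BrokenPipeError as exc:  # Python 3 spelling of "IOError: [Errno 32]"
        # The child's own error output explains why it died.
        stderr = child.stderr.read().decode("utf-8").strip()
        return exc.errno, child.returncode, stderr
    finally:
        child.stdin.close()
        child.stderr.close()
    return None  # not reached on POSIX: writing to a pipe with no reader fails


if __name__ == "__main__":
    err, status, message = write_to_exited_child("feature vector\n\n")
    print("errno %d (EPIPE), child exit status %d, child stderr: %s"
          % (err, status, message))
```

Applied to the parser, the equivalent step would be to inspect the classifier subprocess's return code and stderr right after the failed write, rather than looking at the write call itself.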

arne-cl commented 3 years ago

Dear @serenayj,

I don't remember why this isn't working with Vanessa Feng's original parse.py any longer, but you can still run the segmentation using my wrapper:

/opt/feng-hirst-rst-parser/src # python parser_wrapper.py --skip_parsing ../texts/input_short.txt 
Traceback (most recent call last):
  File "parser_wrapper.py", line 72, in <module>
    main()
  File "parser_wrapper.py", line 61, in main
    results, get_parser_stdout(parser_stdout_filepath))
AssertionError: Expected one parse tree as a result, but got: [None].
Parser STDOUT was:
Added classifier classifier1 to segmenter gCRF
Finished initialization in 1.51 seconds.

Processing 1 documents, skipping 0
Parsing ../texts/input_short.txt, progress: 0.00 (0 out of 1)
Processing sentence 0 out of 1
Finished preprocessing in 0.50 seconds.

Finished segmentation in 0.02 seconds.
Segmented into 2 EDUs.

Output EDU segmentation result to /opt/feng-hirst-rst-parser/texts/results/input_short.txt.edus

===================================================
Successfully unloaded syntax parser
Successfully unloaded gCRF

/opt/feng-hirst-rst-parser/src # cat /opt/feng-hirst-rst-parser/texts/results/input_short.txt.edus
Although they did n't like it , EDU_BREAK they accepted the offer .
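The .edus file marks segment boundaries inline with the EDU_BREAK token. A minimal Python sketch for turning such a file back into a list of EDUs follows. It assumes, based only on this one example, that each line holds one tokenized sentence, that EDU_BREAK separates EDUs within a line, and that the end of a line also ends an EDU (which agrees with "Segmented into 2 EDUs" for the single line above). The function name split_edus is hypothetical.

```python
def split_edus(edus_text, marker="EDU_BREAK"):
    """Split the text of a *.edus file into a list of EDU strings.

    Assumptions (from the single example in this thread): one sentence per
    line, EDUs within a line separated by the EDU_BREAK token, and every
    line end also closing an EDU.
    """
    edus = []
    for line in edus_text.splitlines():
        current = []
        for token in line.split():  # the file is already whitespace-tokenized
            if token == marker:
                if current:
                    edus.append(" ".join(current))
                current = []
            else:
                current.append(token)
        if current:  # the end of the line closes the last EDU
            edus.append(" ".join(current))
    return edus


sample = "Although they did n't like it , EDU_BREAK they accepted the offer .\n"
for number, edu in enumerate(split_edus(sample), start=1):
    print(number, edu)
# 1 Although they did n't like it ,
# 2 they accepted the offer .
```

Splitting on whole tokens rather than on the substring "EDU_BREAK" avoids cutting a word that merely contains the marker, and it also normalizes stray whitespace.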

Best, Arne

serenayj commented 3 years ago

Hi Arne,

Thanks for your quick response. I'm still getting the same error with your command (python parser_wrapper.py --skip_parsing ../texts/input_short.txt), again at the subprocess's stdin/stdout:

Traceback (most recent call last):
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/parse.py", line 136, in parse
    self.segmenter.segment(doc)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/segmenters/crf_segmenter.py", line 277, in segment
    self.segment_sentence(sentence)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/segmenters/crf_segmenter.py", line 175, in segment_sentence
    seq_prob, predictions = self.classifier.classify(features)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/classifiers/crf_classifier.py", line 55, in classify
    self.classifier.stdin.write('\n'.join(vectors) + "\n\n")
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/parse.py", line 317, in main
    raise e
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "parser_wrapper.py", line 73, in <module>
    main()
  File "parser_wrapper.py", line 58, in main
    results = feng_main(options, args)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/parse.py", line 326, in main
    raise Exception, traceback.print_exc()
Exception

Is there a way to avoid using subprocess.Popen? I've tried running it on different machines, without success.

arne-cl commented 3 years ago

Hi Serena,

does it work when you run it inside Docker?

~/repos/feng-hirst-rst-parser$ docker build -t feng-hirst .
~/repos/feng-hirst-rst-parser$ docker run --entrypoint=/bin/sh -v /tmp:/tmp -ti feng-hirst
/opt/feng-hirst-rst-parser/src # python parser_wrapper.py --skip_parsing ../texts/input_short.txt 
Traceback (most recent call last):
  File "parser_wrapper.py", line 72, in <module>
    main()
  File "parser_wrapper.py", line 61, in main
    results, get_parser_stdout(parser_stdout_filepath))
AssertionError: Expected one parse tree as a result, but got: [None].
Parser STDOUT was:
Added classifier classifier1 to segmenter gCRF
Finished initialization in 1.47 seconds.

Processing 1 documents, skipping 0
Parsing ../texts/input_short.txt, progress: 0.00 (0 out of 1)
Processing sentence 0 out of 1
Finished preprocessing in 0.44 seconds.

Finished segmentation in 0.02 seconds.
Segmented into 2 EDUs.

Output EDU segmentation result to /opt/feng-hirst-rst-parser/texts/results/input_short.txt.edus

===================================================
Successfully unloaded syntax parser
Successfully unloaded gCRF

/opt/feng-hirst-rst-parser/src # cat /opt/feng-hirst-rst-parser/texts/results/input_short.txt.edus
Although they did n't like it , EDU_BREAK they accepted the offer .
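The AssertionError at the top of this session appears to be raised only after segmentation has finished and the .edus file has been written: the wrapper expects a parse tree, which --skip_parsing never produces. If you want to drive segmentation from a script, a sketch like the following ignores the wrapper's exit status and checks for the .edus file instead. The results/ location is inferred from the single run shown here (../texts/input_short.txt to ../texts/results/input_short.txt.edus), and the helper names edus_path_for and segment_only are hypothetical. It sticks to os.path and subprocess.Popen so that it should also run under the container's Python 2.7.

```python
import os
import subprocess
import sys


def edus_path_for(input_path):
    """Return where the segmenter is assumed to write its EDU file.

    Inferred from one run in this thread:
    ../texts/input_short.txt -> ../texts/results/input_short.txt.edus
    """
    directory, name = os.path.split(input_path)
    return os.path.join(directory, "results", name + ".edus")


def segment_only(input_path, src_dir="/opt/feng-hirst-rst-parser/src"):
    """Run parser_wrapper.py --skip_parsing and return the EDU file's contents.

    The wrapper's exit status is ignored on purpose: in the runs above it
    fails with an AssertionError even though segmentation succeeded.
    Only a missing .edus file is treated as an error.
    """
    # Relative input paths are resolved against src_dir, as in the shell session.
    edus_file = os.path.join(src_dir, edus_path_for(input_path))
    if os.path.exists(edus_file):
        os.remove(edus_file)  # don't mistake a stale result for a fresh one
    proc = subprocess.Popen(
        [sys.executable, "parser_wrapper.py", "--skip_parsing", input_path],
        cwd=src_dir,  # the wrapper is run from the src/ directory in this thread
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    _, stderr = proc.communicate()
    if not os.path.exists(edus_file):
        raise RuntimeError("no EDU file at %s; wrapper stderr:\n%s"
                           % (edus_file, stderr.decode("utf-8", "replace")))
    with open(edus_file) as handle:
        return handle.read()
```

Inside the container this would be called as print(segment_only("../texts/input_short.txt")), using the same interpreter that runs the parser (sys.executable).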

serenayj commented 3 years ago

Hi Arne,

I could get it running inside Docker, but the script only works on texts/input_long.txt and texts/input_short.txt. When I tried to run it on a file from the RST-DT dataset, it failed:

/opt/feng-hirst-rst-parser/src # python parser_wrapper.py --skip_parsing ../texts/wsj_0607.out
Traceback (most recent call last):
  File "/opt/feng-hirst-rst-parser/src/parse.py", line 135, in parse
    self.segmenter.segment(doc)
  File "/opt/feng-hirst-rst-parser/src/segmenters/crf_segmenter.py", line 277, in segment
    self.segment_sentence(sentence)
  File "/opt/feng-hirst-rst-parser/src/segmenters/crf_segmenter.py", line 164, in segment_sentence
    features = self.write_features(sentence, offset2neighbouring_boundaries)
  File "/opt/feng-hirst-rst-parser/src/segmenters/crf_segmenter.py", line 97, in write_features
    inst_features = self.feature_writer.write_features([token0, token1, token2, token3], edu_segmentation)
  File "/opt/feng-hirst-rst-parser/src/features/segmenter_feature_writer.py", line 213, in write_features
    self.write_unit_token_identity_features(token1, 1, i)
  File "/opt/feng-hirst-rst-parser/src/features/segmenter_feature_writer.py", line 56, in write_unit_token_identity_features
    self.features.add('Largest_Subtree_Top_Tag=%s_Unit=%d@%d' % (ancestor_subtree.label(), unit, position))
  File "/usr/lib/python2.7/site-packages/nltk/tree.py", line 235, in label
    return self._label
AttributeError: 'LexicalizedTree' object has no attribute '_label'
Traceback (most recent call last):
  File "/opt/feng-hirst-rst-parser/src/parse.py", line 311, in main
    raise e
AttributeError: 'LexicalizedTree' object has no attribute '_label'
Traceback (most recent call last):
  File "parser_wrapper.py", line 72, in <module>
    main()
  File "parser_wrapper.py", line 57, in main
    results = feng_main(options, args)
  File "/opt/feng-hirst-rst-parser/src/parse.py", line 320, in main
    raise Exception, traceback.print_exc()
Exception

Is there a preprocessing step I'm missing? It also fails on other text files in the texts folder, e.g. BGSU1001.txt.

arne-cl commented 3 years ago

It works fine for me with BGSU1001.txt. What output do you get for this file?

/opt/feng-hirst-rst-parser/src # python parser_wrapper.py ../texts/BGSU1001.txt
ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', ['It is time ,', 'that our society is dominated by industrialization .']), ParseTree('Elaboration[N][S]', ['The prosperity of a country is based on its enormous industrial corporations', 'that are gradually replacing men with machines .'])]), ParseTree('Joint[N][N]', ['Science is highly developed', 'and controls the economy .'])]), ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', ['From the beginning of school life students are expected to master a huge amount of scientific data .', 'Technology is part of our everyday life .']), ParseTree('Contrast[N][N]', ["Children nowadays prefer to play with computers rather than with our parents ' wooden toys .", ParseTree('Attribution[S][N]', ['But I think', ParseTree('Elaboration[N][S]', ['that in our modern world', 'which worships science and technology there is still a place for dreams and imagination .'])])])]), "There has always been a place for them in man 's life ."])]), ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Background[N][S]', ['Even in the darkness of the Middle Ages', ParseTree('Elaboration[N][S]', ['when religion', 'confined the human mind in the cage of its dogmas , men dreamt of a better life .'])]), ParseTree('Elaboration[N][S]', ['They dreamt of exploring the unknown depths of earth and sea ,', ParseTree('Joint[N][N]', ['of flying into the sky', 'and reaching the stars .'])])]), ParseTree('Joint[N][N]', ['Step by step society has freed itself from the restrictions of religion', 'and science has become the dominant power .'])])]), ParseTree('Contrast[S][N]', ["Nowadays some people see in its dominance a threat to man 's inclination to dreaming .", 
ParseTree('Attribution[S][N]', [ParseTree('same-unit[N][N]', [ParseTree('Elaboration[N][S]', ['But man', 'could dream in the darkest past ,']), 'so why could']), 'he not dream now ?'])])]), ParseTree('Elaboration[N][S]', [ParseTree('Attribution[S][N]', ['Man has always', ParseTree('Elaboration[N][S]', ['dreamt of going beyond the world', 'he knew .'])]), ParseTree('Elaboration[N][S]', [ParseTree('Contrast[N][S]', ['Science has made the unknown world too small', 'but man keeps on dreaming .']), ParseTree('Enablement[N][S]', ['He turns to books', 'to find his new inspiration .'])])])]), ParseTree('Elaboration[N][S]', [ParseTree('Evaluation[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Explanation[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Attribution[S][N]', ['People nowadays read not less', 'than they did in the past .']), ParseTree('Elaboration[N][S]', ['Besides they still create literature', 'which means they are still capable of creating new imaginary worlds .'])]), ParseTree('Elaboration[N][S]', [ParseTree('Enablement[N][S]', ['People still use their imagination', 'to paint pictures and to compose music .']), ParseTree('Joint[N][N]', ['Art is still alive .', ParseTree('Elaboration[N][S]', [ParseTree('Explanation[N][S]', ['And society feels necessity for this art', ParseTree('Contrast[N][S]', ['because people enjoy escaping to other worlds ,', 'even if they are not real .'])]), ParseTree('Attribution[S][N]', ['Moreover , I can not imagine', 'that people have lost their ability of creative thinking .'])])])])]), ParseTree('Explanation[N][S]', ['People have always had the gift of imagination .', ParseTree('Topic-Comment[N][S]', [ParseTree('Elaboration[N][S]', ['I can not agree with the statement', 'that because of the dominance of science and technology , there is no place for this gift .']), ParseTree('same-unit[N][N]', [ParseTree('Elaboration[N][S]', ['How would all the technological achievements and scientific breakthroughs', 'been made ,']), 'if man had not used his imagination ?'])])])]), ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', ['It is because of his imagination and his constant dreaming', 'that man has achieved so much in science .']), ParseTree('same-unit[N][N]', [ParseTree('Elaboration[N][S]', ['The fact', 'that science never stops developing']), ParseTree('Elaboration[N][S]', ['is a proof', 'that man never stops using his imagination .'])])])]), ParseTree('Explanation[N][S]', [ParseTree('Attribution[S][N]', ['I am sure', 'that man keeps on dreaming .']), ParseTree('Joint[N][N]', [ParseTree('Contrast[N][N]', [ParseTree('Elaboration[N][S]', ['There is not much space', 'left to be explored']), ParseTree('Joint[N][N]', ['but in every field of human life there will always be some kind of a frontier', 'and people will always dream of going beyond it .'])]), ParseTree('Joint[N][N]', [ParseTree('Joint[N][N]', ['Maybe industry has taken the place of nature', 'and maybe romanticism is dead but psychologists go on asking the question :']), '" Whom or what would you like to take with you on a lonely island ?'])])])])]), ParseTree('Elaboration[N][S]', ['" And people', ParseTree('Attribution[S][N]', ['go on dreaming', 'that they will spend their lives like Robinson Crusoe with their favourite show-business star , or favourite pet or book but not with their personal computer .'])])])

I get the same error as you for wsj_0607.out, but if I copy the file under a different extension, it works just fine:

/opt/feng-hirst-rst-parser/src # python parser_wrapper.py ../texts/wsj_0607.out
Traceback (most recent call last):
  File "/opt/feng-hirst-rst-parser/src/parse.py", line 135, in parse
    self.segmenter.segment(doc)
  File "/opt/feng-hirst-rst-parser/src/segmenters/crf_segmenter.py", line 277, in segment
    self.segment_sentence(sentence)
  File "/opt/feng-hirst-rst-parser/src/segmenters/crf_segmenter.py", line 164, in segment_sentence
    features = self.write_features(sentence, offset2neighbouring_boundaries)
  File "/opt/feng-hirst-rst-parser/src/segmenters/crf_segmenter.py", line 97, in write_features
    inst_features = self.feature_writer.write_features([token0, token1, token2, token3], edu_segmentation)
  File "/opt/feng-hirst-rst-parser/src/features/segmenter_feature_writer.py", line 213, in write_features
    self.write_unit_token_identity_features(token1, 1, i)
  File "/opt/feng-hirst-rst-parser/src/features/segmenter_feature_writer.py", line 56, in write_unit_token_identity_features
    self.features.add('Largest_Subtree_Top_Tag=%s_Unit=%d@%d' % (ancestor_subtree.label(), unit, position))
  File "/usr/lib/python2.7/site-packages/nltk/tree.py", line 235, in label
    return self._label
AttributeError: 'LexicalizedTree' object has no attribute '_label'
Traceback (most recent call last):
  File "/opt/feng-hirst-rst-parser/src/parse.py", line 311, in main
    raise e
AttributeError: 'LexicalizedTree' object has no attribute '_label'
Traceback (most recent call last):
  File "parser_wrapper.py", line 72, in <module>
    main()
  File "parser_wrapper.py", line 57, in main
    results = feng_main(options, args)
  File "/opt/feng-hirst-rst-parser/src/parse.py", line 320, in main
    raise Exception, traceback.print_exc()
Exception

/opt/feng-hirst-rst-parser/src # cp ../texts/wsj_0607.out ../texts/wsj_0607.text
/opt/feng-hirst-rst-parser/src # python parser_wrapper.py ../texts/wsj_0607.text 
ParseTree('Topic-Change[N][N]', [ParseTree('Explanation[N][S]', [ParseTree('Elaboration[N][S]', ['Three new issues begin trading on the New York Stock Exchange today ,', 'and one began trading on the Nasdaq/National Market System last week .']), ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', ['On the Big Board , Crawford & Co. , Atlanta ,', '( CFD ) begins trading today .']), ParseTree('Joint[N][N]', ["Crawford evaluates health care plans , manages medical and disability aspects of worker 's compensation injuries", 'and is involved in claims adjustments for insurance companies .'])]), ParseTree('Explanation[N][S]', [ParseTree('Elaboration[N][S]', ['Also beginning trading today on the Big Board are El Paso Refinery Limited Partnership , El Paso , Texas ,', ParseTree('Elaboration[N][S]', ['( ELP ) and Franklin Multi-Income Trust , San Mateo , Calif. ,', '( FMI ) .'])]), ParseTree('Elaboration[N][S]', ['El Paso owns and operates a petroleum refinery .', 'Franklin is a closed-end management investment company .'])])])]), ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', ['On the Nasdaq over-the-counter system , Allied Capital Corp. , Washington , D.C. ,', '( ALII ) began trading last Thursday .']), ParseTree('Elaboration[N][S]', ['Allied Capital is a closed-end management investment company', 'that will operate as a business development concern .'])])])
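To apply the same workaround to many RST-DT files at once, a small Python sketch can copy every *.out file to a .text copy before handing it to the parser. It only automates the cp step shown above; it does not explain why the .out extension triggers the error. The helper names copy_with_suffix and copy_all, and the choice of .text as the new extension, are assumptions taken from this one example. The code uses only os, glob and shutil, so it should run under the container's Python 2.7 as well as Python 3.

```python
import glob
import os
import shutil


def copy_with_suffix(src_path, dest_dir, new_suffix=".text"):
    """Copy src_path into dest_dir, swapping its extension for new_suffix.

    Hypothetical helper mirroring: cp ../texts/wsj_0607.out ../texts/wsj_0607.text
    """
    stem = os.path.splitext(os.path.basename(src_path))[0]
    dest_path = os.path.join(dest_dir, stem + new_suffix)
    shutil.copyfile(src_path, dest_path)
    return dest_path


def copy_all(src_dir, dest_dir, pattern="*.out", new_suffix=".text"):
    """Copy every file in src_dir matching pattern; return the new paths, sorted."""
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    sources = sorted(glob.glob(os.path.join(src_dir, pattern)))
    return [copy_with_suffix(path, dest_dir, new_suffix) for path in sources]
```

For example, copy_all("../texts", "../texts/renamed") returns the list of copied paths, each of which can then be passed to python parser_wrapper.py as in the session above. Writing the copies to a separate directory leaves the original RST-DT files untouched.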