Unable to run segmenter by itself

Hi, I ran the Docker container and the parser successfully, but I would like to run only the segmenter using parse.py. Here is the error message I got:

*** Segmentation failed ***
Traceback (most recent call last):
  File "parse.py", line 136, in parse
    self.segmenter.segment(doc)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/segmenters/crf_segmenter.py", line 277, in segment
    self.segment_sentence(sentence)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/segmenters/crf_segmenter.py", line 175, in segment_sentence
    seq_prob, predictions = self.classifier.classify(features)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/classifiers/crf_classifier.py", line 45, in classify
    self.classifier.stdin.write('\n'.join(vectors) + "\n\n")
IOError: [Errno 32] Broken pipe
None
Some error occurred, skipping the file
Successfully unloaded syntax parser
Traceback (most recent call last):
  File "parse.py", line 317, in main
    raise e
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "parse.py", line 367, in <module>
    main(options, args)
  File "parse.py", line 326, in main
    raise Exception, traceback.print_exc()
Exception

Could you tell me what could be wrong? I am running parse.py as follows:

python parse.py -v -s ../texts/input_short.txt

Dear @serenayj,

I don't remember why this isn't working with Vanessa Feng's original parse.py any longer, but you can still run the segmentation using my wrapper:

/opt/feng-hirst-rst-parser/src # python parser_wrapper.py --skip_parsing ../texts/input_short.txt 
Traceback (most recent call last):
  File "parser_wrapper.py", line 72, in <module>
    main()
  File "parser_wrapper.py", line 61, in main
    results, get_parser_stdout(parser_stdout_filepath))
AssertionError: Expected one parse tree as a result, but got: [None].
Parser STDOUT was:
Added classifier classifier1 to segmenter gCRF
Finished initialization in 1.51 seconds.

Processing 1 documents, skipping 0
Parsing ../texts/input_short.txt, progress: 0.00 (0 out of 1)
Processing sentence 0 out of 1
Finished preprocessing in 0.50 seconds.

Finished segmentation in 0.02 seconds.
Segmented into 2 EDUs.

Output EDU segmentation result to /opt/feng-hirst-rst-parser/texts/results/input_short.txt.edus

===================================================
Successfully unloaded syntax parser
Successfully unloaded gCRF

/opt/feng-hirst-rst-parser/src # cat /opt/feng-hirst-rst-parser/texts/results/input_short.txt.edus
Although they did n't like it , EDU_BREAK they accepted the offer .

Best, Arne

Hi Arne,

Thanks for your quick response. I'm still getting the same error with your command "python parser_wrapper.py --skip_parsing ../texts/input_short.txt", specifically when writing to the subprocess's stdin.

Traceback (most recent call last):
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/parse.py", line 136, in parse
    self.segmenter.segment(doc)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/segmenters/crf_segmenter.py", line 277, in segment
    self.segment_sentence(sentence)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/segmenters/crf_segmenter.py", line 175, in segment_sentence
    seq_prob, predictions = self.classifier.classify(features)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/classifiers/crf_classifier.py", line 55, in classify
    self.classifier.stdin.write('\n'.join(vectors) + "\n\n")
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/parse.py", line 317, in main
    raise e
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "parser_wrapper.py", line 73, in <module>
    main()
  File "parser_wrapper.py", line 58, in main
    results = feng_main(options, args)
  File "/Users/sailormoon/Downloads/feng-hirst-rst-parser-master/src/parse.py", line 326, in main
    raise Exception, traceback.print_exc()
Exception

Is there a way to avoid the subprocess Popen call? I've tried running it on different machines, but with no success.
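A note on the error itself: "[Errno 32] Broken pipe" on stdin.write means the reading end of the pipe is closed, which typically happens when the child process has already exited. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the external classifier program started by crf_classifier.py cannot run on the host system and dies before the first write. The sketch below (Python 3, whereas the parser itself runs on Python 2.7) shows a generic way to check whether a command stays alive with pipes attached; probe_command is a hypothetical helper, not part of the parser.

```python
import subprocess


def probe_command(cmd, timeout=5.0):
    """Start cmd with pipes attached and report whether it exits early.

    Returns a tuple (still_running, returncode, stderr_text).
    """
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except OSError as exc:
        # The program could not be started at all (missing file,
        # wrong binary format, missing execute permission, ...).
        return False, None, str(exc)

    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Still alive after the timeout: it is presumably waiting for input,
        # so a write to its stdin would not raise a broken pipe.
        proc.kill()
        proc.communicate()
        return True, None, ""

    # The process exited on its own; its stderr usually says why.
    _, err = proc.communicate()
    return False, proc.returncode, err.decode("utf-8", "replace")
```

Calling probe_command with the same command line that the classifier wrapper passes to Popen should show whether the child exits immediately and what it prints to stderr when it does.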

Hi Serena,

Does it work when you run it inside Docker?

~/repos/feng-hirst-rst-parser$ docker build -t feng-hirst .
~/repos/feng-hirst-rst-parser$ docker run --entrypoint=/bin/sh -v /tmp:/tmp -ti feng-hirst
/opt/feng-hirst-rst-parser/src # python parser_wrapper.py --skip_parsing ../texts/input_short.txt 
Traceback (most recent call last):
  File "parser_wrapper.py", line 72, in <module>
    main()
  File "parser_wrapper.py", line 61, in main
    results, get_parser_stdout(parser_stdout_filepath))
AssertionError: Expected one parse tree as a result, but got: [None].
Parser STDOUT was:
Added classifier classifier1 to segmenter gCRF
Finished initialization in 1.47 seconds.

Processing 1 documents, skipping 0
Parsing ../texts/input_short.txt, progress: 0.00 (0 out of 1)
Processing sentence 0 out of 1
Finished preprocessing in 0.44 seconds.

Finished segmentation in 0.02 seconds.
Segmented into 2 EDUs.

Output EDU segmentation result to /opt/feng-hirst-rst-parser/texts/results/input_short.txt.edus

===================================================
Successfully unloaded syntax parser
Successfully unloaded gCRF

/opt/feng-hirst-rst-parser/src # cat /opt/feng-hirst-rst-parser/texts/results/input_short.txt.edus
Although they did n't like it , EDU_BREAK they accepted the offer .
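The .edus file shown above marks segment boundaries with the token EDU_BREAK. For reading such a file back in, here is a minimal sketch in Python 3. It assumes that the marker is always the literal token EDU_BREAK and that each line of the file holds one sentence, which matches the sample output but is not stated in this thread; split_edus and read_edus_file are hypothetical helper names.

```python
def split_edus(line, marker="EDU_BREAK"):
    """Split one line of an .edus file into a list of EDU strings."""
    return [part.strip() for part in line.split(marker) if part.strip()]


def read_edus_file(path, marker="EDU_BREAK"):
    """Return all EDUs of a file as one flat list, in document order."""
    edus = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            edus.extend(split_edus(line, marker))
    return edus
```

For the sample line above, split_edus returns two strings, in line with the "Segmented into 2 EDUs." message in the parser output.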

Hi Arne,

I could get it running inside Docker, but the script only works on texts/input_long.txt and texts/input_short.txt. I tried to run it on a file from the RST-DT dataset, but it failed as follows:

/opt/feng-hirst-rst-parser/src # python parser_wrapper.py --skip_parsing ../texts/wsj_0607.out
Traceback (most recent call last):
  File "/opt/feng-hirst-rst-parser/src/parse.py", line 135, in parse
    self.segmenter.segment(doc)
  File "/opt/feng-hirst-rst-parser/src/segmenters/crf_segmenter.py", line 277, in segment
    self.segment_sentence(sentence)
  File "/opt/feng-hirst-rst-parser/src/segmenters/crf_segmenter.py", line 164, in segment_sentence
    features = self.write_features(sentence, offset2neighbouring_boundaries)
  File "/opt/feng-hirst-rst-parser/src/segmenters/crf_segmenter.py", line 97, in write_features
    inst_features = self.feature_writer.write_features([token0, token1, token2, token3], edu_segmentation)
  File "/opt/feng-hirst-rst-parser/src/features/segmenter_feature_writer.py", line 213, in write_features
    self.write_unit_token_identity_features(token1, 1, i)
  File "/opt/feng-hirst-rst-parser/src/features/segmenter_feature_writer.py", line 56, in write_unit_token_identity_features
    self.features.add('Largest_Subtree_Top_Tag=%s_Unit=%d@%d' % (ancestor_subtree.label(), unit, position))
  File "/usr/lib/python2.7/site-packages/nltk/tree.py", line 235, in label
    return self._label
AttributeError: 'LexicalizedTree' object has no attribute '_label'
Traceback (most recent call last):
  File "/opt/feng-hirst-rst-parser/src/parse.py", line 311, in main
    raise e
AttributeError: 'LexicalizedTree' object has no attribute '_label'
Traceback (most recent call last):
  File "parser_wrapper.py", line 72, in <module>
    main()
  File "parser_wrapper.py", line 57, in main
    results = feng_main(options, args)
  File "/opt/feng-hirst-rst-parser/src/parse.py", line 320, in main
    raise Exception, traceback.print_exc()
Exception

Is there any preprocessing step that I'm missing? It also fails on other text files in the texts folder, e.g. BGSU1001.txt.

It works fine for me with BGSU1001.txt. What's your output for this file?

/opt/feng-hirst-rst-parser/src # python parser_wrapper.py ../texts/BGSU1001.txt
ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', ['It is time ,', 'that our society is dominated by industrialization .']), ParseTree('Elaboration[N][S]', ['The prosperity of a country is based on its enormous industrial corporations', 'that are gradually replacing men with machines .'])]), ParseTree('Joint[N][N]', ['Science is highly developed', 'and controls the economy .'])]), ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', ['From the beginning of school life students are expected to master a huge amount of scientific data .', 'Technology is part of our everyday life .']), ParseTree('Contrast[N][N]', ["Children nowadays prefer to play with computers rather than with our parents ' wooden toys .", ParseTree('Attribution[S][N]', ['But I think', ParseTree('Elaboration[N][S]', ['that in our modern world', 'which worships science and technology there is still a place for dreams and imagination .'])])])]), "There has always been a place for them in man 's life ."])]), ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Background[N][S]', ['Even in the darkness of the Middle Ages', ParseTree('Elaboration[N][S]', ['when religion', 'confined the human mind in the cage of its dogmas , men dreamt of a better life .'])]), ParseTree('Elaboration[N][S]', ['They dreamt of exploring the unknown depths of earth and sea ,', ParseTree('Joint[N][N]', ['of flying into the sky', 'and reaching the stars .'])])]), ParseTree('Joint[N][N]', ['Step by step society has freed itself from the restrictions of religion', 'and science has become the dominant power .'])])]), ParseTree('Contrast[S][N]', ["Nowadays some people see in its dominance a threat to man 's inclination to dreaming .", 
ParseTree('Attribution[S][N]', [ParseTree('same-unit[N][N]', [ParseTree('Elaboration[N][S]', ['But man', 'could dream in the darkest past ,']), 'so why could']), 'he not dream now ?'])])]), ParseTree('Elaboration[N][S]', [ParseTree('Attribution[S][N]', ['Man has always', ParseTree('Elaboration[N][S]', ['dreamt of going beyond the world', 'he knew .'])]), ParseTree('Elaboration[N][S]', [ParseTree('Contrast[N][S]', ['Science has made the unknown world too small', 'but man keeps on dreaming .']), ParseTree('Enablement[N][S]', ['He turns to books', 'to find his new inspiration .'])])])]), ParseTree('Elaboration[N][S]', [ParseTree('Evaluation[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Explanation[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Attribution[S][N]', ['People nowadays read not less', 'than they did in the past .']), ParseTree('Elaboration[N][S]', ['Besides they still create literature', 'which means they are still capable of creating new imaginary worlds .'])]), ParseTree('Elaboration[N][S]', [ParseTree('Enablement[N][S]', ['People still use their imagination', 'to paint pictures and to compose music .']), ParseTree('Joint[N][N]', ['Art is still alive .', ParseTree('Elaboration[N][S]', [ParseTree('Explanation[N][S]', ['And society feels necessity for this art', ParseTree('Contrast[N][S]', ['because people enjoy escaping to other worlds ,', 'even if they are not real .'])]), ParseTree('Attribution[S][N]', ['Moreover , I can not imagine', 'that people have lost their ability of creative thinking .'])])])])]), ParseTree('Explanation[N][S]', ['People have always had the gift of imagination .', ParseTree('Topic-Comment[N][S]', [ParseTree('Elaboration[N][S]', ['I can not agree with the statement', 'that because of the dominance of science and technology , there is no place for this gift .']), ParseTree('same-unit[N][N]', [ParseTree('Elaboration[N][S]', ['How would all the technological achievements and scientific breakthroughs', 'been made ,']), 'if 
man had not used his imagination ?'])])])]), ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', ['It is because of his imagination and his constant dreaming', 'that man has achieved so much in science .']), ParseTree('same-unit[N][N]', [ParseTree('Elaboration[N][S]', ['The fact', 'that science never stops developing']), ParseTree('Elaboration[N][S]', ['is a proof', 'that man never stops using his imagination .'])])])]), ParseTree('Explanation[N][S]', [ParseTree('Attribution[S][N]', ['I am sure', 'that man keeps on dreaming .']), ParseTree('Joint[N][N]', [ParseTree('Contrast[N][N]', [ParseTree('Elaboration[N][S]', ['There is not much space', 'left to be explored']), ParseTree('Joint[N][N]', ['but in every field of human life there will always
be some kind of a frontier', 'and people will always dream of going beyond it .'])]), ParseTree('Joint[N][N]', [ParseTree('Joint[N][N]', ['Maybe industry has taken the place of nature', 'and maybe romanticism is dead but psychologists go
on asking the question :']), '" Whom or what would you like to take with you on a lonely island ?'])])])])]), ParseTree('Elaboration[N][S]', ['" And people', ParseTree('Attribution[S][N]', ['go on dreaming', 'that they will spend their lives like Robinson Crusoe with their favourite show-business star , or favourite pet or book but not with their personal computer .'])])])

I get the same error as you for wsj_0607.out, but if I rename the file, it works just fine:

/opt/feng-hirst-rst-parser/src # python parser_wrapper.py ../texts/wsj_0607.out
Traceback (most recent call last):
  File "/opt/feng-hirst-rst-parser/src/parse.py", line 135, in parse
    self.segmenter.segment(doc)
  File "/opt/feng-hirst-rst-parser/src/segmenters/crf_segmenter.py", line 277, in segment
    self.segment_sentence(sentence)
  File "/opt/feng-hirst-rst-parser/src/segmenters/crf_segmenter.py", line 164, in segment_sentence
    features = self.write_features(sentence, offset2neighbouring_boundaries)
  File "/opt/feng-hirst-rst-parser/src/segmenters/crf_segmenter.py", line 97, in write_features
    inst_features = self.feature_writer.write_features([token0, token1, token2, token3], edu_segmentation)
  File "/opt/feng-hirst-rst-parser/src/features/segmenter_feature_writer.py", line 213, in write_features
    self.write_unit_token_identity_features(token1, 1, i)
  File "/opt/feng-hirst-rst-parser/src/features/segmenter_feature_writer.py", line 56, in write_unit_token_identity_features
    self.features.add('Largest_Subtree_Top_Tag=%s_Unit=%d@%d' % (ancestor_subtree.label(), unit, position))
  File "/usr/lib/python2.7/site-packages/nltk/tree.py", line 235, in label
    return self._label
AttributeError: 'LexicalizedTree' object has no attribute '_label'
Traceback (most recent call last):
  File "/opt/feng-hirst-rst-parser/src/parse.py", line 311, in main
    raise e
AttributeError: 'LexicalizedTree' object has no attribute '_label'
Traceback (most recent call last):
  File "parser_wrapper.py", line 72, in <module>
    main()
  File "parser_wrapper.py", line 57, in main
    results = feng_main(options, args)
  File "/opt/feng-hirst-rst-parser/src/parse.py", line 320, in main
    raise Exception, traceback.print_exc()
Exception

/opt/feng-hirst-rst-parser/src # cp ../texts/wsj_0607.out ../texts/wsj_0607.text
/opt/feng-hirst-rst-parser/src # python parser_wrapper.py ../texts/wsj_0607.text 
ParseTree('Topic-Change[N][N]', [ParseTree('Explanation[N][S]', [ParseTree('Elaboration[N][S]', ['Three new issues begin trading on the New York Stock Exchange today ,', 'and one began trading on the Nasdaq/National Market System last week .']), ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', ['On the Big Board , Crawford & Co. , Atlanta ,', '( CFD ) begins trading today .']), ParseTree('Joint[N][N]', ["Crawford evaluates health care plans , manages medical and disability aspects of worker 's compensation injuries", 'and is involved in claims adjustments for insurance companies .'])]), ParseTree('Explanation[N][S]', [ParseTree('Elaboration[N][S]', ['Also beginning trading today on the Big Board are El Paso Refinery Limited Partnership , El Paso , Texas ,', ParseTree('Elaboration[N][S]', ['( ELP ) and Franklin Multi-Income Trust , San Mateo , Calif. ,', '( FMI ) .'])]), ParseTree('Elaboration[N][S]', ['El Paso owns and operates a petroleum refinery .', 'Franklin is a closed-end management investment company .'])])])]), ParseTree('Elaboration[N][S]', [ParseTree('Elaboration[N][S]', ['On the Nasdaq over-the-counter system , Allied Capital Corp. , Washington , D.C. ,', '( ALII ) began trading last Thursday .']), ParseTree('Elaboration[N][S]', ['Allied Capital is a closed-end management investment company', 'that will operate as a business development concern .'])])])
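The rename workaround above can be applied to a whole directory of RST-DT files before parsing. Here is a minimal sketch in Python 3 that copies every .out file to a .text file, mirroring the cp command shown earlier. The function name and the directory arguments are hypothetical, and the thread does not explain why the .out suffix triggers the error, so treat this only as a way to automate the observed workaround.

```python
import shutil
from pathlib import Path


def copy_with_text_suffix(src_dir, dst_dir, old_suffix=".out", new_suffix=".text"):
    """Copy each *.out file in src_dir to dst_dir under a *.text name.

    Returns the list of newly created paths, sorted by file name.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(src_dir.glob("*" + old_suffix)):
        # wsj_0607.out -> wsj_0607.text; the original file is left untouched.
        dst = dst_dir / (src.stem + new_suffix)
        shutil.copyfile(src, dst)
        copied.append(dst)
    return copied
```

The copies can then be passed to parser_wrapper.py one by one, in the same way as ../texts/wsj_0607.text above.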
