When I run the command "sh run.sh --para_extraction", the following error occurs:
Start paragraph extraction, this may take a few hours
Source dir: ../data/preprocessed
Target dir: ../data/extracted
Processing trainset
Processing devset
Processing testset
Traceback (most recent call last):
File "paragraph_extraction.py", line 197, in <module>
paragraph_selection(sample, mode)
File "paragraph_extraction.py", line 111, in paragraph_selection
status = dup_remove(doc)
File "paragraph_extraction.py", line 66, in dup_remove
if p_idx < para_id:
TypeError: '<' not supported between instances of 'int' and 'NoneType'
Traceback (most recent call last):
File "paragraph_extraction.py", line 197, in <module>
paragraph_selection(sample, mode)
File "paragraph_extraction.py", line 111, in paragraph_selection
status = dup_remove(doc)
File "paragraph_extraction.py", line 66, in dup_remove
if p_idx < para_id:
TypeError: '<' not supported between instances of 'int' and 'NoneType'
Could you please describe your environment? The command "sh run.sh --para_extraction" has been tested with Python 2.7.13, and we did not see this error.
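The Python version is the likely cause. The wording of the TypeError ("'<' not supported between instances of ...") is specific to Python 3. Python 2 allows ordering comparisons between an int and None: CPython 2 places None below every number, so `p_idx < para_id` with `para_id = None` quietly evaluates to False instead of failing. A minimal check of that difference (the function name is hypothetical; it only reproduces the comparison on line 66, not the real `dup_remove`):

```python
def compare_int_with_none():
    """Evaluate 1 < None, mirroring `p_idx < para_id` when para_id is None."""
    try:
        # Python 2: evaluates to False (None orders below every int).
        # Python 3: raises TypeError, as in the traceback above.
        return 1 < None
    except TypeError as exc:
        return type(exc).__name__


print(compare_int_with_none())  # Python 3 prints: TypeError
```

Running the same file under python2.7 prints False, which is why the script completes there.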
When I run the command "sh run.sh --para_extraction", I get the error shown above, so I opened pull request #45 to fix this bug.
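One way to make the comparison safe under Python 3 is to guard against None explicitly. The sketch below is a hypothetical helper, not necessarily what PR #45 changes; it assumes the goal is to keep the Python 2.7 behavior, where None sorts below every int. Whether that is the right semantics depends on what `para_id = None` means inside `dup_remove`, which the thread does not show.

```python
from typing import Optional


def py2_less_than(a: Optional[int], b: Optional[int]) -> bool:
    """Hypothetical helper: emulate Python 2's `a < b` when either side may be None."""
    if a is None and b is None:
        return False  # None == None, so neither is less than the other
    if a is None:
        return True   # Python 2 orders None below every int
    if b is None:
        return False
    return a < b
```

With this helper, line 66 would read `if py2_less_than(p_idx, para_id):`, which evaluates to False when `para_id` is None, matching what python2.7 did silently.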