baidu / DuReader

Baseline Systems of DuReader Dataset
http://ai.baidu.com/broad/subordinate?dataset=dureader

Fix NoneType error when running `sh run.sh --para_extraction` #46

Closed yongbowin closed 4 years ago

yongbowin commented 5 years ago

When I run the command `sh run.sh --para_extraction`, the following error occurs:

Start paragraph extraction, this may take a few hours
Source dir: ../data/preprocessed
Target dir: ../data/extracted
Processing trainset
Processing devset
Processing testset
Traceback (most recent call last):
  File "paragraph_extraction.py", line 197, in <module>
    paragraph_selection(sample, mode)
  File "paragraph_extraction.py", line 111, in paragraph_selection
    status = dup_remove(doc)
  File "paragraph_extraction.py", line 66, in dup_remove
    if p_idx < para_id:
TypeError: '<' not supported between instances of 'int' and 'NoneType'
Traceback (most recent call last):
  File "paragraph_extraction.py", line 197, in <module>
    paragraph_selection(sample, mode)
  File "paragraph_extraction.py", line 111, in paragraph_selection
    status = dup_remove(doc)
  File "paragraph_extraction.py", line 66, in dup_remove
    if p_idx < para_id:
TypeError: '<' not supported between instances of 'int' and 'NoneType'

So I opened pull request #45 to fix this bug.
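For readers hitting the same traceback, here is a minimal sketch of one way to guard the comparison reported at line 66. It is not necessarily what pull request #45 does. The helper name `is_earlier_paragraph` is hypothetical and does not exist in `paragraph_extraction.py`; only the expression `p_idx < para_id` comes from the traceback. Returning `False` when `para_id` is `None` is an assumption, chosen because CPython 2 evaluates `int < None` as `False`.

```python
def is_earlier_paragraph(p_idx, para_id):
    """Return True only when para_id is set and p_idx comes before it.

    Hypothetical helper for illustration; not part of paragraph_extraction.py.
    """
    # Python 3 raises TypeError for `int < None`, so test for None first.
    # Assumption: a missing para_id should behave as "not earlier",
    # which matches what `p_idx < None` evaluates to under CPython 2.
    if para_id is None:
        return False
    return p_idx < para_id


# The unguarded comparison fails under Python 3:
try:
    unguarded = 1 < None
except TypeError:
    unguarded = "TypeError"

guarded = is_earlier_paragraph(1, None)
```

With this guard, the check at line 66 could be written as `if is_earlier_paragraph(p_idx, para_id):`, assuming that skipping the branch is the intended behavior when no paragraph id is available.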

HongyuLi2018 commented 5 years ago

Could you please describe the environment that you use? The command `sh run.sh --para_extraction` has been tested with Python 2.7.13, and we did not see the error message.
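The wording of the TypeError in the report is the form Python 3 produces. Python 2 allows ordering comparisons between `int` and `None`, which would explain why the error does not appear under 2.7.13. This small sketch collects the interpreter details worth pasting into a reply; the function name `describe_environment` is illustrative only and is not part of the DuReader code.

```python
import platform
import sys


def describe_environment():
    """Collect basic interpreter and OS details for a bug report."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        # The major version (2 or 3) decides whether `int < None` is allowed.
        "major": sys.version_info[0],
    }


env = describe_environment()
for key in sorted(env):
    print("{}: {}".format(key, env[key]))
```

Running it with the same interpreter that `run.sh` invokes shows whether the script is being executed under Python 2 or Python 3.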