about "raise BdbQuit" problem

attardi / wikiextractor

A tool for extracting plain text from Wikipedia dumps

GNU Affero General Public License v3.0

about "raise BdbQuit" problem #290

Open zhenjia2017 opened 2 years ago

zhenjia2017 commented 2 years ago

Hi, when I ran the command "python -m wikiextractor.WikiExtractor ", the message "if self.quitting: raise BdbQuit" appeared after the pages were processed. How can I solve this problem? Thanks a lot!

HinPeng commented 2 years ago

@zhenjia2017 Same problem. I commented out the line at https://github.com/attardi/wikiextractor/blob/f0ca16c3e92983b9094b6f32526992fc3a678f8f/wikiextractor/extract.py#L85 and the process could then continue. But I'm not sure it's the right solution.

zhenjia2017 commented 2 years ago

Thanks a lot. In the code, there is pdb.set_trace() but it still does not work. I have solved it with another version of extract.py.

lalala0731 commented 2 years ago

Hi, I have the same problem. Can you give me your "extract.py"? Thank you.

tonnyaudio commented 2 years ago

Hi, I have the same problem. Can you give me your "extract.py"? Thank you.

Derek-tjhwang commented 2 years ago

Hi, I have the same problem. Can you give me your "extract.py"? Thank you.

itcantbetrue commented 2 years ago

I didn't solve the problem either, but I switched to a different method of extracting text from the bz2 file.

raflisusanto commented 2 years ago

In extract.py you'll have to comment out both the import pdb line and the pdb.set_trace() call. I just tried it yesterday and it started working.
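For anyone who would rather not edit the file by hand, here is a minimal sketch of that change as a script. The helper name `comment_out_pdb` is made up for illustration and is not part of wikiextractor. It assumes the debugger lines appear on their own lines as `import pdb` and `pdb.set_trace()`; a combined `import pdb; pdb.set_trace()` on one line would not be matched. The call is replaced with `pass` rather than only commented out, so that a block containing nothing else stays valid Python.

```python
import re

# Matches a line holding only `import pdb` or `pdb.set_trace()`,
# optionally followed by a trailing comment.
_PDB_LINE = re.compile(r"^(\s*)(import pdb|pdb\.set_trace\(\))\s*(?:#.*)?$")


def comment_out_pdb(source: str) -> str:
    """Return `source` with stand-alone pdb lines disabled (hypothetical helper)."""
    out = []
    for line in source.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep the original line ending
        match = _PDB_LINE.match(body)
        if match:
            indent, stmt = match.group(1), match.group(2)
            if stmt == "import pdb":
                body = f"{indent}# {stmt}"
            else:
                # `pass` keeps an otherwise empty block syntactically valid.
                body = f"{indent}pass  # {stmt}"
        out.append(body + ending)
    return "".join(out)
```

To apply it, read extract.py as text, pass the contents through `comment_out_pdb`, and write the result back, keeping a backup of the original file first.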

changtianluckyforever commented 2 years ago

zhenjia2017 wrote: "Thanks a lot. In the code, there is pdb.set_trace() but it still does not work. I have solved it with another version of extract.py."

May I ask how you changed your extract.py to get that other version? Thanks!

changtianluckyforever commented 2 years ago

Just comment out import pdb and pdb.set_trace(), then output the file in JSON format. Now it is working for me. Thanks to everyone above. Enjoy your life without bugs. Haha.

DengZhirui commented 2 years ago

Comment out import pdb and pdb.set_trace(). Then run 'python setup.py install' again. Output as JSON. Now it works for me.
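The reinstall step presumably matters because `python -m wikiextractor.WikiExtractor` may load an installed copy of the package rather than the checkout that was edited. A rough way to check which file Python would import, and whether it still has an active pdb.set_trace() call, is sketched below. Both function names are hypothetical, and the comment stripping is deliberately crude (it splits on the first `#`), so treat the result as a hint only.

```python
import importlib.util
from pathlib import Path


def installed_source_path(module_name):
    """Return the file Python would import for `module_name`, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when the parent package itself is not installed.
        return None
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def active_pdb_lines(path):
    """List 1-based line numbers that still call pdb.set_trace() outside a comment."""
    hits = []
    text = Path(path).read_text(encoding="utf-8")
    for number, line in enumerate(text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude: drop everything after the first '#'
        if "pdb.set_trace(" in code:
            hits.append(number)
    return hits
```

For example, `installed_source_path("wikiextractor.extract")` should point at the copy that actually gets run. If `active_pdb_lines` on that path returns a non-empty list, the edit has not reached the installed package yet.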