attardi / wikiextractor

A tool for extracting plain text from Wikipedia dumps
GNU Affero General Public License v3.0

about "raise BdbQuit" problem #290

Open zhenjia2017 opened 1 year ago

zhenjia2017 commented 1 year ago

Hi, when I ran the command "python -m wikiextractor.WikiExtractor ", the message "if self.quitting: raise BdbQuit" appeared after the pages were processed. How can I solve this problem? Thanks a lot!

HinPeng commented 1 year ago

@zhenjia2017 Same problem. I commented out the line at https://github.com/attardi/wikiextractor/blob/f0ca16c3e92983b9094b6f32526992fc3a678f8f/wikiextractor/extract.py#L85 and the process then continues. But I'm not sure if it's the right solution.

zhenjia2017 commented 1 year ago

Thanks a lot. In the code, there is pdb.set_trace() but it still does not work. I have solved it with another version of extract.py.

lalala0731 commented 1 year ago

Hi, I have the same problem. Can you give me your "extract.py"? Thank you.

tonnyaudio commented 1 year ago

Hi, I have the same problem. Can you give me your "extract.py"? Thank you.

Derek-tjhwang commented 1 year ago

Hi, I have the same problem. Can you give me your "extract.py"? Thank you.

itcantbetrue commented 1 year ago

I didn't solve the problem either, but I switched to a different method for extracting text from the bz2 file.

raflisusanto commented 1 year ago

In extract.py you'll have to comment out both the import pdb and pdb.set_trace(). I just tried it yesterday and it started working.

changtianluckyforever commented 1 year ago

Thanks a lot. In the code, there is pdb.set_trace() but it still does not work. I have solved it with another version of extract.py.

May I know how you changed your extract.py to get another version? Thanks!

changtianluckyforever commented 1 year ago

Just comment out import pdb and pdb.set_trace(), then output the files in JSON format. Now it is working for me. Thanks to everyone above. Enjoy your life without bugs. Haha.

DengZhirui commented 1 year ago

Comment out import pdb and pdb.set_trace(). Then run 'python setup.py install' again. Output as JSON. Now it works for me.
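
For anyone who prefers not to edit the file by hand, the workaround reported in this thread (disabling import pdb and pdb.set_trace() in extract.py) could be sketched roughly as below. This is only an illustration: disable_pdb is a hypothetical helper and not part of wikiextractor. It assumes the debugger lines stand alone on their own lines, and it replaces each one with pass plus a comment rather than a bare comment, so that a block whose only statement was the debugger call stays valid Python.

```python
import re

# Matches a line that contains only `import pdb` or a `pdb.set_trace()` call,
# capturing the indentation and the statement itself.
_PDB_LINE = re.compile(r"^([ \t]*)(import pdb|pdb\.set_trace\(\))[ \t]*\r?\n?$")


def disable_pdb(source: str) -> str:
    """Return `source` with stand-alone pdb lines replaced by `pass  # ...`.

    Hypothetical helper for illustration; not part of wikiextractor.
    """
    result = []
    for line in source.splitlines(keepends=True):
        match = _PDB_LINE.match(line)
        if match:
            indent, statement = match.group(1), match.group(2)
            newline = "\n" if line.endswith("\n") else ""
            # Keep the original statement visible as a comment.
            result.append(f"{indent}pass  # {statement}{newline}")
        else:
            result.append(line)
    return "".join(result)


if __name__ == "__main__":
    # Small made-up sample standing in for the relevant part of extract.py.
    sample = (
        "import pdb\n"
        "def f(x):\n"
        "    pdb.set_trace()\n"
        "    return x + 1\n"
    )
    print(disable_pdb(sample))
```

To apply it, one would read the local copy of extract.py, pass its text through disable_pdb, write the result back, and then reinstall with 'python setup.py install' as described in the comment above.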