dalab / deep-ed

Source code for the EMNLP'17 paper "Deep Joint Entity Disambiguation with Local Neural Attention", https://arxiv.org/abs/1704.04920
Apache License 2.0

How do I generate textWithAnchorsFromAllWikipedia2014Feb.txt files? #18

Closed by LMY-nlp0701 5 years ago

LMY-nlp0701 commented 5 years ago

Hi, sorry to bother you again.

I want to generate an up-to-date version of the textWithAnchorsFromAllWikipedia documents.

I noticed that you provided WikiExtractor.py to generate the above documents.

After reading the code, I would like to ask how to obtain its input file, that is, the Wikipedia dump file.

Is there anything else I should pay attention to regarding the six parameters of the function?

Thank you for your reply!

octavian-ganea commented 5 years ago

Hi. The textWithAnchorsFromAllWikipedia file was generated using https://github.com/attardi/wikiextractor with the --links option, which preserves hyperlinks. Hope it helps.
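
As a rough sketch of what this looks like in practice: a raw dump such as enwiki-latest-pages-articles.xml.bz2 can be downloaded from https://dumps.wikimedia.org/enwiki/latest/, and running WikiExtractor on it with --links (for example `python WikiExtractor.py --links -o extracted <dump file>`; the exact invocation depends on the wikiextractor version, and the `extracted` directory name is only illustrative) should leave hyperlinks in the output as `<a href="...">...</a>` tags with URL-encoded targets. The Python snippet below is not part of deep-ed. It shows one assumed way to pull (target page, anchor text) pairs out of a line of such output, and `ANCHOR_RE` and `extract_anchors` are hypothetical names.

```python
import re
from urllib.parse import unquote

# Assumed anchor format in WikiExtractor output when links are preserved:
#   <a href="Target%20Page">anchor text</a>
ANCHOR_RE = re.compile(r'<a href="([^"]*)">(.*?)</a>')


def extract_anchors(line):
    """Return (target_title, anchor_text) pairs found in one line of text."""
    # unquote() turns URL-encoded targets such as "Eiffel%20Tower"
    # back into readable page titles.
    return [(unquote(href), text) for href, text in ANCHOR_RE.findall(line)]


# Hand-written sample line, not taken from a real dump.
sample = 'The <a href="Eiffel%20Tower">tower</a> is in <a href="Paris">Paris</a>.'
print(extract_anchors(sample))
```

Whether the resulting text matches the exact layout of the original 2014 file would still need to be checked against the deep-ed preprocessing scripts.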