zhaoluffy closed this issue 6 years ago.
They were generated from the Wikipedia dump, but the code for this was not included here.
Hi, thanks for your reply. Could you share how the file 'wiki_disambiguation_pages.txt' was generated?
Honestly, this was generated a long time ago by a student, and I don't have the code. But if I recall correctly, this file contains all list pages, all pages with "(disambiguation)" in the title, all pages in the Category:Disambiguation_pages category, and all pages that start with "X may refer to".
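The four criteria recalled above can be expressed as a simple page filter. The sketch below is not the original (lost) code. It assumes each page has already been parsed from the dump into a hypothetical `(title, text, categories)` tuple (how that parsing is done is left open), and it interprets "list pages" as titles beginning with "List of", which is an assumption. How the resulting titles were written to 'wiki_disambiguation_pages.txt' is not known.

```python
import re
from typing import Iterable, Iterator, Sequence, Tuple

# Hypothetical input record: (title, plain text, category names without "Category:")
Page = Tuple[str, str, Sequence[str]]

# "X may refer to" at the very start of the article text (X = a short lead phrase)
MAY_REFER_TO = re.compile(r"^[^\n]{1,200}?\bmay refer to\b", re.IGNORECASE)


def _normalize(name: str) -> str:
    """Treat underscores and spaces as equivalent, as Wikipedia does in names."""
    return name.replace("_", " ").strip().lower()


def is_disambiguation_like(title: str, text: str, categories: Sequence[str]) -> bool:
    """Apply the four recalled heuristics (a sketch, not the original code)."""
    if title.startswith("List of "):  # assumption: "list pages" = "List of ..." titles
        return True
    if "(disambiguation)" in title:
        return True
    if any(_normalize(c) == "disambiguation pages" for c in categories):
        return True
    return bool(MAY_REFER_TO.match(text.lstrip()))


def disambiguation_titles(pages: Iterable[Page]) -> Iterator[str]:
    """Yield the titles of pages flagged by any heuristic."""
    for title, text, categories in pages:
        if is_disambiguation_like(title, text, categories):
            yield title


if __name__ == "__main__":
    pages = [
        ("Mercury", "Mercury may refer to:\n...", []),
        ("Mercury (planet)", "Mercury is the smallest planet.", ["Planets of the Solar System"]),
        ("Apple (disambiguation)", "An apple is a fruit.", []),
        ("List of rivers of Europe", "This is a list of rivers.", []),
        ("Jaguar (name)", "Some text.", ["Disambiguation_pages"]),
    ]
    print(list(disambiguation_titles(pages)))
    # ['Mercury', 'Apple (disambiguation)', 'List of rivers of Europe', 'Jaguar (name)']
```

Keeping each heuristic as a separate check makes it easy to count how many pages each rule flags when comparing a regenerated list against the provided file.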
I see, thank you very much.
Dear author,
I am sorry for re-opening this issue, but it seemed unproductive to open a new issue with the same type of questions.
I am currently processing a new Wikipedia dump, which means I have to produce the files mentioned above. To do this, I am first using your textWithAnchorsFromAllWikipedia2014Feb to generate the remaining files. Once these files match the ones you have provided, I will switch to my own dump, which, as I have read, can be obtained with WikiExtractor. However, I am unsure about the following and hope you can give me some insight:
Hi, were the source files 'wiki_redirects.txt', 'wiki_name_id_map.txt' and 'wiki_disambiguation_pages.txt' generated or downloaded?