Kyubyong / wordvectors

Pre-trained word vectors of 30+ languages
MIT License
2.22k stars, 393 forks

First step of workflow isn't specific enough #4

Status: open. webmaven opened this issue 7 years ago.

webmaven commented 7 years ago

From the README:

STEP 1. Download the wikipedia database backup dumps of the language you want.

However, the database backup dumps come in many flavors, each including different data (types of objects, metadata, logs, edit history, etc.).

Exactly which of these backup files is supposed to be downloaded?

Kyubyong commented 7 years ago

I used the files whose names look like enwiki-20170620-pages-articles-multistream.xml.bz2.
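For other languages or dates, the same naming pattern can be turned into a small helper. The following is a minimal sketch, assuming the usual dumps.wikimedia.org directory layout (`/<lang>wiki/<date>/`, with `latest` as an alias for the newest run). The names `dump_filename`, `dump_url`, and `is_articles_dump` are hypothetical and not part of this repository. The regex only accepts the single-file `pages-articles-multistream` dump, not its `-index.txt.bz2` companion or the history/log dumps.

```python
import re

# Assumed base URL of the Wikimedia dump server.
DUMPS_BASE = "https://dumps.wikimedia.org"

# Hypothetical pattern: matches only the single-file articles dump, e.g.
# enwiki-20170620-pages-articles-multistream.xml.bz2
ARTICLES_DUMP_RE = re.compile(
    r"^(?P<lang>[a-z_]+)wiki-(?P<date>\d{8}|latest)"
    r"-pages-articles-multistream\.xml\.bz2$"
)


def dump_filename(lang: str, date: str = "latest") -> str:
    """Build a file name following the pattern quoted in this thread."""
    return f"{lang}wiki-{date}-pages-articles-multistream.xml.bz2"


def dump_url(lang: str, date: str = "latest") -> str:
    """Build a download URL, assuming the /<lang>wiki/<date>/ layout."""
    return f"{DUMPS_BASE}/{lang}wiki/{date}/{dump_filename(lang, date)}"


def is_articles_dump(filename: str) -> bool:
    """Return True if a name from a dump listing is the articles file."""
    return ARTICLES_DUMP_RE.match(filename) is not None


if __name__ == "__main__":
    print(dump_url("en", "20170620"))
    for name in [
        "enwiki-20170620-pages-articles-multistream.xml.bz2",
        "enwiki-20170620-pages-articles-multistream-index.txt.bz2",
        "enwiki-20170620-pages-meta-history1.xml.bz2",
    ]:
        print(name, is_articles_dump(name))
```

Filtering a dump listing with `is_articles_dump` is one way to avoid picking the index file or the much larger edit-history dumps by mistake. Newer dump runs may also split the articles dump into numbered parts, which this strict pattern deliberately does not match.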