input data for preprocess

Hi @itaim, thanks for your interest in our work!

You have two options:

[Option 1] Preprocess your own training data using preprocess_wiki.py. The data needs to be in a CSV file with three columns, one article per row, as follows:

article's title	article's top-n words	article's first couple of sentences
window decoration	window titlebar managers bar buttons ...	In graphical user interfaces, the window decoration is ...

Please refer to the paper for details on how we extracted this training data from Wikipedia. The resulting processed data will be stored in data/wiki_tfidf/ or data/wiki_sent/.
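
If it helps, here is a minimal sketch of writing and sanity-checking a file in the three-column layout shown above. It is not part of the repository: the helper names write_rows and check_rows are hypothetical, and the tab delimiter and the absence of a header row are assumptions based on how the example row is laid out, so please check what preprocess_wiki.py actually expects before relying on it.

```python
import csv
import io

EXPECTED_COLUMNS = 3  # title, top-n words, first couple of sentences


def write_rows(rows, handle, delimiter="\t"):
    """Write (title, top_words, first_sentences) tuples, one article per line.

    The tab delimiter is an assumption taken from the example row above.
    No header row is written (also an assumption).
    """
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    for title, top_words, first_sentences in rows:
        # The top-n words go into a single space-separated field.
        writer.writerow([title, " ".join(top_words), first_sentences])


def check_rows(handle, delimiter="\t"):
    """Return the number of rows; raise ValueError if a row is not 3 columns."""
    count = 0
    reader = csv.reader(handle, delimiter=delimiter)
    for line_no, row in enumerate(reader, start=1):
        if len(row) != EXPECTED_COLUMNS:
            raise ValueError(
                f"line {line_no}: expected {EXPECTED_COLUMNS} columns, got {len(row)}"
            )
        count += 1
    return count


# The example row from above; the elided parts of the original are kept as "...".
rows = [
    (
        "window decoration",
        ["window", "titlebar", "managers", "bar", "buttons"],
        "In graphical user interfaces, the window decoration is ...",
    )
]

buffer = io.StringIO()  # swap for open("my_articles.csv", "w", newline="") to write a file
write_rows(rows, buffer)
buffer.seek(0)
print(check_rows(buffer), "row(s) look well formed")
```

Running the check before preprocessing catches rows where a stray delimiter inside the text has split a field into extra columns.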

[Option 2] Use the already provided processed data in data/wiki_tfidf and data/wiki_sent to train and generate labels for your topics.

Hope this makes it clearer.

Areej

areejokaili / topic_labelling