itaim opened this issue 3 years ago
Hi @itaim, thanks for your interest in our work!
You have two options:
`preprocess_wiki.py`. The data needs to be in a CSV file with 3 columns, as follows:

| article's title | article's top-n words | article's first couple of sentences |
|---|---|---|
| window decoration | window titlebar managers bar buttons ... | In graphical user interfaces, the window decoration is ... |
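A minimal sketch for producing a file in this three-column layout with Python's standard `csv` module. It assumes a comma delimiter, no header row, and the top-n words stored space-separated in one field. These are assumptions, so check them against what `preprocess_wiki.py` actually reads. The helper names `write_wiki_csv` and `read_wiki_csv` are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Hypothetical record type: (article title, top-n words, first sentences).
Row = Tuple[str, List[str], str]


def write_wiki_csv(rows: Iterable[Row], out) -> None:
    """Write one article per line in the three-column layout.

    Assumption: top-n words are joined with spaces into a single field.
    csv.writer quotes any field containing a comma or newline, so commas
    inside the sentences column do not break the layout.
    """
    writer = csv.writer(out)
    for title, top_words, first_sentences in rows:
        writer.writerow([title, " ".join(top_words), first_sentences])


def read_wiki_csv(inp) -> List[Row]:
    """Read the file back into (title, top-n words, first sentences) tuples."""
    return [(title, words.split(), sents) for title, words, sents in csv.reader(inp)]


if __name__ == "__main__":
    buf = io.StringIO()
    write_wiki_csv(
        [("window decoration",
          ["window", "titlebar", "managers", "bar", "buttons"],
          "In graphical user interfaces, the window decoration is ...")],
        buf,
    )
    buf.seek(0)
    print(read_wiki_csv(buf))
```

When writing to a real file, open it with `open(path, "w", newline="", encoding="utf-8")` so the `csv` module controls line endings itself.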
Please refer to the paper for details on how we extracted this training data from Wikipedia. The resulting processed data will be stored in `data/wiki_tfidf/` or `data/wiki_sent/`.
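If you need to fill the "top-n words" column for your own articles, one common choice is to rank each article's terms by TF-IDF. The directory name `wiki_tfidf` hints at this, but the tokenizer, the missing stopword removal, the weighting and the value of `n` below are all assumptions, and the paper describes the extraction that was actually used. The function `top_n_tfidf` is a hypothetical helper built only on the standard library.

```python
import math
import re
from collections import Counter
from typing import Dict, List

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text: str) -> List[str]:
    # Lowercase alphabetic tokens only; an assumption, not the paper's preprocessing.
    return TOKEN_RE.findall(text.lower())


def top_n_tfidf(docs: Dict[str, str], n: int = 5) -> Dict[str, List[str]]:
    """Return the n highest TF-IDF terms for each article, keyed by title."""
    tokenized = {title: tokenize(text) for title, text in docs.items()}
    num_docs = len(tokenized)

    # Document frequency: the number of articles each term appears in.
    df: Counter = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    result: Dict[str, List[str]] = {}
    for title, tokens in tokenized.items():
        tf = Counter(tokens)
        # Classic tf * log(N / df) weighting; terms found in every article score 0.
        scores = {t: c * math.log(num_docs / df[t]) for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[title] = ranked[:n]
    return result
```

For example, `top_n_tfidf({"a": "window window titlebar the", "b": "the cat cat"}, n=2)` ranks `window` above `titlebar` for article `a`, because `window` occurs twice and neither word appears in `b`.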
`data/wiki_tfidf` and `data/wiki_sent` to train and generate labels for your topics. Hope this makes it clearer.
Areej
Hi! If I understand correctly from reading the other closed issues, in order to run inference on my own set of topics I first need to run preprocessing on my topics data file. But the preprocessing expects `in_data_path='/Users/areej/Desktop/wiki_extract/wiki_title_topn_doc/'`. Where can I get this data from, or am I missing something and can I run it without it?