adalmia96 / Cluster-Analysis


Could you provide sample embedding and vocab files to run score.py? #10

Open lahpan opened 2 years ago

lahpan commented 2 years ago

Hi,

Thanks for your fascinating work. I am trying to reproduce your results for my own project, but I have had trouble running the score.py code. Your example call is:

```
python3 code/score.py --entities KG --entities_file {dest_to_entities_file} --clustering_algo GMM --dataset reuters --vocab {dest_to_vocab_file} --num_topics 20 50 --doc_info WGT --rerank tf
```

but it does not specify the format of the embedding and vocab files, so I keep running into problems. The following is the relevant code from your preprocess.py:

```python
def create_global_vocab(vocab_files):
    vocab_list = set(line.split()[0] for line in open(vocab_files[0]))
    for vocab in vocab_files:
        vocab_list = vocab_list & set(line.split()[0] for line in open(vocab))
    return vocab_list
```

My question is: what kind of vocab file (txt, pkl, csv, etc.) is expected by open(vocab_files[0])?
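For reference, my current guess from the snippet is that each vocab file is plain text with one entry per line, where the first whitespace-separated field is the token (optionally followed by a count or other columns). Here is a minimal sketch of that assumption; the file contents below are made up, not from your repo:

```python
# Sketch assuming plain-text vocab files, one entry per line,
# token as the first whitespace-separated field. Toy data only.
import os
import tempfile

def create_global_vocab(vocab_files):
    # Copied from preprocess.py: intersect the first column of every file.
    vocab_list = set(line.split()[0] for line in open(vocab_files[0]))
    for vocab in vocab_files:
        vocab_list = vocab_list & set(line.split()[0] for line in open(vocab))
    return vocab_list

# Two toy vocab files that share "apple" and "banana".
paths = []
for contents in ("apple 3\nbanana 2\ncherry 1\n",
                 "apple 5\nbanana 1\ndate 4\n"):
    f = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
    f.write(contents)
    f.close()
    paths.append(f.name)

result = sorted(create_global_vocab(paths))
print(result)  # ['apple', 'banana']

for p in paths:
    os.remove(p)
```

If this one-token-per-line reading is wrong (e.g. the files are pickled or in word2vec text format), please correct me.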

It would be helpful if you could provide a sample embedding file and vocab file so we can run the demo.sh script in bin.

Thanks in advance for your help.