ajitrajasekharan / bert_vector_clustering

Clustering learned BERT vectors for downstream tasks like unsupervised NER, unsupervised sentence embeddings etc.
MIT License

Expecting different labels.txt #2

Open dmoti opened 2 years ago

dmoti commented 2 years ago

I'm getting the following error when running run.sh:

Tokenize is set to : False
count of tokens in vocab.txt : 28996
Invalid line: ['GLU', 'the', 'the', '0.6', '0.06']
Traceback (most recent call last):
  File "dist_v2.py", line 878, in <module>
    main()
  File "dist_v2.py", line 835, in main
    b_embeds =BertEmbeds(sys.argv[1],sys.argv[2],sys.argv[3],sys.argv[4],True,True,sys.argv[6],sys.argv[7],sys.argv[8],sys.argv[9],sys.argv[10]) #True - for cache embeds; normalize - True
  File "dist_v2.py", line 160, in __init__
    self.labels_dict,self.lc_labels_dict = read_labels(labels_file)
  File "dist_v2.py", line 98, in read_labels
    assert(0)
AssertionError

For some reason the code expects three items on each line of labels.txt, but some lines (such as the one reported as invalid above) have five.
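
To see how widespread the mismatch is, a small diagnostic along these lines might help. It is only a sketch: it assumes labels.txt is whitespace-separated and that the parser wants exactly three fields per line, which is inferred from the error rather than confirmed against dist_v2.py. The helpers `field_count_report` and `report_file` are hypothetical names, not part of the repository.

```python
from collections import Counter


def field_count_report(lines, expected=3):
    """Count fields per non-empty line and collect lines that differ from `expected`.

    Assumption: fields are separated by whitespace. `expected=3` is a guess
    based on the reported behaviour, not taken from dist_v2.py.
    Returns (histogram of field counts, list of (line_number, fields)).
    """
    histogram = Counter()
    mismatches = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        histogram[len(fields)] += 1
        if len(fields) != expected:
            mismatches.append((line_no, fields))
    return histogram, mismatches


def report_file(path="labels.txt", expected=3):
    """Print a summary of field counts for the file at `path` (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        histogram, mismatches = field_count_report(handle, expected)
    for count, total in sorted(histogram.items()):
        print(f"{total} line(s) with {count} field(s)")
    for line_no, fields in mismatches[:10]:  # show only the first few offenders
        print(f"line {line_no}: {fields}")
```

Calling `report_file("labels.txt")` from the directory that run.sh uses would show whether every line has five fields (suggesting the file is in a different format from what the script expects) or only a handful do (suggesting a few malformed entries).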