ajitrajasekharan / bert_vector_clustering

Clustering learned BERT vectors for downstream tasks like unsupervised NER, unsupervised sentence embeddings etc.
MIT License
10 stars, 5 forks

error during ./run.sh #1

Open bhomass opened 4 years ago

bhomass commented 4 years ago

I ran the run.sh step and got an error:

inquired 0.51 0.0 ['inquired', 'asks']
Processing 28118 of 28996
***Singleton arr for term: MacKenzie

Has anyone seen this before? The debug_pivots.txt file size currently sits at 554231 bytes.

What is the correct size of debug_pivots.txt when the run completes?

ajitrajasekharan commented 4 years ago

For the bert-large-cased model, I got this output:

-rw-rw-r-- 1 deepcompute deepcompute 554073 Mar 8 02:43 debug_pivots.txt

and for wc -l debug_pivots.txt:

6112 debug_pivots.txt

Which model are you clustering with?
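
For anyone comparing their own run against these numbers, here is a minimal Python sketch that reports the same two values, the size in bytes and the newline count that wc -l prints. The helper name file_stats is hypothetical and not part of this repo, and the script assumes debug_pivots.txt is in the current directory.

```python
import os


def file_stats(path):
    """Return (size_in_bytes, line_count) for the file at `path`.

    `file_stats` is a hypothetical helper for this thread, not part of the repo.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Count newline bytes in 1 MiB chunks; this matches what `wc -l` reports.
        lines = sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 20), b""))
    return size, lines


if __name__ == "__main__":
    path = "debug_pivots.txt"  # assumption: run from the directory holding the file
    if os.path.exists(path):
        size, lines = file_stats(path)
        print(f"{path}: {size} bytes, {lines} lines")
    else:
        print(f"{path} not found")
```

Small differences in either number do not necessarily mean the run failed, so treat the output as a rough comparison rather than a pass/fail check.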

bhomass commented 4 years ago

Yes, it was bert-large-cased. It looks like my debug_pivots.txt is one line longer than yours:

6113 debug_pivots.txt

ajitrajasekharan commented 4 years ago

Mine had the singleton array clusters removed, which could explain the difference. So the generation appears to have completed. I'm not sure what the error is. I will try to reproduce it and get back to you.