Open darkestfloyd opened 6 years ago
chi2 for feature selection
I asked for clustering
can't use DBSCAN
I'm trying an autoencoder
I used GMM. 0.63 hm
You used GMM for feature selection?
GMM for clustering, PCA for reduction
shift incorrect cluster labels to the majority cluster. got a 6% increase
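One possible reading of the relabeling step above, as a minimal sketch: for every cluster, find the most common true label among its members and assign that label to the whole cluster. The function name `relabel_by_majority` and the toy data are hypothetical and not taken from the notebooks discussed here.

```python
from collections import Counter, defaultdict

def relabel_by_majority(cluster_ids, true_labels):
    """Replace each cluster id with the most common true label in that cluster."""
    # Group the true labels by the cluster each point was assigned to
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        members[cid].append(label)
    # Pick the majority label per cluster
    majority = {
        cid: Counter(labels).most_common(1)[0][0]
        for cid, labels in members.items()
    }
    # Every point inherits the majority label of its cluster
    return [majority[cid] for cid in cluster_ids]

# Toy example (made-up data): cluster 0 holds two "sci" docs and one "rec" doc
clusters = [0, 0, 0, 1, 1, 2]
topics = ["sci", "sci", "rec", "rec", "rec", "talk"]
relabeled = relabel_by_majority(clusters, topics)
```

Note that this uses the true labels, so any score computed after relabeling is only meaningful as an evaluation aid, not as part of an unsupervised pipeline.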
I'm getting 10% for weighted average
lol, I'll commit my notebook
just make sure you don't copy anything across!
How the fuck do I read this text file? Do I parse it?
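import pandas as pd

# `ng` is assumed to be an iterable of lines, e.g. an open file handle
ng_list = []
for line in ng:
    temp_dict = {}
    try:
        #print(line.split(',')[1])
        #print(line.split(',')[0].split("[")[1])
        print(line.split(',')[2].split("]")[0])
        temp_dict['topic'] = line.split(',')[1]
        temp_dict['id'] = line.split(',')[0].split("[")[1]
        # everything after the second comma is treated as the text
        temp_dict['text'] = (' ').join(line.split(',')[2:])
    except Exception as e:
        # show lines that do not match the expected format
        print(line)
    # appended even on failure, so malformed lines become partly empty rows
    ng_list.append(temp_dict)
ng_df = pd.DataFrame(ng_list)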
What is the shape of your countvect transformed matrix?
(18846, 1000)
I'm not getting it, what did you pass to the vectorizer?
Aren't there only 4k lines in the sample text?
vectorize on the full data, then pick only the 4k rows
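A rough sketch of that idea, assuming scikit-learn's `CountVectorizer` with `max_features=1000` (a guess based on the (18846, 1000) shape mentioned above). The documents and `sample_idx` below are made up for illustration; in practice `sample_idx` would hold the row positions of the 4k sample documents within the full corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the full corpus
all_docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "the dog sat down",
    "cats and dogs play",
]
# Hypothetical row positions of the sampled documents within all_docs
sample_idx = [0, 2]

# Fit the vocabulary on the full data so every row shares the same columns
vectorizer = CountVectorizer(max_features=1000)
X_full = vectorizer.fit_transform(all_docs)  # sparse matrix, (n_docs, n_terms)

# Keep only the sampled rows; the column layout is unchanged
X_sample = X_full[sample_idx]
```

Fitting on the full corpus first means the sample rows are counted against the full vocabulary, which is why the column count stays at the full-data value.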
Want me to post the code?
Are you done?
I'll push both my notebooks in 10 mins. Use anything that has an empty comment at the top of the block.
Only doubts, no cheating!