darkestfloyd commented 6 years ago

Only doubts, no cheating!

darkestfloyd commented 6 years ago

chi2 for feature selection

DeekshaD commented 6 years ago

I asked for clustering

darkestfloyd commented 6 years ago

can't use DBSCAN

Omairss commented 6 years ago

I'm trying autoencoder

DeekshaD commented 6 years ago

I used gmm. 0.63 hm

Omairss commented 6 years ago

You used GMM for feature selection?

DeekshaD commented 6 years ago

GMM for clustering, PCA for reduction
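
For anyone following along, a minimal sketch of that PCA-then-GMM pipeline with scikit-learn. The toy blob data, the 5 components, and the 3 clusters are assumptions for illustration only, not the settings used in the exam notebooks.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# Toy stand-in data (assumption): three well-separated blobs in 20 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 20))
X = np.vstack([c + rng.normal(size=(50, 20)) for c in centers])

# Step 1: reduce dimensionality with PCA (5 components is an arbitrary choice).
X_reduced = PCA(n_components=5, random_state=0).fit_transform(X)

# Step 2: fit a Gaussian mixture on the reduced data and read off hard cluster labels.
gmm = GaussianMixture(n_components=3, random_state=0)
labels = gmm.fit_predict(X_reduced)

print(X_reduced.shape, labels[:10])
```

On a sparse count matrix, PCA would need a dense array first; TruncatedSVD is the usual substitute in that case.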

darkestfloyd commented 6 years ago

Shift incorrect cluster labels to the majority cluster. Got a 6% increase.
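
One possible reading of this step, sketched in plain Python: give every point in a cluster the most common true label found in that cluster. The helper names `majority_label_map` and `relabel` and the toy labels are hypothetical, and the actual notebook may do something different.

```python
from collections import Counter, defaultdict


def majority_label_map(cluster_ids, true_labels):
    """Map each cluster id to the most common true label among its members."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        members[cid].append(label)
    return {cid: Counter(labels).most_common(1)[0][0]
            for cid, labels in members.items()}


def relabel(cluster_ids, mapping):
    """Replace each cluster id with its cluster's majority label."""
    return [mapping[cid] for cid in cluster_ids]


# Toy data (assumption): 8 points, 3 clusters, 3 topics.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
true_labels = ["a", "a", "b", "b", "b", "c", "a", "c"]

mapping = majority_label_map(cluster_ids, true_labels)
predicted = relabel(cluster_ids, mapping)
accuracy = sum(p == t for p, t in zip(predicted, true_labels)) / len(true_labels)

print(mapping, accuracy)
```

Since the mapping is built from the true labels, the resulting accuracy is an optimistic number if it is scored on the same points.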

Omairss commented 6 years ago

I'm getting 10% for weighted average

darkestfloyd commented 6 years ago

lol, I'll commit my notebook

darkestfloyd commented 6 years ago

just make sure you don't copy anything across!

DeekshaD commented 6 years ago

How the fuck do I read this text file? Do I parse it?

Omairss commented 6 years ago

import pandas as pd

# `ng` is the iterable of raw lines; each line appears to look like "[id,topic,text...]".
ng_list = []

for line in ng:
    try:
        parts = line.split(',')
        ng_list.append({
            'id': parts[0].split('[')[1],
            'topic': parts[1],
            # The text can contain commas, so rejoin everything after the topic.
            'text': ' '.join(parts[2:]),
        })
    except IndexError:
        # Malformed line: print it for inspection and skip it.
        print(line)

ng_df = pd.DataFrame(ng_list, columns=['id', 'topic', 'text'])

DeekshaD commented 6 years ago

What is the shape of your CountVectorizer-transformed matrix?

darkestfloyd commented 6 years ago

(18846, 1000)

DeekshaD commented 6 years ago

I'm not getting it, what did you pass to the vectorizer?

DeekshaD commented 6 years ago

Aren't there only 4k lines in the sample text?

darkestfloyd commented 6 years ago

vectorize on full data, then pick only the 4k rows
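
A rough sketch of what that looks like with scikit-learn's `CountVectorizer`. The toy corpus and `sample_idx` are made up for illustration; `max_features=1000` is only a guess based on the (18846, 1000) shape above.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for the full corpus (assumption).
all_texts = [
    "the car engine is loud",
    "space shuttle launch today",
    "the hockey game was loud",
    "graphics card for the computer",
    "shuttle engine test",
]

# Hypothetical row positions of the sample documents within the full corpus.
sample_idx = [0, 2]

# Fit the vocabulary on the full data so every row shares the same columns.
vectorizer = CountVectorizer(max_features=1000)
X_full = vectorizer.fit_transform(all_texts)

# Then keep only the rows belonging to the sample.
X_sample = X_full[sample_idx]

print(X_full.shape, X_sample.shape)
```

Fitting on the full data first keeps the column count identical between the full matrix and the sample, which is why the sample ends up with the same number of features.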

Omairss commented 6 years ago

Want me to post the code?

DeekshaD commented 6 years ago

Are you done?

darkestfloyd commented 6 years ago

I'll push both my notebooks in 10 mins. Use anything that has an empty comment on top of the block.

darkestfloyd / CS6220

Final Exam #1
