
dennybritz / cnn-text-classification-tf

Convolutional Neural Network for Text Classification in Tensorflow

Apache License 2.0

5.65k stars, 2.77k forks

Vocabulary #150

Open antriksh63 opened 6 years ago

antriksh63 commented 6 years ago

Where can we view the vocabulary of words? There is a vocab file, but its entries look like this: 8003 6374 656e 736f 7266 6c6f 772e 636f 6e74 7269 622e 6c65 6172 6e2e 7079 7468 6f6e 2e6c 6561 726e 2e70 7265 7072 6f63 6573 7369 6e67 2e74 6578 740a 566f 6361
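A note on the dump above: it looks like the start of a Python pickle rather than plain text, which would explain why the file is unreadable in an editor. The sketch below is only an illustration, assuming the pasted values are the file's first bytes in order; `HEX_DUMP` and `decode_header` are names made up for this example. It decodes the bytes and pulls out the pickle protocol number and the module path that follows the `c` (GLOBAL) opcode. The class name after the newline is cut off in the paste, so it is returned as-is.

```python
# Hex dump copied from the question (only the first bytes of the vocab file).
HEX_DUMP = (
    "8003 6374 656e 736f 7266 6c6f 772e 636f 6e74 7269 622e 6c65 6172 6e2e "
    "7079 7468 6f6e 2e6c 6561 726e 2e70 7265 7072 6f63 6573 7369 6e67 2e74 "
    "6578 740a 566f 6361"
)


def decode_header(hex_dump):
    """Return (is_pickle, protocol, module_path, truncated_class_name)."""
    raw = bytes.fromhex(hex_dump.replace(" ", ""))
    # Byte 0x80 is the pickle PROTO opcode; the next byte is the protocol number.
    is_pickle = raw[0] == 0x80
    protocol = raw[1]
    # raw[2] is the 'c' (GLOBAL) opcode: module path, newline, then class name.
    module, _, rest = raw[3:].partition(b"\n")
    return is_pickle, protocol, module.decode("ascii"), rest.decode("ascii")


if __name__ == "__main__":
    print(decode_header(HEX_DUMP))
```

So the file is a serialized object, and the words have to be read through the object that wrote it rather than from the raw bytes.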

bibhu107 commented 6 years ago

After line 55 of train.py, add the following code. It will produce a vocab.txt file containing each word and its id.

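# Word -> integer id mapping (a private attribute of the vocabulary object).
vocab_dict = vocab_processor.vocabulary_._mapping
# Sort the (word, id) pairs by id so the output follows the index order.
sorted_vocab = sorted(vocab_dict.items(), key=lambda x: x[1])

# Words only, in id order.
vocabulary = list(list(zip(*sorted_vocab))[0])

# Write one "word:id" pair per line, using the sorted pairs.
with open("vocab.txt", "w") as f:
    f.writelines('{}:{}\n'.format(k, v) for k, v in sorted_vocab)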
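The same idea can be tried without TensorFlow installed. This is a minimal sketch, assuming a plain dict stands in for `vocab_processor.vocabulary_._mapping`; the helpers `dump_vocab` and `load_vocab` and the sample words are made up for illustration and are not part of the repository.

```python
import os
import tempfile


def dump_vocab(mapping, path):
    """Write one 'word:id' line per entry, ordered by id."""
    with open(path, "w", encoding="utf-8") as f:
        for word, idx in sorted(mapping.items(), key=lambda kv: kv[1]):
            f.write("{}:{}\n".format(word, idx))


def load_vocab(path):
    """Read 'word:id' lines back into a dict."""
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Split on the last colon so a word may itself contain ':'.
            word, idx = line.rstrip("\n").rsplit(":", 1)
            mapping[word] = int(idx)
    return mapping


if __name__ == "__main__":
    # Hypothetical stand-in for the real word -> id mapping.
    sample = {"<UNK>": 0, "movie": 2, "the": 1, "great": 3}
    path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
    dump_vocab(sample, path)
    with open(path, encoding="utf-8") as f:
        print(f.read())
    print(load_vocab(path) == sample)
```

Splitting on the last colon when reading is a small design choice: tokens such as URLs or emoticons can contain a colon, while the id never does.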

antriksh63 commented 6 years ago

Thanks!

dgyhee commented 6 years ago

The model evaluates sentences by pairing a certain ID with the IDs of the vocabulary words in the sentence, right? Then could we separate vocab.txt into two parts, such as highly positive words and highly negative words?

Alezas commented 3 years ago

Hi, I'm new to this. Once the network has finished training and evaluating, how can I access the classes that the CNN found? Or is that not possible?