Closed usmaann closed 6 years ago
Well, I can't guarantee our results will always be better, but a couple of things to check:
Are you seeing a high proportion of out-of-vocabulary words? It could be that the word forms aren't the same as the words you encounter in your task.
When there are OOV words, what do you do with them? In our evaluations, we use a strategy for OOV words that includes looking up their neighbors in ConceptNet, so that the number of word vectors we need to distribute in Numberbatch is smaller. Some sort of OOV strategy is important; replacing them all with "unk" will not provide common sense knowledge.
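To make the two checks above concrete, here is a minimal sketch of measuring your OOV rate and of one fallback lookup. The `embeddings` dict and the prefix-truncation fallback are illustrative assumptions in the spirit of the strategy described, not Numberbatch's exact implementation.

```python
def oov_rate(task_vocab, embeddings):
    """Fraction of task words with no vector in the embedding table."""
    missing = [w for w in task_vocab if w not in embeddings]
    return len(missing) / len(task_vocab)

def lookup_with_fallback(word, embeddings):
    """Try the word, then its lowercase form, then progressively
    shorter prefixes; return None only if everything fails."""
    for candidate in (word, word.lower()):
        if candidate in embeddings:
            return embeddings[candidate]
    w = word.lower()
    while len(w) > 1:
        w = w[:-1]  # drop the last character and retry
        if w in embeddings:
            return embeddings[w]
    return None  # caller decides: zero vector, neighbor average, etc.
```

A high value from `oov_rate`, or heavy reliance on the fallback, would explain weak results regardless of embedding quality.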
Also, the recent results I've seen where common-sense background knowledge performed well used it actively as part of the training process, not just in pre-training (where you'd hope that training would find a local minimum for your data regardless of where it started).
Hi thanks for your quick feedback.
I have also tried using ConceptNet Numberbatch during the training process as well, but the results are almost the same as with GloVe and word2vec.
Is my expectation wrong? I expect that a word embedding combined with a knowledge base (ConceptNet Numberbatch) should give better accuracy than using a distributional embedding like GloVe or word2vec alone.
There are definitely published results indicating that the information in ConceptNet provides the best results on some tasks. Recent ones include:
However: distributional models have gotten better and more sophisticated since Numberbatch. It is probably no longer as simple as dropping in Numberbatch as a replacement, especially if the vocabulary we distribute is wrong for your task.
It will probably require more sophistication to incorporate ConceptNet into current models as well. Sorry to hear that it's not helping for your task so far.
(I hope it's okay to respond just here, and not also on the mailing list.)
Hi, thanks again for your detailed feedback.
Is there any approach you'd advise to get the maximum benefit out of Numberbatch (ConceptNet)?
Currently I am using Numberbatch as a pre-trained embedding, with a CNN model on top of it for sentence embedding, followed by a softmax layer, and then I calculate accuracy.
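For reference, the pipeline described here (pre-trained embedding lookup, 1-D convolution, max-over-time pooling, softmax) can be sketched as follows. All shapes and weights are illustrative placeholders, not the poster's actual model; in practice `emb` would be the Numberbatch matrix for your vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)

def sentence_logits(token_ids, emb_matrix, conv_w, dense_w):
    """token_ids: (seq_len,), emb_matrix: (vocab, dim),
    conv_w: (width, dim, filters), dense_w: (filters, classes)."""
    x = emb_matrix[token_ids]                     # (seq_len, dim)
    width = conv_w.shape[0]
    # valid 1-D convolution over the token axis
    windows = np.stack([x[i:i + width] for i in range(len(x) - width + 1)])
    feats = np.einsum("swd,wdf->sf", windows, conv_w)
    feats = np.maximum(feats, 0.0)                # ReLU
    pooled = feats.max(axis=0)                    # max-over-time pooling
    return pooled @ dense_w                       # class logits

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# toy shapes: vocab of 50, 8-dim embeddings, window 3, 4 filters, 2 classes
emb = rng.normal(size=(50, 8))    # Numberbatch vectors in a real run
conv = rng.normal(size=(3, 8, 4))
dense = rng.normal(size=(4, 2))
probs = softmax(sentence_logits(np.array([1, 5, 9, 2, 7]), emb, conv, dense))
```

One thing worth checking with a setup like this is whether the embedding layer is frozen or fine-tuned; if it is fine-tuned, much of the difference between initializations can wash out during training, which matches the maintainer's point about pre-training above.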
Hi
I tried the ConceptNet Numberbatch pre-trained embeddings on a CNN classification task and compared the results with GloVe and word2vec.
The word2vec and GloVe results are still better than the ConceptNet embeddings. I was expecting better accuracy from Numberbatch.
Any advice? Am I doing anything wrong?