Colin-Codes / IntentClassifier-ML-Project

Python, Keras, SciKit-Learn, Matplotlib: Machine learning research project on classifying the intent behind tech support emails in order to enable automatic follow-up.

Alternatives to BoW in KNN - TFIDF? #34

Open Colin-Codes opened 4 years ago

Colin-Codes commented 4 years ago

Probably TF-IDF, but word embeddings would be ideal.

Colin-Codes commented 4 years ago

https://towardsdatascience.com/3-basic-approaches-in-bag-of-words-which-are-better-than-word-embeddings-c2cbc7398016

Colin-Codes commented 4 years ago

Phrase embeddings:

https://towardsdatascience.com/fse-2b1ffa791cf9

Colin-Codes commented 4 years ago

  1. More meaningful alternatives to the 'Bag of Words' approach: I will probably implement an improvement on raw occurrence counts, such as TF-IDF. I would also like to experiment with using phrase embeddings in KNNs, although I have not found much advice on this so far and need to do more research. I would welcome your advice on this if you have any suggestions.

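
To make the TF-IDF option concrete, here is a minimal sketch of swapping raw counts for TF-IDF weights in front of a KNN classifier, assuming scikit-learn. The toy emails, the intent labels, the helper name `build_knn`, and the choice of cosine distance with k=3 are all assumptions made for illustration; none of them come from the project's data or code.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy emails and intent labels, for illustration only.
emails = [
    "I cannot log in to my account",
    "Password reset link is not working",
    "Please reset my password",
    "I forgot my login password",
    "Invoice shows the wrong amount",
    "I was charged twice this month",
    "Please send me a copy of my invoice",
    "Refund the duplicate charge on my card",
]
intents = ["login"] * 4 + ["billing"] * 4


def build_knn(vectorizer, k=3):
    """Chain a text vectorizer with a brute-force cosine-distance KNN.

    build_knn is a hypothetical helper name; cosine distance and k=3
    are assumptions, not settings taken from the project.
    """
    knn = KNeighborsClassifier(n_neighbors=k, metric="cosine", algorithm="brute")
    return make_pipeline(vectorizer, knn)


# Same classifier, two feature representations: raw counts vs TF-IDF weights.
models = {
    "bow": build_knn(CountVectorizer()),
    "tfidf": build_knn(TfidfVectorizer()),
}

queries = ["reset my password", "wrong charge on invoice"]

predictions = {}
for name, model in models.items():
    model.fit(emails, intents)
    predictions[name] = model.predict(queries)
    print(name, list(predictions[name]))
```

Because only the vectorizer changes between the two pipelines, the same structure could be used to compare both representations on the real email data with cross-validation. On a corpus this small the two are unlikely to differ, so any difference would need to be measured on the full data set.
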
Colin-Codes commented 4 years ago

Suggested research from Giseli:

https://www.researchgate.net/publication/326425709_Text_Mining_Use_of_TF-IDF_to_Examine_the_Relevance_of_Words_to_Documents