Fake news: approaching automated lie detection using pre-trained word embeddings

This repository explores the following research question: how well do combinations of pre-trained embedding techniques and machine learning algorithms perform when classifying fake news? The research applies transfer learning to the classification task studied in earlier research by Wang (2017), whose results serve as the performance benchmark.

Requirements
Research questions
Results

Requirements

To run the code in the code folder, the following packages must be installed:

flair
allennlp
tensorflow
tensorflow_hub
pytorch
spacy
hypopt
gensim

You can install these packages by running pip install -r code/requirements.txt from the repository root.

Research questions

Which way of pooling vectors to a fixed length works best for classifying fake news?
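To make the pooling question concrete: a statement of any length yields one vector per token, and pooling collapses that variable-length list into a single fixed-length vector a classifier can consume. The sketch below is illustrative only. Mean and max pooling are assumed as example strategies, and the function names and toy 3-dimensional vectors are hypothetical rather than taken from this repository's code.

```python
from typing import List

Vector = List[float]


def mean_pool(token_vectors: List[Vector]) -> Vector:
    """Average every embedding dimension over all tokens."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    n = len(token_vectors)
    # zip(*...) groups the values of each dimension across tokens
    return [sum(dim) / n for dim in zip(*token_vectors)]


def max_pool(token_vectors: List[Vector]) -> Vector:
    """Keep the largest value of every embedding dimension."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    return [max(dim) for dim in zip(*token_vectors)]


# Toy embeddings (assumption): a two-token and a three-token statement
short_statement = [[1.0, 2.0, 3.0], [3.0, 0.0, 1.0]]
long_statement = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]

# Both statements end up with the same fixed length (3),
# regardless of how many tokens they contain.
pooled_short = mean_pool(short_statement)
pooled_long = mean_pool(long_statement)
```

Either pooled vector can then be passed to a non-neural classifier, since its length no longer depends on the number of tokens.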

At what padding sequence length do neural networks hold the highest accuracy when classifying fake news?
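For the padding question, the idea is that a neural network receives every statement as a sequence of exactly the same length: shorter statements are filled up and longer ones are cut off. The following is a minimal sketch under stated assumptions: zero vectors appended at the end and truncation of trailing tokens are one common choice, and pad_or_truncate, max_len and dim are hypothetical names, not this repository's API.

```python
from typing import List

Vector = List[float]


def pad_or_truncate(token_vectors: List[Vector], max_len: int, dim: int) -> List[Vector]:
    """Return exactly max_len vectors of size dim.

    Assumption: padding uses zero vectors appended at the end, and
    statements longer than max_len lose their trailing tokens.
    """
    # Copy at most max_len vectors so the input is not mutated
    padded = [list(v) for v in token_vectors[:max_len]]
    # Append zero vectors until the sequence reaches max_len
    padded.extend([0.0] * dim for _ in range(max_len - len(padded)))
    return padded


# Toy 2-dimensional embeddings (assumption)
one_token = [[1.0, 2.0]]
three_tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

padded_short = pad_or_truncate(one_token, max_len=3, dim=2)
cut_long = pad_or_truncate(three_tokens, max_len=2, dim=2)
```

The research question then amounts to varying max_len and comparing the resulting classification accuracy.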

How well do neural network classification architectures classify fake news compared to non-neural classification algorithms?

Results

With a combination of BERT embeddings and logistic regression, an accuracy of 52.96% on 3 labels can be achieved, an increase of almost 4 percentage points over previous research in which only traditional linguistic methods were used. On the original 6 labels, this combination achieves an accuracy of 27.51%, which is 0.51 percentage points better than the original research by Wang (2017).
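The 6-label comparison can be read as a short piece of arithmetic. The sketch below assumes that the reported 0.51 gain is an absolute difference in accuracy (percentage points), which implies a benchmark of 27.00%. The accuracy helper and the toy labels are hypothetical and only show how such a figure is computed.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy example (assumption): two of three predictions are correct
toy_accuracy = accuracy(["true", "false", "half-true"], ["true", "false", "false"])

# Figures reported in the Results section, in percent
reported_accuracy = 27.51
reported_gain = 0.51

# Benchmark implied by reading the gain as percentage points (assumption)
implied_benchmark = reported_accuracy - reported_gain

# The same gain expressed relative to the implied benchmark, in percent
relative_gain = reported_gain / implied_benchmark * 100
```

Under this reading, the absolute gain of 0.51 points corresponds to a relative improvement of just under 2% over the implied benchmark.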

MeMartijn / FakeNewsDetection
