deception-detection-review-2022

Deception Detection with Machine Learning: a literature review and statistical analysis

MIT License

Literature review

Introduction

The files in this repository are part of the Literature Review Project. They disclose the data collected during the process and also serve as a record of all the steps taken.

The manuscript of the scientific article that discusses the findings and implications of the Literature Review has been submitted to the journal PLOS ONE (https://journals.plos.org/plosone/) and is currently awaiting peer review.

Research team and contribution

Conceptualization

  1. Alex Sebastião Constâncio
  2. Denise Fukumi Tsunoda
  3. Deborah Ribeiro Carvalho

Data curation

  1. Alex Sebastião Constâncio
  2. Denise Fukumi Tsunoda

Formal analysis

  1. Alex Sebastião Constâncio

Investigation

  1. Alex Sebastião Constâncio
  2. Denise Fukumi Tsunoda

Methodology

  1. Alex Sebastião Constâncio
  2. Deborah Ribeiro Carvalho
  3. Helena de Fátima Nunes Silva
  4. Jocelaine Martins da Silveira

Software

  1. Alex Sebastião Constâncio

Writing – original draft

  1. Alex Sebastião Constâncio

Writing – review and editing

  1. Deborah Ribeiro Carvalho
  2. Denise Fukumi Tsunoda
  3. Helena de Fátima Nunes Silva
  4. Jocelaine Martins da Silveira

Supervision

  1. Deborah Ribeiro Carvalho
  2. Helena de Fátima Nunes Silva

Research scope and objectives

1. Research goals

The goal of this literature review is to capture a panoramic view of the state of research on Deception Detection supported by Machine Learning, in order to understand trends, results, and gaps in the field.

2. Research questions

a. What are the best-performing Machine Learning techniques applied to automatic deception detection?

b. What are the datasets and features they consume?

c. What level of performance have they reached recently?

3. Research restrictions

  1. The period of interest is 2011-2021;
  2. Only non-invasive methods and techniques will be reviewed; by non-invasive, we mean methods that never touch the subject and never require him/her to be evaluated by equipment less mobile than a regular computer;
  3. Only studies that report some measured level of performance will be reviewed.

4. Research protocol

  1. Run queries on selected scientific document bases;
  2. Export results as BibTeX files;
  3. Import all BibTeX files into BiblioAlly; those documents are tagged as "IMPORTED" or "DUPLICATE";
  4. Manually detect duplications not detected during import and tag them as "DUPLICATE";
  5. Pre-select articles by shallow screening;
  6. Retrieve the full text of pre-selected documents;
  7. Select articles by deep screening;
  8. Extract relevant data from accepted documents;
  9. Run a meta-analysis and generate charts and tables.
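The manual duplicate detection in the protocol above can be approximated in plain Python. The sketch below is illustrative only — it is not the project's actual procedure and the entries are fabricated — and shows one simple heuristic: grouping entries whose titles are identical after normalization.

```python
import re

def normalized_title(entry):
    """Lowercase the title and collapse punctuation/whitespace so that
    near-identical titles from different databases compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", entry["title"].lower()).strip()

def find_duplicates(entries):
    """Group BibTeX-like entries by normalized title and return the
    groups that contain more than one document."""
    groups = {}
    for entry in entries:
        groups.setdefault(normalized_title(entry), []).append(entry)
    return [docs for docs in groups.values() if len(docs) > 1]

# Fabricated example entries, mimicking exports from different databases
entries = [
    {"id": "scopus-001", "title": "Deception Detection with Machine Learning"},
    {"id": "wos-017", "title": "Deception detection with machine learning."},
    {"id": "ieee-042", "title": "A Survey of Lie Detection Cues"},
]

duplicates = find_duplicates(entries)
```

Here the first two entries fall into the same group despite differing in case and punctuation; in practice, subtler duplicates (e.g. subtitle variations) still require manual review.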

5. Data extraction

After reading the full text of the selected papers, each was summarized in two forms:

  1. Mind map: a graphical, summarized form of the study;
  2. Python dictionary: an encoded version of the extracted meta-data of interest that can be further processed to produce statistics, charts, and tables.

Details on each form below.

6. Mind maps

Mind maps are FreeMind documents, produced manually, since BiblioAlly still can't do it automatically (for now we can dream about it, right?). The mind maps serve as a quick, short summary of each article and helped during reading and reviewing the full texts. Each map describes the study hypothesis, the contributions, the dataset, the feature modalities, the methods used, and the performance achieved.
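Since FreeMind files are plain XML, automating their generation is not far-fetched. The sketch below — an assumption about what such automation could look like, not a BiblioAlly feature — builds a minimal FreeMind-style document with one branch per summary section; all names and values are illustrative.

```python
import xml.etree.ElementTree as ET

def build_mindmap(study_title, sections):
    """Build a minimal FreeMind-style XML tree: a root node for the
    study with one child branch per summary section."""
    mm = ET.Element("map", version="1.0.1")
    root = ET.SubElement(mm, "node", TEXT=study_title)
    for heading, text in sections.items():
        branch = ET.SubElement(root, "node", TEXT=heading)
        ET.SubElement(branch, "node", TEXT=text)  # leaf with the summary text
    return ET.tostring(mm, encoding="unicode")

# Illustrative sections, mirroring what the manual maps describe
xml_doc = build_mindmap(
    "Example study",
    {
        "Hypothesis": "Vocal cues predict deception",
        "Dataset": "Real-life recordings",
        "Performance": "Accuracy 0.75",
    },
)
```

The resulting string can be saved with a .mm extension and opened in FreeMind, though a real exporter would also need the layout attributes FreeMind adds to its own files.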

7. Meta-data encoding

The meta-data of each article was structured as follows:

  1. document_id: the document id in the BiblioAlly database;
  2. methods: list of methods and tools used in the paper; each item is described as a classifier or a support:
     - classifier: describes the classification algorithm as:
       - kind: when applicable, some kind or sub-category of the method;
       - implementation: package used as the algorithm implementor;
       - performance: performance achieved by the classifier, described as:
         - kind: the performance measure used;
         - value: the performance level achieved;
     - support: describes supporting tools used for some generic purpose;
  3. dataset: description of the dataset used in the study:
     - public: True indicates a freely accessible dataset, False the opposite;
     - mock: True indicates a dataset collected in a fabricated setting, False means data collected from real-life events;
     - name: name of the dataset;
     - size: number of rows in the dataset;
     - origin: source of the data;
     - target: labels used in the target attribute;
     - features: list of feature kinds in the dataset:
       - kind: the kind of detection-cue features;
       - dimensions: the number of features;
       - components: list of feature components;
       - language: list of languages, when applicable;
       - tool: list of tools, when applicable;
  4. notes: textual notes about the study;
  5. mindmap: file name of the mind map document.
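A hypothetical entry following this structure might look like the dictionary below. Every value is fabricated for illustration — it is not taken from an actual reviewed paper — but the keys and nesting follow the schema above.

```python
# Illustrative meta-data dictionary for one (fictional) reviewed study
study = {
    "document_id": 123,
    "methods": [
        {
            "classifier": {
                "kind": "SVM",
                "implementation": "scikit-learn",
                "performance": {"kind": "Accuracy", "value": 0.82},
            }
        },
        {"support": "facial landmark extraction"},
    ],
    "dataset": {
        "public": True,
        "mock": False,
        "name": "Example corpus",
        "size": 121,
        "origin": "Courtroom trial videos",
        "target": ["truthful", "deceptive"],
        "features": [
            {"kind": "visual", "dimensions": 40,
             "components": ["facial action units"]},
            {"kind": "verbal", "dimensions": 300, "components": ["unigrams"],
             "language": ["English"], "tool": ["LIWC"]},
        ],
    },
    "notes": "Illustrative entry only.",
    "mindmap": "study-123.mm",
}

# Dictionaries like this can be scanned to aggregate statistics,
# e.g. the best classifier performance reported in the study:
best = max(
    m["classifier"]["performance"]["value"]
    for m in study["methods"]
    if "classifier" in m
)
```

Because the encoding is plain Python, the meta-analysis step reduces to iterating over a list of such dictionaries and feeding the aggregates into a charting or statistics library.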