deception-detection-review-2022

Deception Detection with Machine Learning: a literature review and statistical analysis

MIT License

Literature review

Introduction

The files in this repository are part of the Literature Review Project. They disclose the data collected during the process and also serve as a record of all the steps taken.

The manuscript of the scientific article that discusses the findings and implications of the Literature Review has been submitted to the journal PLOS ONE (https://journals.plos.org/plosone/) and is currently awaiting peer review.

Research team and contribution

Conceptualization

  1. Alex Sebastião Constâncio
  2. Denise Fukumi Tsunoda
  3. Deborah Ribeiro Carvalho

Data curation

  1. Alex Sebastião Constâncio
  2. Denise Fukumi Tsunoda

Formal analysis

  1. Alex Sebastião Constâncio

Investigation

  1. Alex Sebastião Constâncio
  2. Denise Fukumi Tsunoda

Methodology

  1. Alex Sebastião Constâncio
  2. Deborah Ribeiro Carvalho
  3. Helena de Fátima Nunes Silva
  4. Jocelaine Martins da Silveira

Software

  1. Alex Sebastião Constâncio

Writing – original draft

  1. Alex Sebastião Constâncio

Writing – review and editing

  1. Deborah Ribeiro Carvalho
  2. Denise Fukumi Tsunoda
  3. Helena de Fátima Nunes Silva
  4. Jocelaine Martins da Silveira

Supervision

  1. Deborah Ribeiro Carvalho
  2. Helena de Fátima Nunes Silva

Research scope and objectives

1. Research goals

The goal of this literature review is to capture a panoramic view of the state of research on Deception Detection supported by Machine Learning, in order to understand trends, results, and gaps in the field.

2. Research questions

a. What are the best-performing Machine Learning techniques applied to automatic deception detection?

b. What are the datasets and features they consume?

c. What level of performance have they reached recently?

3. Research restrictions

  1. The period of interest is 2011-2021;
  2. Only non-invasive methods and techniques will be reviewed; by non-invasive, we mean methods that never touch the subject and never require him/her to be evaluated by equipment less mobile than a regular computer;
  3. Only studies that report some measured level of performance will be reviewed.

4. Research protocol

  1. Run queries on selected scientific document bases;
  2. Export results as BibTeX files;
  3. Import all BibTeX files into BiblioAlly; those documents are tagged as "IMPORTED" or "DUPLICATE";
  4. Manually detect duplications not detected during import and tag them as "DUPLICATE";
  5. Pre-select articles by shallow screening;
  6. Retrieve the full text of pre-selected documents;
  7. Select articles by deep screening;
  8. Extract relevant data from accepted documents;
  9. Run a meta-analysis and generate charts and tables.
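The manual duplicate detection in the protocol above can be approximated in plain Python. The sketch below is illustrative only — it is not the project's actual procedure and the entries are fabricated — and shows one simple heuristic: grouping entries whose titles are identical after normalization.

```python
import re

def normalized_title(entry):
    """Lowercase the title and collapse punctuation/whitespace so that
    near-identical titles from different databases compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", entry["title"].lower()).strip()

def find_duplicates(entries):
    """Group BibTeX-like entries by normalized title and return the
    groups that contain more than one document."""
    groups = {}
    for entry in entries:
        groups.setdefault(normalized_title(entry), []).append(entry)
    return [docs for docs in groups.values() if len(docs) > 1]

# Fabricated example entries, mimicking exports from different databases
entries = [
    {"id": "scopus-001", "title": "Deception Detection with Machine Learning"},
    {"id": "wos-017", "title": "Deception detection with machine learning."},
    {"id": "ieee-042", "title": "A Survey of Lie Detection Cues"},
]

duplicates = find_duplicates(entries)
```

Here the first two entries fall into the same group despite differing in case and punctuation; in practice, subtler duplicates (e.g. subtitle variations) still require manual review.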

5. Data extraction

After reading the full text of the selected papers, each was summarized in two forms:

  1. Mind map: a graphical, summarized form of the study;
  2. Python dictionary: an encoded version of the extracted meta-data of interest that can be further processed to produce statistics, charts, and tables.

Details on each form below.

6. Mind maps

Mind maps are FreeMind documents, produced manually, since BiblioAlly still can't do it automatically (for now we can dream about it, right?). The mind maps serve as a quick, short summary of each article and helped during reading and reviewing the full texts. Each map describes the study hypothesis, the contributions, the dataset, the feature modalities, the methods used, and the performance achieved.
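Since FreeMind files are plain XML, automating their generation is not far-fetched. The sketch below — an assumption about what such automation could look like, not a BiblioAlly feature — builds a minimal FreeMind-style document with one branch per summary section; all names and values are illustrative.

```python
import xml.etree.ElementTree as ET

def build_mindmap(study_title, sections):
    """Build a minimal FreeMind-style XML tree: a root node for the
    study with one child branch per summary section."""
    mm = ET.Element("map", version="1.0.1")
    root = ET.SubElement(mm, "node", TEXT=study_title)
    for heading, text in sections.items():
        branch = ET.SubElement(root, "node", TEXT=heading)
        ET.SubElement(branch, "node", TEXT=text)  # leaf with the summary text
    return ET.tostring(mm, encoding="unicode")

# Illustrative sections, mirroring what the manual maps describe
xml_doc = build_mindmap(
    "Example study",
    {
        "Hypothesis": "Vocal cues predict deception",
        "Dataset": "Real-life recordings",
        "Performance": "Accuracy 0.75",
    },
)
```

The resulting string can be saved with a .mm extension and opened in FreeMind, though a real exporter would also need the layout attributes FreeMind adds to its own files.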

7. Meta-data encoding

The meta-data of each article was structured as follows:

  1. document_id: the document id in the BiblioAlly database;
  2. methods: list of methods and tools used in the paper; each item is described as a classifier or a support:
     - classifier: describes the classification algorithm as:
       - kind: when applicable, some kind or sub-category of the method;
       - implementation: package used as the algorithm implementor;
       - performance: performance achieved by the classifier, described as:
         - kind: the performance measure used;
         - value: the performance level achieved;
     - support: describes supporting tools used for some generic purpose;
  3. dataset: description of the dataset used in the study:
     - public: True indicates a freely accessible dataset, False the opposite;
     - mock: True indicates a dataset collected in a fabricated setting, False means data collected from real-life events;
     - name: name of the dataset;
     - size: number of rows in the dataset;
     - origin: source of the data;
     - target: labels used in the target attribute;
     - features: list of feature kinds in the dataset:
       - kind: the kind of detection-cue features;
       - dimensions: the number of features;
       - components: list of feature components;
       - language: list of languages, when applicable;
       - tool: list of tools, when applicable;
  4. notes: textual notes about the study;
  5. mindmap: file name of the mind map document.
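A hypothetical entry following this structure might look like the dictionary below. Every value is fabricated for illustration — it is not taken from an actual reviewed paper — but the keys and nesting follow the schema above.

```python
# Illustrative meta-data dictionary for one (fictional) reviewed study
study = {
    "document_id": 123,
    "methods": [
        {
            "classifier": {
                "kind": "SVM",
                "implementation": "scikit-learn",
                "performance": {"kind": "Accuracy", "value": 0.82},
            }
        },
        {"support": "facial landmark extraction"},
    ],
    "dataset": {
        "public": True,
        "mock": False,
        "name": "Example corpus",
        "size": 121,
        "origin": "Courtroom trial videos",
        "target": ["truthful", "deceptive"],
        "features": [
            {"kind": "visual", "dimensions": 40,
             "components": ["facial action units"]},
            {"kind": "verbal", "dimensions": 300, "components": ["unigrams"],
             "language": ["English"], "tool": ["LIWC"]},
        ],
    },
    "notes": "Illustrative entry only.",
    "mindmap": "study-123.mm",
}

# Dictionaries like this can be scanned to aggregate statistics,
# e.g. the best classifier performance reported in the study:
best = max(
    m["classifier"]["performance"]["value"]
    for m in study["methods"]
    if "classifier" in m
)
```

Because the encoding is plain Python, the meta-analysis step reduces to iterating over a list of such dictionaries and feeding the aggregates into a charting or statistics library.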