ai-se / ML-assisted-SLR

Automated Systematic Literature Review

Jun-02-2016 #5

Closed azhe825 closed 7 years ago

azhe825 commented 8 years ago

Things done

  1. Target Problem:
    • sampling bias leads to a class-imbalance problem in active learning (so the difference in results can be more significant)
  2. Existing literature on solving the imbalance problem in the active learning scenario
  3. What the existing literature lacks
    • all methods focus on how to select training examples for the next generation.
    • all methods assume that we already have an initially labeled training set (except "Hierarchical sampling for active learning"; the problem with that method is that it completely abandons the advantages of active learning).
    • our assumptions:
        1. Imbalance in the initial training set will affect active learning performance.
        2. Hierarchical clustering can balance the initial training set.
        3. For new stages, we need to consider expert knowledge, e.g. a keyword search through Elasticsearch first to retrieve a more balanced initial training set.
  4. Negative results (on the multi-class classification problem)

The entropy maximization methods make no difference compared to random sampling!

[3 result images]
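
For reference, here is a minimal sketch of the two query strategies being compared: entropy maximization versus random sampling. It is not the code used for the experiments above. The function names and the example probabilities are hypothetical, and it assumes the classifier outputs one class-probability vector per unlabeled document.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def query_max_entropy(pool_probs, k):
    """Return indices of the k unlabeled examples with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


def query_random(pool_size, k, seed=0):
    """Baseline: return k distinct indices drawn uniformly at random."""
    return random.Random(seed).sample(range(pool_size), k)


# Hypothetical predicted probabilities for four unlabeled documents, three classes.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],  # two-way tie
]

print(query_max_entropy(pool_probs, 2))  # [1, 2]
print(query_random(len(pool_probs), 2))
```

In each active learning round, the selected indices would be sent to the reviewer for labeling and the classifier retrained. If the minority class is almost absent from the initial training set, the entropy scores may carry little information about it, which is one possible reason the two strategies look alike.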

To Do

  1. Reduce the problem to binary classification, with the minority class as the target class.
  2. If that still does not work, consider keyword search.
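
A minimal sketch of the first to-do item, assuming "minority" means the rarest class in the labeled data. The function name and the example labels are hypothetical.

```python
from collections import Counter


def to_binary_minority(labels):
    """Relabel a multi-class list as 1 for the rarest class and 0 for all others."""
    counts = Counter(labels)
    # Ties between equally rare classes are broken by name so the result is deterministic.
    minority = min(counts, key=lambda c: (counts[c], str(c)))
    return minority, [1 if y == minority else 0 for y in labels]


# Hypothetical multi-class labels: A appears 5 times, B 3 times, C 2 times.
labels = ["A", "A", "B", "A", "C", "C", "A", "B", "A", "B"]
minority, binary = to_binary_minority(labels)
print(minority)  # C
print(binary)    # [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
```

With the labels reduced this way, the active learner only has to separate the target class from everything else, which should make any effect of the initial imbalance easier to see.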