ClimbsRocks / machineJS

[UNMAINTAINED] Automated machine learning- just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml
https://github.com/ClimbsRocks/auto_ml

FUTURE: train algos only on the difficult cases #112

Open ClimbsRocks opened 8 years ago

ClimbsRocks commented 8 years ago

this is a far-future idea.

when we have disagreement between different algorithms, that is a difficult case.

when the algos disagree on a sample, mark that sample as a disagreement case. collect all the samples the algos disagree on, then run only those samples through machineJS again.

there are several different ways of calculating disagreement, depending on problemType.

classification:

  1. when more than X% of all algos disagree with the majority prediction
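a minimal sketch of the classification rule above, in plain python. the function name `classification_disagreement`, the list-of-lists input layout, and the 0.3 default for X are all assumptions for illustration, not anything machineJS provides. ties are treated as a majority of the tied size.

```python
from collections import Counter


def classification_disagreement(predictions, threshold=0.3):
    """Flag samples where too many algos dissent from the majority vote.

    predictions: one list of predicted labels per algo, all the same length.
    threshold:   hypothetical X, as a fraction (0.3 means 30%).
    Returns one bool per sample, True when the sample is a disagreement case.
    """
    n_algos = len(predictions)
    mask = []
    # zip(*predictions) regroups the votes so each tuple holds one sample's votes
    for sample_votes in zip(*predictions):
        majority_count = Counter(sample_votes).most_common(1)[0][1]
        dissent_share = (n_algos - majority_count) / n_algos
        mask.append(dissent_share > threshold)
    return mask


# four hypothetical algos voting on four samples
preds = [
    ["cat", "dog", "cat", "dog"],  # algo A
    ["cat", "dog", "dog", "dog"],  # algo B
    ["cat", "cat", "dog", "dog"],  # algo C
    ["cat", "dog", "cat", "cat"],  # algo D
]
class_mask = classification_disagreement(preds, threshold=0.3)
print(class_mask)  # [False, False, True, False]
```

only the third sample is flagged: the vote there is split 2 to 2, so half the algos dissent. a single dissenter out of four (25%) stays under the 30% threshold.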

regression:

  1. when the predicted value varies between algos by more than Y% (one predicts a value of 100, the next predicts a value of 70). calculate this percentage relative to the spread in predicted values, not as a raw difference: a difference of 30 is trivial if the spread is 100,000,000, but huge if the spread is only 100.
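a rough sketch of the regression rule, reading "spread" as the range (max minus min) of all predicted values across the dataset. that reading is an assumption, as are the function name `regression_disagreement` and the 0.05 default for Y.

```python
def regression_disagreement(predictions, threshold=0.05):
    """Flag samples where algos differ by a large share of the overall spread.

    predictions: one list of predicted values per algo, all the same length.
    threshold:   hypothetical Y, as a fraction of the overall spread.
    Returns one bool per sample, True when the sample is a disagreement case.
    """
    per_sample = list(zip(*predictions))
    all_values = [value for row in per_sample for value in row]
    # assumption: "spread" means the range of every predicted value in the dataset
    overall_spread = max(all_values) - min(all_values)
    if overall_spread == 0:
        # every algo predicted the same value everywhere, so nothing disagrees
        return [False] * len(per_sample)
    return [
        (max(row) - min(row)) / overall_spread > threshold
        for row in per_sample
    ]


# same 30-point gap on the first sample, two very different spreads
narrow = [[100, 20, 110], [70, 21, 109]]
wide = [[100, 50_000_000, 100_000_000], [70, 50_000_000, 100_000_000]]

narrow_mask = regression_disagreement(narrow)
wide_mask = regression_disagreement(wide)
print(narrow_mask)  # [True, False, False]
print(wide_mask)    # [False, False, False]
```

the gap between 100 and 70 is flagged when predictions only range from 20 to 110, and ignored when they range up to 100,000,000.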

flag each sample that the algos disagree on, and feed that flag (this is a disagreement case) back into the algos. that way every algo knows this is a special case and can handle it appropriately.
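one possible way to carry that marker around, sketched with rows as plain dicts. the column name `is_disagreement` and the helper `flag_and_split` are hypothetical. the helper returns both the full flagged dataset (for feeding the flag back in as a feature) and the hard subset (for the second pass through machineJS).

```python
def flag_and_split(rows, mask, flag_name="is_disagreement"):
    """Attach a 0/1 disagreement flag to every row and pull out the hard cases.

    rows: list of dicts, one per sample.
    mask: list of bools from a disagreement check, same length as rows.
    Returns (flagged_rows, hard_rows); the input rows are not modified.
    """
    if len(rows) != len(mask):
        raise ValueError("rows and mask must have the same length")
    # build new dicts so the caller's original rows stay untouched
    flagged_rows = [
        dict(row, **{flag_name: int(is_hard)})
        for row, is_hard in zip(rows, mask)
    ]
    hard_rows = [row for row in flagged_rows if row[flag_name] == 1]
    return flagged_rows, hard_rows


samples = [{"id": 1, "x": 0.4}, {"id": 2, "x": 0.9}, {"id": 3, "x": 0.1}]
flagged, hard = flag_and_split(samples, [False, True, False])
print(hard)  # [{'id': 2, 'x': 0.9, 'is_disagreement': 1}]
```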