ai-se / SMOTE


redesign tuner for future deployment #7

Closed WeiFoo closed 8 years ago

WeiFoo commented 8 years ago

To support tuning algorithms other than DE, such as GALE, a new tuning interface should be designed and provided.

WeiFoo commented 8 years ago

Done!

timm commented 8 years ago

u got de and gale etc in JMOO?

WeiFoo commented 8 years ago

Haven't used JMOO at this time, will write a wrapper to use gale if necessary. Let's see whether tuning smote works or not.

timm commented 8 years ago

hey @WeiFoo, i don't actually get why we DONT have DE+SMOTE results yet. given @azhe825's rig it should be fast++ to check this out....

WeiFoo commented 8 years ago

Running SMOTE once is fast; SMOTE at least 10_10_10 with early termination is another story.

Tuning SMOTE is the most time-consuming task. I simplified the experiment, and it has now been running on the HPC for more than 16 hours for only one small data set called anime.txt.

The HPC is also not much faster. I ran the same experiment on my laptop for 6 hours, and based on the logs it still needs 4 more hours.

By the way, I rewrote 90% of the code. Takes time.

On Nov 30, 2015, at 16:19, Tim Menzies notifications@github.com wrote:

hey @WeiFoo, i don't actually get why we DONT have DE+SMOTE results yet. given @azhe825's rig it should be fast++ to check this out....


timm commented 8 years ago

By the way, I rewrote 90% of the code. Takes time.

acknowledged.

fyi- if smote tuning is soooo slow and data mining tuning is soooo fast then maybe the conclusion here is tune data miners, not pre-processor

on the other hand: why is smote slow? is it the NN calculations? if you find east1,west1 of the top level of WHERE, then use y=sqrt(a^2-x^2) to give each point a y-axis value, you could quickly divide the data into a 2d grid. then you divide each dimension into 16 (so now you have 16^2 buckets) and for each bucket, just keep 5 examples of each class (selected at random). so if you want to smote something, use east1,west1 to find its bin then pick any one of the 5 in that bin.
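
A rough sketch of the grid idea above, in plain Python. This is not code from this repo. It assumes numeric rows, picks east/west with the usual "furthest from a random row, then furthest from that" heuristic, takes `a` and `b` as the distances from a row to east and west, and gets `x` from the cosine rule before computing `y`. The names `GridIndex`, `find_poles`, `project`, `bin_of` and `synthesize` are made up for illustration.

```python
import math
import random
from collections import defaultdict


def find_poles(rows, rng):
    """Pick two distant rows: east is furthest from a random row, west is furthest from east."""
    anyone = rng.choice(rows)
    east = max(rows, key=lambda r: math.dist(r, anyone))
    west = max(rows, key=lambda r: math.dist(r, east))
    return east, west


def project(row, east, west):
    """Map a row to (x, y): x along the east-west line, y its distance from that line."""
    a = math.dist(row, east)
    b = math.dist(row, west)
    c = math.dist(east, west)
    if c == 0:  # all rows identical, nothing to project onto
        return 0.0, 0.0
    x = (a * a + c * c - b * b) / (2 * c)   # cosine rule
    y = math.sqrt(max(0.0, a * a - x * x))  # max() guards against rounding below zero
    return x, y


def bin_of(v, lo, hi, bins):
    """Index of the equal-width bin holding v, clamped to 0..bins-1."""
    if hi <= lo:
        return 0
    return max(0, min(bins - 1, int((v - lo) / (hi - lo) * bins)))


class GridIndex:
    """bins x bins grid that keeps at most `keep` random rows per class per cell."""

    def __init__(self, rows, labels, bins=16, keep=5, seed=1):
        self.rng = random.Random(seed)
        self.bins = bins
        self.east, self.west = find_poles(rows, self.rng)
        points = [project(r, self.east, self.west) for r in rows]
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        self.x_range = (min(xs), max(xs))
        self.y_range = (min(ys), max(ys))
        self.buckets = defaultdict(list)  # (i, j, label) -> kept rows
        seen = defaultdict(int)
        for row, label, (x, y) in zip(rows, labels, points):
            key = self.cell(x, y) + (label,)
            seen[key] += 1
            bucket = self.buckets[key]
            if len(bucket) < keep:
                bucket.append(row)
            else:
                # reservoir sampling: every row seen so far is equally likely to be kept
                j = self.rng.randrange(seen[key])
                if j < keep:
                    bucket[j] = row

    def cell(self, x, y):
        return (bin_of(x, *self.x_range, self.bins),
                bin_of(y, *self.y_range, self.bins))

    def neighbor(self, row, label):
        """Any kept row of the same class from the row's own cell, or None if the cell has none."""
        x, y = project(row, self.east, self.west)
        bucket = self.buckets.get(self.cell(x, y) + (label,))
        return self.rng.choice(bucket) if bucket else None


def synthesize(row, other, rng):
    """SMOTE-style interpolation: a random point on the line between two rows."""
    gap = rng.random()
    return [p + gap * (q - p) for p, q in zip(row, other)]
```

Building the index costs one pass of distance calculations against east and west; after that, finding a "neighbor" is two distance calculations and a dictionary lookup rather than a scan over all rows. The trade-off is that the neighbor is only approximately near, and it can be the row itself if that row happens to be one of the kept examples.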

WeiFoo commented 8 years ago

The reason is that each evaluation in tuning SMOTE requires generating new data, fitting the learner, predicting, and calculating the F/pd/precision values. I explain it here: https://github.com/ai-se/SMOTE/issues/9
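
To make the four steps concrete, here is a minimal sketch of what one evaluation involves. It is not the code in this repo: `oversample` is a simplified stand-in for SMOTE (it interpolates between random minority pairs instead of nearest neighbours), `fit_centroids` is a stand-in nearest-centroid learner, and all function names and the single tuned parameter `k` are assumptions made for illustration.

```python
import math
import random


def oversample(rows, labels, minority, k, rng):
    """Stand-in for SMOTE: add k synthetic minority rows between random minority pairs."""
    pool = [r for r, l in zip(rows, labels) if l == minority]
    new_rows, new_labels = list(rows), list(labels)
    for _ in range(k):
        a, b = rng.choice(pool), rng.choice(pool)
        gap = rng.random()
        new_rows.append([x + gap * (y - x) for x, y in zip(a, b)])
        new_labels.append(minority)
    return new_rows, new_labels


def fit_centroids(rows, labels):
    """Stand-in learner: predict the class whose centroid is closest."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        counts[label] = counts.get(label, 0) + 1
        total = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            total[i] += v
    centroids = {l: [v / counts[l] for v in s] for l, s in sums.items()}

    def predict(row):
        return min(centroids, key=lambda l: math.dist(row, centroids[l]))

    return predict


def scores(actual, predicted, positive):
    """Return (pd, precision, F), where pd is recall on the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    pd = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * pd * prec / (pd + prec) if pd + prec else 0.0
    return pd, prec, f


def evaluate(k, train, tune, minority, rng):
    """One tuning evaluation for one candidate setting k."""
    rows, labels = oversample(train[0], train[1], minority, k, rng)  # 1. generate new data
    predict = fit_centroids(rows, labels)                            # 2. fit the learner
    predicted = [predict(r) for r in tune[0]]                        # 3. predict
    return scores(tune[1], predicted, minority)                      # 4. F/pd/precision
```

A tuner that tries N candidate settings repeats all four steps N times, so the cost of a single run of SMOTE says little about the cost of tuning it.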