ai-se / ZheYu

Zhe's works

SMOTE #3

Open azhe825 opened 8 years ago

azhe825 commented 8 years ago

"Original"

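from random import randint, random

import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(data,num,k=5):
    # data: 2-D numpy array, one sample per row; num: number of synthetic rows to make
    corpus=[]
    # ask for k+1 neighbors because column 0 of indices is normally the query point itself
    nbrs = NearestNeighbors(n_neighbors=k+1, algorithm='ball_tree').fit(data)
    distances, indices = nbrs.kneighbors(data)
    for i in range(0,num):
        mid=randint(0,len(data)-1)      # random base sample
        nn=indices[mid,randint(1,k)]    # one random neighbor, shared by all features
        datamade=[]
        for j in range(0,len(data[mid])):
            gap=random()                # fresh gap in [0, 1) for every feature
            datamade.append((data[nn,j]-data[mid,j])*gap+data[mid,j])
        corpus.append(datamade)
    corpus=np.array(corpus)
    return corpus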

"Why not"

def smote(data,num,k=5):
    corpus=[]
    nbrs = NearestNeighbors(n_neighbors=k+1, algorithm='ball_tree').fit(data)
    distances, indices = nbrs.kneighbors(data)
    for i in range(0,num):
        mid=randint(0,len(data)-1)      # base sample, drawn once per synthetic point
        datamade=[]
        for j in range(0,len(data[mid])):
            nn=indices[mid,randint(1,k)]    # new random neighbor for every feature
            gap=random()
            datamade.append((data[nn,j]-data[mid,j])*gap+data[mid,j])
        corpus.append(datamade)
    corpus=np.array(corpus)
    return corpus

timm commented 8 years ago

i see the random() call in the innermost loop of the second version... which seems right to me.

azhe825 commented 8 years ago

The difference is: the original version picks a point and combines it with only one of its neighbors to generate the new point, while the second version picks a point and draws a new random neighbor for each feature, so the new point can combine several of its neighbors.
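To make the difference concrete, here is a minimal sketch that builds one synthetic point either way. The names `knn_indices` and `make_point` and the `per_feature_neighbor` flag are hypothetical, and a brute-force NumPy neighbor search stands in for `NearestNeighbors` to keep the sketch dependency-light. It assumes the base point is drawn once per synthetic point in both variants.

```python
import numpy as np


def knn_indices(data, k):
    """Brute-force stand-in for NearestNeighbors: k+1 closest rows per row.

    Column 0 is the row itself (distance 0), assuming no duplicate rows.
    """
    d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k + 1]


def make_point(data, indices, k, rng, per_feature_neighbor=False):
    """Build one synthetic point; returns (point, base row, neighbor row per feature)."""
    n_features = data.shape[1]
    mid = rng.integers(0, len(data))                 # base sample, drawn once
    if per_feature_neighbor:
        # second version: a fresh neighbor column (1..k) for every feature
        cols = rng.integers(1, k + 1, size=n_features)
    else:
        # original version: one neighbor column shared by all features
        cols = np.full(n_features, rng.integers(1, k + 1))
    nn = indices[mid, cols]                          # neighbor row used for each feature
    gap = rng.random(n_features)                     # per-feature gap, as in both versions
    feats = np.arange(n_features)
    point = data[mid] + gap * (data[nn, feats] - data[mid])
    return point, mid, nn


rng = np.random.default_rng(0)
data = rng.random((30, 4))
k = 5
indices = knn_indices(data, k)

p1, mid1, nn1 = make_point(data, indices, k, rng)
p2, mid2, nn2 = make_point(data, indices, k, rng, per_feature_neighbor=True)
```

In the first call every entry of `nn1` is the same row, so the point stays inside the box spanned by the base sample and that one neighbor. In the second call `nn2` may name several rows, so the point can land anywhere inside the box spanned by the base sample and all k neighbors.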

I have not finished reading yet. Maybe someone has tested this already. If not, I will come back to this and test it myself.

timm commented 8 years ago

uh huh. would not picking the one point that is your nearest neighbor be safest? region of least disagreement?
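a rough sketch of that nearest-neighbor-only idea, for comparison. The function name `smote_nearest` and its `seed` parameter are hypothetical, and the neighbor search is a brute-force NumPy stand-in rather than `NearestNeighbors`. It keeps the per-feature gap used in the code earlier in this thread.

```python
import numpy as np


def smote_nearest(data, num, seed=None):
    """Interpolate randomly chosen base samples toward their single nearest neighbor."""
    rng = np.random.default_rng(seed)
    # Brute-force squared distances; the diagonal is masked so a row never picks itself
    d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    nearest = d2.argmin(axis=1)                      # nearest other row for every row
    mid = rng.integers(0, len(data), size=num)       # random base samples
    gap = rng.random((num, data.shape[1]))           # per-feature gap in [0, 1)
    return data[mid] + gap * (data[nearest[mid]] - data[mid])


data = np.random.default_rng(1).random((20, 3))
synthetic = smote_nearest(data, num=10, seed=2)
print(synthetic.shape)  # (10, 3)
```

with the "Original" function in this thread, passing k=1 should give the same restriction, since randint(1, 1) can only return 1.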

more generally, you will find 100 variations of the basic algorithms as you go. the issue will often be that most of your variants perform insignificantly differently from the other methods.

right now, i think your task should be to code up smote as it was originally designed
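one possible sketch, under an assumption this thread does not confirm: that "as originally designed" means visiting every sample in turn and making a fixed number of synthetic points from each, instead of drawing base samples at random as the code above does. The names `smote_per_sample`, `per_sample` and `seed` are hypothetical, and the neighbor search is a brute-force NumPy stand-in for `NearestNeighbors`.

```python
import numpy as np


def smote_per_sample(data, per_sample, k=5, seed=None):
    """Make `per_sample` synthetic points from every row of `data` (needs k < len(data))."""
    rng = np.random.default_rng(seed)
    n, n_features = data.shape
    # Brute-force k+1 nearest rows; column 0 is the row itself, assuming no duplicates
    d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    indices = np.argsort(d2, axis=1)[:, :k + 1]
    # every row serves as the base sample exactly per_sample times
    mid = np.repeat(np.arange(n), per_sample)
    # one random neighbor (columns 1..k) per synthetic point
    nn = indices[mid, rng.integers(1, k + 1, size=mid.size)]
    gap = rng.random((mid.size, n_features))         # per-feature gap in [0, 1)
    return data[mid] + gap * (data[nn] - data[mid])


data = np.random.default_rng(0).random((12, 2))
synthetic = smote_per_sample(data, per_sample=3, k=5, seed=0)
print(synthetic.shape)  # (36, 2)
```

the output size is then fixed by the data (rows times per_sample) instead of by a free `num` argument.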

t

azhe825 commented 8 years ago

Yes, right now all my SMOTE code follows the original version.