KacperKubara / distython

Distance metrics which can handle mixed-type data and missing values

y_ix in VDM.py #15

Open XinyiJi1 opened 4 years ago

XinyiJi1 commented 4 years ago

I have a small question about the code in VDM.py. Why do we need y_ix here? [image] And how is this attribute related to the VDM function in the paper? [image] It seems that c in the VDM function is the number of clusters, and a is the attribute in which x and y are located.

JeffJochems commented 3 years ago

I also struggle with implementing the HVDM and HVM methods, particularly due to this required y. Is the y required to determine the number of output classes and their labels? And should this y hence correspond to the index in the X array of categorical output labels (label encoded)?

KacperKubara commented 3 years ago

Hi, thanks all for the response and sorry that I left this issue hanging. The implementation of the VDM seems to be incorrect when I look at the code now.

I think the y_ix was supposed to indicate the feature columns that contain categorical variables. It's been a while since I read this paper so this implementation might be wrong?

Anyway, I will gladly welcome any kind of PR for this model. Unfortunately, I don't have much time to work on this package because of other projects, but I will be glad to help and discuss potential improvements!
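
For anyone picking this up, here is a minimal sketch of the per-attribute VDM in its usual form from the heterogeneous distance literature: vdm_a(x, y) = sum over output classes c of |N_{a,x,c}/N_{a,x} - N_{a,y,c}/N_{a,y}|^q. It is not distython's code. The function names `fit_vdm_counts` and `vdm_attribute` are hypothetical, and the sketch assumes that the class labels are passed in directly as a separate sequence (which may or may not be what y_ix was meant to point at) and that both values being compared occur in the training column.

```python
from collections import Counter, defaultdict


def fit_vdm_counts(column, labels):
    """Collect the counts VDM needs for one categorical attribute a.

    column: values of attribute a for every training row
    labels: output class of every training row (assumed input, see lead-in)
    """
    value_counts = Counter(column)               # N_{a,x}
    value_class_counts = defaultdict(Counter)    # N_{a,x,c}
    for value, label in zip(column, labels):
        value_class_counts[value][label] += 1
    classes = sorted(set(labels))                # the C output classes
    return value_counts, value_class_counts, classes


def vdm_attribute(x, y, value_counts, value_class_counts, classes, q=2):
    """Sum over classes of |P(c | a=x) - P(c | a=y)|^q for two values x, y."""
    if value_counts[x] == 0 or value_counts[y] == 0:
        # Assumption of this sketch: unseen values are rejected, not imputed.
        raise ValueError("both values must occur in the training column")
    total = 0.0
    for c in classes:
        p_x = value_class_counts[x][c] / value_counts[x]
        p_y = value_class_counts[y][c] / value_counts[y]
        total += abs(p_x - p_y) ** q
    return total


# Toy data: one categorical feature and a binary output class.
colour = ["red", "red", "blue", "blue", "blue"]
target = [0, 1, 1, 1, 0]

counts = fit_vdm_counts(colour, target)
d_red_blue = vdm_attribute("red", "blue", *counts)
d_red_red = vdm_attribute("red", "red", *counts)
```

In this form the class labels only enter through the conditional counts N_{a,x,c}, so the metric needs the labels themselves (or the index of the label column), not the indices of the categorical feature columns.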