[Open] XinyiJi1 opened this issue 4 years ago
I am also struggling to implement the HVDM and HVM methods, particularly because of this required y. Is y needed to determine the number of output classes and their labels? If so, should y be the index of the column in the X array that holds the label-encoded categorical output labels?
Hi, thanks all for the response and sorry that I left this issue hanging. The implementation of the VDM seems to be incorrect when I look at the code now.
I think y_ix was supposed to indicate the feature columns that contain categorical variables. It's been a while since I read the paper, so this implementation might well be wrong.
Anyway, I will gladly welcome any kind of PR for this model. I don't have much time unfortunately to work on this package because of other projects but I will be glad to help and discuss potential improvements!
I have a small problem with the code in VDM.py. I am wondering why we need y_ix there, and how this attribute relates to the VDM function in the paper. It seems that c in the VDM function is the number of clusters, and a is the attribute on which the values x and y are compared.
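For reference, here is a minimal sketch of how I understand the VDM for a single categorical attribute a, assuming the usual definition: vdm_a(x, y) = sum over classes c of |N_{a,x,c}/N_{a,x} - N_{a,y,c}/N_{a,y}|^q, where N_{a,x} counts the training samples with value x for attribute a and N_{a,x,c} counts those that also have output class c. Under that reading, the class labels come from a separate label vector rather than from a column index. This is not the code in VDM.py; the function name `vdm_for_attribute` and its parameters are hypothetical, and the toy data is made up for illustration.

```python
from collections import Counter, defaultdict


def vdm_for_attribute(values, labels, q=2):
    """Build a VDM distance function for one categorical attribute.

    values: the attribute value of each training sample
    labels: the output class of each training sample (same length)
    q:      exponent applied to each per-class difference
    """
    classes = sorted(set(labels))
    n_ax = Counter(values)          # N_{a,x}: samples with value x
    n_axc = defaultdict(Counter)    # N_{a,x,c}: samples with value x and class c
    for v, c in zip(values, labels):
        n_axc[v][c] += 1

    def distance(x, y):
        # Values unseen during training are not handled in this sketch
        # (they would cause a division by zero).
        return sum(
            abs(n_axc[x][c] / n_ax[x] - n_axc[y][c] / n_ax[y]) ** q
            for c in classes
        )

    return distance


# Toy data: one categorical feature and a binary class label.
colors = ["red", "red", "blue", "blue", "blue", "green"]
labels = [0, 0, 1, 1, 0, 1]

vdm_color = vdm_for_attribute(colors, labels, q=2)
print(vdm_color("red", "blue"))
print(vdm_color("red", "red"))
```

If y_ix is meant to list the categorical feature columns, as suggested above, one function like this could be built per listed column, with the class labels passed separately.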