J535D165 / recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python
http://recordlinkage.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
966 stars 152 forks

How to utilize prob-related methods of ECM classifier #178

Open Ramin1368 opened 2 years ago

Ramin1368 commented 2 years ago

Hi

I am using the ECM classifier as the unsupervised classifier for my problem, but I keep getting an error that I do not understand when I access its probability-related attributes:

    ecm.fit(df_feature_vectors)
    log_m_probablity = ecm.log_m_probs

This gives the following error, even though my feature vectors have only 5 features:

    ValueError: Expected input with 6 features, got 5 instead

Using ecm.prob gives a similar error:

    ValueError: Expected input with 11 features, got 5 instead

Interestingly, every time I run this, the expected number of features grows by 5: 16 features, then 21 features, and so on.

What is the solution for using these methods and attributes, such as log_m_probs, prob, log_u_probs, etc.?

One more question: when I used the predict method,

    df_feature_vectors = comparer.compute(All_Index_Pairs, df)
    links_pred = ecm.predict(df_feature_vectors)

it raised an error saying that the labels had to be either one or zero, and I had to use a binarizer to make the labels either one or zero to avoid the error. Why can't the labels be between zero and one?
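For readers following along, the binarization step mentioned above can be sketched in plain Python. This is only an illustration of thresholding similarity scores into agree/disagree values, not the library's own code: the `binarize` helper, the 0.85 threshold, and the sample scores are all hypothetical. The usual reasoning (an assumption here, not confirmed in this thread) is that an ECM-style model estimates the probability of each discrete comparison outcome for matches and non-matches, so it expects a small set of levels such as 0 and 1 rather than continuous scores.

```python
def binarize(rows, threshold=0.85):
    """Map each similarity score to 1 (agree) or 0 (disagree).

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    return [[1 if score >= threshold else 0 for score in row] for row in rows]


# Hypothetical feature vectors: one row per candidate record pair,
# five comparison scores per row, each between 0 and 1.
feature_vectors = [
    [1.0, 0.92, 0.40, 1.0, 0.0],
    [0.3, 0.85, 0.10, 0.0, 1.0],
]

binary_vectors = binarize(feature_vectors)
print(binary_vectors)
```

The threshold decides how strict "agreement" is for each comparison, so in practice it would be chosen per feature and checked against the data rather than fixed at one value.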