anhaidgroup / py_entitymatching


I trained the model on the ACM_DBLP data set and evaluated it on the DBLP_Scholar data set; the predict function then reports the following error #141

Open zhouhuhq opened 3 years ago

zhouhuhq commented 3 years ago

Metadata file already exists at D:\web\DATA\end-to-end/GGG.metadata. Overwriting it
Traceback (most recent call last):
  File "d:/workspace/graph_EA/DBLP_Scholar/entitymatching_DS.py", line 104, in <module>
    append=True, target_attr='predicted', inplace=False)
  File "C:\Users\周周\AppData\Roaming\Python\Python36\site-packages\py_entitymatching-0.3.2-py3.6-win-amd64.egg\py_entitymatching\matcher\mlmatcher.py", line 239, in predict
    y = self._predict_ex_attrs(table, exclude_attrs, return_prob=return_probs)
  File "C:\Users\周周\AppData\Roaming\Python\Python36\site-packages\py_entitymatching-0.3.2-py3.6-win-amd64.egg\py_entitymatching\matcher\mlmatcher.py", line 179, in _predict_ex_attrs
    res = self._predict_sklearn(x, check_rem=False, return_prob=return_prob)
  File "C:\Users\周周\AppData\Roaming\Python\Python36\site-packages\py_entitymatching-0.3.2-py3.6-win-amd64.egg\py_entitymatching\matcher\mlmatcher.py", line 137, in _predict_sklearn
    y = self.clf.predict(x)
  File "d:\anaconda\envs\pytorch\lib\site-packages\sklearn\tree\tree.py", line 430, in predict
    X = self._validate_X_predict(X, check_input)
  File "d:\anaconda\envs\pytorch\lib\site-packages\sklearn\tree\tree.py", line 402, in _validate_X_predict
    % (self.n_features_, n_features))
ValueError: Number of features of the model must match the input. Model n_features is 14 and input n_features is 23

The cause seems to be that the feature tables generated for the two data sets differ (14 features for the training data, 23 for the evaluation data), even though both data sets have the same number of columns. Why does this happen, and how can I solve it? Thank you.
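One way to narrow this down is to compare the feature-vector column names produced for each data set. The snippet below is only a minimal sketch in plain Python: `diff_feature_columns` is a hypothetical helper, and the feature names are made up for illustration rather than taken from the actual run. In practice the two lists would presumably come from the columns of the training and evaluation feature-vector tables, minus the id and label columns.

```python
def diff_feature_columns(train_cols, predict_cols):
    """Compare two lists of feature column names.

    Returns (missing, extra):
      missing - features the model was trained on but absent at predict time
      extra   - features present at predict time that the model never saw
    """
    train_set, predict_set = set(train_cols), set(predict_cols)
    missing = [c for c in train_cols if c not in predict_set]
    extra = [c for c in predict_cols if c not in train_set]
    return missing, extra


# Hypothetical feature names, for illustration only.
train_cols = ["title_title_jac", "year_year_exm"]
predict_cols = ["title_title_jac", "title_title_cos", "year_year_exm"]

missing, extra = diff_feature_columns(train_cols, predict_cols)
print("missing at predict time:", missing)
print("unseen by the model:", extra)
```

If the lists differ, a plausible explanation is that the feature table was generated separately for each data set, since py_entitymatching derives features from the inferred attribute types, and those can differ even when the column names match. Assuming both data sets share the same attribute names, reusing the feature table built for training (passed as `feature_table` to `em.extract_feature_vecs`) when extracting vectors for the evaluation data should keep the feature count consistent.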