dedupeio / dedupe-examples

Examples for using the dedupe library
MIT License

gazetteer_example.py error occurred at the active learning step #130

Open JenkaiMiao opened 1 year ago

JenkaiMiao commented 1 year ago

Environment: Python 3.7. The error messages are below:


ValueError                                Traceback (most recent call last)
/tmp/ipykernel_5240/2454607948.py in <module>
     35 # print('starting active labeling...')
     36
---> 37 dedupe.convenience.console_label(gazetteer)
     38
     39 gazetteer.train()

~/anaconda3/envs/conda_python37_env/lib/python3.7/site-packages/dedupe/convenience.py in console_label(deduper)
    148         try:
    149             if not unlabeled:
--> 150                 unlabeled = deduper.uncertain_pairs()
    151
    152             record_pair = unlabeled.pop()

~/anaconda3/envs/conda_python37_env/lib/python3.7/site-packages/dedupe/api.py in uncertain_pairs(self)
   1166             self.active_learner is not None
   1167         ), "Please initialize with the prepare_training method"
-> 1168         return [self.active_learner.pop()]
   1169
   1170     def mark_pairs(self, labeled_pairs: TrainingData) -> None:

~/anaconda3/envs/conda_python37_env/lib/python3.7/site-packages/dedupe/labeler.py in pop(self)
    329
    330         prob_l = [learner.candidate_scores() for learner in self._learners]
--> 331         probs = numpy.concatenate(prob_l, axis=1)
    332
    333         # where do the classifers disagree?

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 4998 and the array at index 1 has size 4996

manusturla commented 1 year ago

I had the same error while running the record_linkage_example on Python 3.10.6. The error appears when responding to the second question in the active learning phase.

The traceback is:

Traceback (most recent call last):
  File "/mnt/74225F01225EC82E/Archivos/FIUBA/Trabajo Profesional/dedupe_examples_[exclude]/record_linkage_example/record_linkage_example.py", line 140, in <module>
    dedupe.console_label(linker)
  File "/mnt/74225F01225EC82E/Archivos/FIUBA/Trabajo Profesional/dedupe_examples_[exclude]/record_linkage_example/venv/lib/python3.10/site-packages/dedupe/convenience.py", line 150, in console_label
    unlabeled = deduper.uncertain_pairs()
  File "/mnt/74225F01225EC82E/Archivos/FIUBA/Trabajo Profesional/dedupe_examples_[exclude]/record_linkage_example/venv/lib/python3.10/site-packages/dedupe/api.py", line 1168, in uncertain_pairs
    return [self.active_learner.pop()]
  File "/mnt/74225F01225EC82E/Archivos/FIUBA/Trabajo Profesional/dedupe_examples_[exclude]/record_linkage_example/venv/lib/python3.10/site-packages/dedupe/labeler.py", line 331, in pop
    probs = numpy.concatenate(prob_l, axis=1)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 4998 and the array at index 1 has size 4996
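
For reference, the same ValueError can be reproduced outside dedupe with plain NumPy. This is only a minimal sketch: the names `scores_a` and `scores_b` are hypothetical stand-ins for the entries of `prob_l`, and the (n, 1) column shape is an assumption. Only the two lengths, 4998 and 4996, come from the tracebacks above.

```python
import numpy as np

# Hypothetical stand-ins for the per-learner score arrays in `prob_l`.
# Only the lengths (4998 and 4996) come from the tracebacks; the (n, 1)
# column shape and the zero values are assumptions for illustration.
scores_a = np.zeros((4998, 1))
scores_b = np.zeros((4996, 1))

error = None
try:
    # axis=1 places the arrays side by side, so their row counts
    # (dimension 0) must be identical.
    np.concatenate([scores_a, scores_b], axis=1)
except ValueError as exc:
    error = exc

print(error)
```

If the real arrays behave like this sketch, the message would mean that the two learners returned scores for candidate lists whose lengths differ by two at the moment `pop` is called.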