dedupeio / dedupe-examples

Examples for using the dedupe library
MIT License

ValueError: embedded null character #102

Closed deepesch closed 4 years ago

deepesch commented 4 years ago

Hi, I'm getting the following error inconsistently while using mysql_example.py. Could you please help?

ValueError: embedded null character

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "beta_clustering.py", line 242, in
    detecter.prepare_training(temp_d)
  File "/data/fingerprints/ml/lib64/python3.6/site-packages/dedupe/api.py", line 806, in prepare_training
    self.sample(data, sample_size, blocked_proportion, original_length)
  File "/data/fingerprints/ml/lib64/python3.6/site-packages/dedupe/api.py", line 838, in sample
    index_include=examples)
  File "/data/fingerprints/ml/lib64/python3.6/site-packages/dedupe/labeler.py", line 415, in __init__
    index_include)
  File "/data/fingerprints/ml/lib64/python3.6/site-packages/dedupe/labeler.py", line 243, in __init__
    index_data)
  File "/data/fingerprints/ml/lib64/python3.6/site-packages/dedupe/training.py", line 119, in __init__
    self.blocker.indexAll(data)
  File "/data/fingerprints/ml/lib64/python3.6/site-packages/dedupe/blocking.py", line 97, in indexAll
    self.index(unique_fields, field)
  File "/data/fingerprints/ml/lib64/python3.6/site-packages/dedupe/blocking.py", line 64, in index
    index.index(preprocess(doc))
  File "/data/fingerprints/ml/lib64/python3.6/site-packages/dedupe/levenshtein.py", line 15, in index
    Levenshtein_search.add_string(self.index_key, doc)
SystemError: returned a result with an error set

fgregg commented 4 years ago

I can't reproduce this. Can you make a smaller test case?
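
One way to narrow this down to a smaller test case: CPython raises "ValueError: embedded null character" when a str containing a NUL ("\x00") is handed to a C function that expects a C string, so a record with a stray NUL in one of its fields is a plausible (unconfirmed) trigger for the add_string call in the traceback. The sketch below is not part of dedupe. find_null_fields and strip_nulls are hypothetical helper names, and it assumes the data passed to prepare_training is a dict mapping record ids to dicts of field values, as temp_d appears to be.

```python
def find_null_fields(records):
    """Return (record_id, field) pairs whose string value contains a NUL."""
    hits = []
    for record_id, record in records.items():
        for field, value in record.items():
            # Only str values can carry an embedded NUL character.
            if isinstance(value, str) and "\x00" in value:
                hits.append((record_id, field))
    return hits


def strip_nulls(records):
    """Return a copy of records with NUL characters removed from str values."""
    return {
        record_id: {
            field: value.replace("\x00", "") if isinstance(value, str) else value
            for field, value in record.items()
        }
        for record_id, record in records.items()
    }


# Made-up sample data, for illustration only.
sample = {
    1: {"name": "Acme Corp", "city": "Chicago"},
    2: {"name": "Acme\x00 Corp", "city": None},
}

print(find_null_fields(sample))  # [(2, 'name')]
```

If find_null_fields returns any hits on the real data, those few records alone may be enough to reproduce the error. Passing strip_nulls(temp_d) to prepare_training instead of temp_d would then show whether the NULs are actually the cause.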