dedupeio / dedupe-examples

:id: Examples for using the dedupe library
MIT License

Update examples #46

Closed mdaniloff closed 7 years ago

mdaniloff commented 7 years ago

Great lib. Could you please update the examples? deduper.sample does not seem to work for versions >1.5.3. Thanks.

fgregg commented 7 years ago

Which example is not working, @mdaniloff?

On Wed, Dec 14, 2016 at 8:35 PM, mdaniloff notifications@github.com wrote:

Great lib. Could you please update the examples? deduper.sample does not seem to work for versions >1.5.3. Thanks.

mdaniloff commented 7 years ago

In pgsql_big_dedupe_example.py, when I take a sample of the records with deduper.sample(temp_d, 75000), I get:

Traceback (most recent call last):
  File "C:\projects\gitlibs\dedupe-examples-master\pgsql_big_dedupe_example\pgsql_big_dedupe_example.py", line 135, in <module>
    deduper.sample(temp_d, 75000)
  File "C:\Python27\lib\site-packages\dedupe\api.py", line 832, in sample
    self.active_learner.sample_combo(data, blocked_proportion, sample_size)
  File "C:\Python27\lib\site-packages\dedupe\labeler.py", line 148, in sample_combo
    super(RLRLearner, self).sample_combo(*args)
  File "C:\Python27\lib\site-packages\dedupe\labeler.py", line 47, in sample_combo
    in blocked_sample_keys | random_sample_keys]
KeyError: 4294966056

fgregg commented 7 years ago

Hmm, I can't reproduce this. Could you tell me what the maximum integer is for your Python? See http://stackoverflow.com/questions/7604966/maximum-and-minimum-values-for-ints
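For anyone following along, a minimal sketch of one way to answer this question using only the standard library. The helper names `max_native_int` and `pointer_bits` are made up for illustration. `sys.maxint` exists only on Python 2, so the sketch falls back to `sys.maxsize` on Python 3. Note that on Windows, Python 2 reports `sys.maxint` as 2147483647 even on 64-bit builds, because the C `long` type is 32 bits wide there, so the pointer width is printed separately.

```python
import struct
import sys


def max_native_int():
    """Return the largest native integer the interpreter reports."""
    # Python 2 exposes sys.maxint; Python 3 removed it, so fall back
    # to sys.maxsize, the closest remaining equivalent.
    return getattr(sys, "maxint", sys.maxsize)


def pointer_bits():
    """Return the pointer width of the running interpreter, in bits."""
    # "P" is the struct format code for a C void pointer.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("max native int: %d" % max_native_int())
    print("pointer width: %d bits" % pointer_bits())
```

Reporting both values helps tell a 32-bit interpreter apart from a 64-bit Windows build that merely has a 32-bit `long`.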

mdaniloff commented 7 years ago

2147483647

stucka commented 7 years ago

I thought that sounded familiar. Per Wikipedia: "The number 2,147,483,647 (or hexadecimal 7FFF,FFFF₁₆) is the maximum positive value for a 32-bit signed binary integer in computing"

32-bit Postgres?
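The arithmetic behind this observation can be checked with a short sketch. It only uses the two numbers already quoted in this thread. The helper `as_signed_32` is a hypothetical name introduced here, and nothing below shows that integer wraparound is the actual cause of the KeyError, only that the failing key lies outside the signed 32-bit range.

```python
INT32_MAX = 2**31 - 1        # 2147483647, the value reported above
UINT32_RANGE = 2**32         # number of distinct 32-bit patterns
failing_key = 4294966056     # the key from the KeyError in the traceback


def as_signed_32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - UINT32_RANGE if value > INT32_MAX else value


print(hex(INT32_MAX))              # 0x7fffffff
print(failing_key > INT32_MAX)     # True
print(as_signed_32(failing_key))   # -1240
```

The failing key equals 2**32 - 1240, so it does not fit in a signed 32-bit integer. That is consistent with a 32-bit integer issue somewhere in the stack, but it does not say where.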

fgregg commented 7 years ago

I think I fixed this here: https://github.com/datamade/dedupe/commit/6528b75605dc5e3ddccc6f8054ba2f314f5a5194

Could @mdaniloff or @stucka confirm?