dedupeio / dedupe-examples

Examples for using the dedupe library
MIT License

csv example seems to be restarting itself? #18

Closed xykev closed 9 years ago

xykev commented 9 years ago

This time I am running the csv example on a different computer. I seem to be able to complete the dedupe.consoleLabel(deduper) step, but after I finish labeling, the program appears to restart itself four times. I get the following output, beginning with my last label:

(y)es / (n)o / (u)nsure / (f)inished
f
Finished labeling
importing data ...
importing data ...
importing data ...
importing data ...
C:\Users\Kevin\Anaconda\lib\site-packages\dedupe\sampling.py:35: UserWarning: 75000 blocked samples were requested, but only able to sample 74503
  % (sample_size, len(blocked_sample)))
C:\Users\Kevin\Anaconda\lib\site-packages\dedupe\sampling.py:35: UserWarning: 75000 blocked samples were requested, but only able to sample 74659
  % (sample_size, len(blocked_sample)))
C:\Users\Kevin\Anaconda\lib\site-packages\dedupe\sampling.py:35: UserWarning: 75000 blocked samples were requested, but only able to sample 74657
  % (sample_size, len(blocked_sample)))
C:\Users\Kevin\Anaconda\lib\site-packages\dedupe\sampling.py:35: UserWarning: 75000 blocked samples were requested, but only able to sample 74539
  % (sample_size, len(blocked_sample)))
starting active labeling...
Phone : 3484629
Address : 1439 west wellington
Zip : 60657
Site name : alphonsus academy & center for the arts

Phone : 8716780
Address : 524 w melrose avenue
Zip : 60657
Site name : florence heller jewish community center

Do these records refer to the same thing?
(y)es / (n)o / (u)nsure / (f)inished
starting active labeling...
starting active labeling...
Phone : 2547700
Address : 1711 w 35th st
Zip :
Site name : el valor - little tykes i

Phone : 3730234
Address : 30 w. garfield
Zip : 60609
Site name : ounce -garfield head start

Do these records refer to the same thing?
(y)es / (n)o / (u)nsure / (f)inished
Phone : 5356600
Address : 7736 s. burnham
Zip : 60649
Site name : chicago public schools bradwell, myra

Phone : 2214442
Address : 1845 e 79th st.
Zip : 60649
Site name : henry booth house wee care nursery

Do these records refer to the same thing?
(y)es / (n)o / (u)nsure / (f)inished
starting active labeling...
Phone : 7227440
Address : 212 south francisco
Zip : 60612
Site name : marillac social center

Phone : 6666726
Address : 600 n leavitt street
Zip : 60612
Site name : onward neighborhood house

Do these records refer to the same thing?
(y)es / (n)o / (u)nsure / (f)inished

Anybody have any idea why this could be happening?

fgregg commented 9 years ago

Unfortunately, I know very little about how multiprocessing works on Windows. Try disabling multiprocessing by setting the num_cores argument to 1:

deduper = dedupe.Dedupe(fields, num_cores=1)
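For anyone who wants to keep multiple cores: a likely explanation, though not one confirmed in this thread, is that Python's multiprocessing module on Windows starts each worker by launching a fresh interpreter that re-imports the main script. Any top-level code in the script (loading data, prompting for labels) then runs again in every worker, which would match the four repeated "importing data ..." lines. The usual remedy is to put the script's entry point behind an `if __name__ == "__main__":` guard. Below is a minimal sketch of that pattern using only the standard library; `square` and `run` are hypothetical names for illustration and are not part of dedupe.

```python
import multiprocessing


def square(x):
    """Work done by a worker; defined at module top level so workers can import it."""
    return x * x


def run(values, num_cores=1):
    """Square each value, using a process pool only when num_cores > 1."""
    values = list(values)
    if num_cores <= 1:
        # Serial path: no child processes are started at all.
        return [square(v) for v in values]
    with multiprocessing.Pool(processes=num_cores) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Code under this guard runs only in the parent process.
    # When a worker re-imports this file, __name__ is not "__main__",
    # so the lines below are skipped instead of being repeated per worker.
    print("importing data ...")
    print(run(range(5), num_cores=2))
```

With the guard in place, "importing data ..." should be printed once no matter how many workers are started. Setting `num_cores=1` in this sketch avoids the problem in a different way, by never creating a worker process.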

xykev commented 9 years ago

This seems to have solved the issue, thanks!