dedupeio / dedupe-examples

:id: Examples for using the dedupe library
MIT License
404 stars 216 forks source link

MySQL: plural_key.block_key column is empty, extra column added. #83

Closed jbeales closed 6 years ago

jbeales commented 6 years ago

In the MySQL demo, the block_key column in the plural_key table is always empty, but a new column is added: MIN(block_key). This is because When using CREATE TABLE ... SELECT MySQL "MySQL creates new columns for all elements in the SELECT." link. If we give the columns in the SELECT a name that is already in the CREATE TABLE statement then a new column isn't created, but values are inserted into the existing column.

fgregg commented 6 years ago

thanks so it sounds like we need to do "MIN(block_key) as block_key?

jbeales commented 6 years ago

Yes. I was working on a PR but ran into another bug in the same query - the key used for the join is not the same collation in each table, and MySQL doesn’t seem to like that, so I think that either the collation will have to be removed from the Create statement for the raw_data table or one will have to be added to the Create statements for the rest of the tables. On Wed, May 2, 2018 at 9:21 PM Forest Gregg notifications@github.com wrote:

thanks so it sounds like we need to do "MIN(block_key) as block_key?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/dedupeio/dedupe-examples/issues/83#issuecomment-386168141, or mute the thread https://github.com/notifications/unsubscribe-auth/AAGZ5xafsSrtChOPc0sMCtQHTydJ8cZoks5tuluBgaJpZM4TwWih .

fgregg commented 6 years ago

Boy MySQL has changed a lot in the last 5 or so years.

tfeathers commented 6 years ago

I had the same issue in mysql_example.py and was getting this message:

"No records have been blocked together. Is the data you are trying to match like the data you trained on?"

Just wanted to confirm that naming the field as described worked for me as well.

MIN(block_key) as block_key