data61 / anonlink

Python implementation of anonymous linkage using cryptographic linkage keys
Apache License 2.0

ValueError: Buffer dtype mismatch when running anonlink.candidate_generation.find_candidate_pairs on AWS Glue #595

Open bllmo opened 1 year ago

bllmo commented 1 year ago

When running the following code snippet on AWS Glue:

results_candidate_pairs = anonlink.candidate_generation.find_candidate_pairs(
    [
        ...
    ],
    [
        ...
    ],
    anonlink.similarities.dice_coefficient_accelerated,
    0.9,
)

I encounter the following error:

ValueError: Buffer dtype mismatch, expected 'const char' but got 'signed char'

I tried using anonlink.similarities.dice_coefficient_accelerated_python as an alternative, and it did not produce the error. However, this alternative is significantly slower, making it impractical for large datasets.
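For context, both similarity functions score record pairs with the Dice coefficient, 2|A ∩ B| / (|A| + |B|), over the bit vectors being linked, and keep pairs at or above the threshold (0.9 above). The sketch below is a naive all-pairs version in plain Python. It assumes each record is a Python int used as a bit vector. `dice` and `naive_candidate_pairs` are hypothetical helpers, not anonlink's API, and anonlink returns its candidates in its own structure. The sketch shows why scoring in interpreted Python scales poorly: the work grows with the product of the two dataset sizes.

```python
from itertools import product
from typing import List, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def dice(a: int, b: int) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) of two bit vectors stored as ints."""
    total = popcount(a) + popcount(b)
    if total == 0:
        return 0.0  # assumption: two empty bit vectors count as dissimilar
    return 2 * popcount(a & b) / total


def naive_candidate_pairs(
    left: List[int], right: List[int], threshold: float
) -> List[Tuple[float, int, int]]:
    """Hypothetical helper: score every (left, right) pair, keep those >= threshold."""
    candidates = []
    # len(left) * len(right) comparisons, each done in interpreted Python.
    for (i, a), (j, b) in product(enumerate(left), enumerate(right)):
        score = dice(a, b)
        if score >= threshold:
            candidates.append((score, i, j))
    # Highest similarity first.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates


left = [0b11110000, 0b10101010]
right = [0b11110001, 0b00001111]
print(naive_candidate_pairs(left, right, 0.8))  # one pair: left 0 vs right 0, score 8/9
```

A compiled similarity function does the same comparisons with native popcount instructions, which is why losing it matters for large datasets.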

snazzer commented 5 months ago

I ran into a similar problem in https://github.com/data61/anonlink/issues/566 and opened a PR there that hopefully fixes it. However, I'm not clear on what the contribution guidelines are, so I'm not sure how to move it forward.