data61 / clkhash

CLK hash: hash PII for entity matching
Apache License 2.0

API Review #26

Open hardbyte opened 6 years ago

hardbyte commented 6 years ago

Consider whether the library exposes the right levels of abstraction for a library user, and document options for improving them.

It should be relatively easy for a clkhash user to define a custom schema and to use clkhash with their own data source (database, job queue, etc.). The output serialization should be swappable.
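To make the goal concrete, here is a minimal sketch of what "any data source in, any serialization out" could look like. None of these names come from clkhash: `placeholder_encoder`, `encode_records`, and `hex_json_serializer` are hypothetical, and the SHA-256 step only stands in for the real CLK encoding.

```python
import hashlib
import json
from typing import Callable, Iterable, Iterator, Sequence

# Hypothetical type aliases, not part of clkhash.
Record = Sequence[str]
Encoder = Callable[[Record], bytes]
Serializer = Callable[[Iterable[bytes]], str]


def placeholder_encoder(record: Record) -> bytes:
    """Stand-in for the real encoding step. This is NOT the CLK algorithm."""
    return hashlib.sha256("|".join(record).encode("utf-8")).digest()


def encode_records(source: Iterable[Record],
                   encoder: Encoder = placeholder_encoder) -> Iterator[bytes]:
    """Lazily encode records from any iterable: a list, a DB cursor, a queue reader."""
    for record in source:
        yield encoder(record)


def hex_json_serializer(encodings: Iterable[bytes]) -> str:
    """One possible output format: a JSON list of hex strings."""
    return json.dumps([e.hex() for e in encodings])


def serialize(source: Iterable[Record],
              serializer: Serializer = hex_json_serializer) -> str:
    """Wire a data source to a serializer; either side can be swapped."""
    return serializer(encode_records(source))


# Usage: the source is just an iterable, so a generator works as well as a list.
rows = (row for row in [("Alice", "1990-01-01"), ("Bob", "1985-06-30")])
output = serialize(rows)
```

The point of the sketch is that the user never has to write marshaling code to plug in a different source or output format; they only pass a different iterable or a different `serializer` callable.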

Stretch goal: make it easy to provide user-defined tokenization/comparison strategies.
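As a rough illustration of the stretch goal, strategies could be plain callables that a user passes in per field. Everything below is a hypothetical sketch rather than clkhash's API; the padded-bigram tokenizer and the Dice coefficient are used only as familiar examples of a tokenization and a comparison strategy.

```python
from typing import Callable, List, Sequence

# Hypothetical strategy types, not part of clkhash.
Tokenizer = Callable[[str], List[str]]
Comparator = Callable[[Sequence[str], Sequence[str]], float]


def bigrams(value: str) -> List[str]:
    """Split a value into overlapping 2-character tokens, padded with spaces."""
    padded = f" {value} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def words(value: str) -> List[str]:
    """Alternative strategy: whitespace-separated tokens."""
    return value.split()


def dice(a: Sequence[str], b: Sequence[str]) -> float:
    """Dice coefficient over the token sets: 2|A ∩ B| / (|A| + |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # assumption: two empty token sets count as no evidence
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def tokenize_record(record: Sequence[str],
                    tokenizers: Sequence[Tokenizer]) -> List[List[str]]:
    """Apply one user-chosen tokenizer per field."""
    return [tokenize(value) for tokenize, value in zip(tokenizers, record)]


# Usage: bigrams for the name field, whole words for the address field.
tokens = tokenize_record(("ann", "1 main st"), [bigrams, words])
similarity = dice(bigrams("ann"), bigrams("anne"))
```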

Resolving this issue requires a write-up of the API's deficiencies.

hardbyte commented 5 years ago

At a minimum, the tutorials shouldn't require users to write custom serialization/marshaling code.