CodeForPittsburgh / food-access-map-data

Data for the food access map

Hyperparameter tuning / optimization for Levenshtein #96

Closed mishugana closed 3 years ago

mishugana commented 3 years ago

Ignore the deduper for now

mishugana commented 3 years ago

```python
import numpy as np
import pandas as pd

# Score every test pair with specialFuzz and list the cases it gets wrong
guesses = np.array([specialFuzz(*t) for t in tests])
pd.DataFrame(tests)[guesses != answers]

# Inspect one pair directly
specialFuzz("Windber Area Community Kitchen", "Windber Area Community Kitchen - Wackpack Program")
```
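For context, a minimal sketch of what a `specialFuzz`-style comparator could look like, assuming it wraps a token-based Levenshtein ratio (here via `fuzzywuzzy`; the real implementation, ratio function, and threshold may differ):

```python
# Hypothetical sketch: the real specialFuzz, tests, and answers live elsewhere in the repo.
from fuzzywuzzy import fuzz

def specialFuzz(name_a: str, name_b: str, threshold: int = 85) -> bool:
    """Return True if the two site names are judged to be the same place.

    token_set_ratio ignores word order and repeated tokens, which helps with
    names like "X" vs "X - Some Program"; the threshold is the kind of
    hyperparameter this issue is about tuning.
    """
    return fuzz.token_set_ratio(name_a, name_b) >= threshold
```

With that shape, tuning amounts to sweeping the threshold (and the choice of ratio function) against the labeled `tests`/`answers` pairs shown above.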

maxachis commented 3 years ago

@mishugana This has been chilling here for a while. If I merge this, it won't affect the generate_merged_datasets workflow, because these files aren't among the ones that are called in that workflow.

Since Optimizer is just a demo, my main question is about deduper.py. I've yet to get the postal module to run in the id_duplicates_function.py file -- so far installation has failed both on my Windows machine and via the GitHub workflows. Do you think you could take a crack at getting it to work in the GitHub workflow?
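For reference, a minimal sketch of how the postal package is typically used for this kind of deduplication (an assumption -- id_duplicates_function.py may use it differently). The Python bindings only import if the libpostal C library is already built and installed on the machine or CI runner, which is usually where Windows and workflow installs fall over:

```python
# Assumed usage of pypostal for normalizing addresses before duplicate matching.
# Requires the libpostal C library to be installed first; `pip install postal`
# alone is not enough, which is the usual cause of install failures like the above.
from postal.expand import expand_address

def normalized_forms(address: str) -> set:
    """Return the set of canonical expansions libpostal produces for an address."""
    return set(expand_address(address))

def same_address(addr_a: str, addr_b: str) -> bool:
    """Treat two records as likely duplicates if any normalized forms overlap."""
    return bool(normalized_forms(addr_a) & normalized_forms(addr_b))
```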