Closed toddysm closed 2 years ago
Steps needed:
NOTE: Damerau-Levenshtein might be more applicable to our situation. This will allow transpositions (swapping of adjacent symbols). Sentences like: "well done job"
and "job well done"
would be clustered closer together with Damerau-Levenshtein than they would be for the standard Levenshtein.
Example using Hierarchy Cluster.
Date | Activity | Time Spent |
---|---|---|
1/21/22 | Researching clustering | 1 hour |
1/22/22 | Researching clustering (cont...) | 2 hours |
1/24/22 | Researching converting string to clusterable data (Levenshtein, Jaro-Winkler, etc...) | 2 hours |
1/25/22 | Initial test to combine Levenshtein and Hierarchical clustering | 1 hour |
1/29/22 | Coded proof of concept script / created PR | 5 hours |
Estimated: 10-12 hours Actual: 11 hours
Description
Create an unsupervised clustering algorithm utilizing a python clustering library. This algorithm will help the team understand and use the data from Floop's database more efficiently.
Open questions
Resources
Time Estimate