Closed by colarusso 2 years ago
I made a quick and dirty tool. Other clustering algorithms may work better, and a rules-based approach may turn out to be best in the end, but you can find it here: https://github.com/SuffolkLITLab/form-explorer/blob/main/101%20Cluster%20Screens.ipynb
Given a list of variable names (a mix of standard and nonstandard), return a list of lists in which each inner list corresponds to a screen (maybe this should be a dictionary)...
My thinking is that we reCase() the variable names, vectorize them, and then cluster the results. See, e.g., https://machinelearningmastery.com/clustering-algorithms-with-python/
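For discussion, here is a rough standard-library sketch of that recase, vectorize, cluster pipeline. It is not the code in the linked notebook. `split_name()` is a hypothetical stand-in for reCase(), the vectors are plain token counts, and the clustering is a simple single-link grouping on cosine similarity rather than any of the algorithms from the linked article. The `threshold` value is an arbitrary assumption that would need tuning on real forms.

```python
import math
import re
from collections import Counter


def split_name(name):
    """Hypothetical stand-in for reCase(): split snake_case or camelCase
    variable names into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", spaced) if tok]


def vectorize(name):
    """Bag-of-tokens vector for one variable name."""
    return Counter(split_name(name))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[tok] * b[tok] for tok in a if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_screens(names, threshold=0.45):
    """Group names into screens: two names share a screen if they are linked
    by a chain of pairs whose similarity is at least `threshold` (assumed value)."""
    vectors = [vectorize(n) for n in names]
    parent = list(range(len(names)))  # union-find forest over name indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    screens = {}
    for i, name in enumerate(names):
        screens.setdefault(find(i), []).append(name)
    return list(screens.values())


example = ["user_name_first", "user_name_last", "court_name", "court_county"]
screens = cluster_screens(example)
```

With these made-up names, the two `user_name_*` fields end up on one screen and the two `court_*` fields on another. Returning a dictionary instead would only mean keeping `screens` keyed by some screen label rather than calling `.values()`.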