The script takes a distance metric as input (one of editdistance, jaccards, cosine) and computes a distance matrix, which is then passed to Affinity Propagation for clustering. The script is generic enough that a new distance metric can be added and passed to Affinity Propagation. If a directory is provided as input, it uses metadata from tika-parser to cluster the files within that directory.
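The pluggable-metric idea described above can be sketched as follows. This is a minimal, standard-library-only illustration, not the script's actual code: the names `METRICS`, `distance_matrix` and `to_similarity` are hypothetical, the Jaccard variant assumes whitespace-separated tokens, and cosine is omitted for brevity. Affinity Propagation works on similarities rather than distances (for example, scikit-learn's `AffinityPropagation` with `affinity="precomputed"` expects a similarity matrix), so the sketch negates the distances as one common conversion.

```python
from typing import Callable, Dict, List, Sequence


def edit_distance(a: str, b: str) -> float:
    """Levenshtein distance using a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return float(prev[-1])


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A & B| / |A | B| over whitespace-separated tokens (an assumption)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


# Hypothetical registry: adding a new metric means adding one entry here.
METRICS: Dict[str, Callable[[str, str], float]] = {
    "editdistance": edit_distance,
    "jaccards": jaccard_distance,
}


def distance_matrix(items: Sequence[str], metric: str) -> List[List[float]]:
    """Build a symmetric pairwise distance matrix with a zero diagonal."""
    fn = METRICS[metric]
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = fn(items[i], items[j])
    return matrix


def to_similarity(matrix: List[List[float]]) -> List[List[float]]:
    """Negate distances so that larger values mean 'more similar'."""
    return [[0.0 - d for d in row] for row in matrix]
```

With this layout, supporting another metric only requires writing a function of two strings that returns a float and registering it under a new key; the matrix construction and the clustering step stay unchanged.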
Example usage:
To cluster files in the input directory:
python affinity_propagation.py --inputDir test --distance editdistance --uniqueId resourceName
Example D3 visualisation using edit-distance :
To cluster JSON objects in a JSON input file:
python affinity_propagation.py --inputFile [input json file] --distance [editdistance,jaccards,cosine] --config [config file with attribute:datatype] --jsonKey [key of json to read data from] --uniqueId [unique id in the dataset]