JeffBorwey / GraphClustering

GraphClustering is a C# GUI for data mining and clustering research. In particular, it implements the graph-based resilience measure Vertex Attack Tolerance (VAT) and the adapted clustering algorithm Hierarchical VAT Clustering (hVATClust). The NetMining library provides many other common clustering algorithms (K-Means, SOM, Girvan-Newman, etc.), several ADTs (Quadtree, Heap, DisjointSet), dimensionality reduction, data generation, and internal (Dunn, Silhouette, Davies–Bouldin) and external clustering evaluation.
MIT License

Sample Input #2

Open khizer-hayat opened 8 years ago

khizer-hayat commented 8 years ago

This project contains no sample input data file, so it is hard to test the performance of any of the evaluation indices. Please provide sample input data and also describe which input data formats are supported.

JeffBorwey commented 8 years ago

Sorry about that. The documentation on this project is lacking and I haven't been actively developing it for quite some time. I'll include example files in the project, but quickly, here is how the cluster file format is defined:

[Points, DistanceMatrix, Graph] [File Name]
Clusters [number of clusters]
[size of cluster 0]
[list of item indices cluster 0]
[size of cluster 1]
[list of item indices cluster 1]
...
Meta [comments, algorithm statistics, etc.  These are ignored]

Here is an example:

Points iris.txt
Clusters 3
50
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 
46
50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 69 71 73 74 75 76 77 78 79 80 81 82 84 85 86 88 89 90 91 92 93 94 95 96 97 98 99 106 
54
68 70 72 83 87 100 101 102 103 104 105 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
Meta Vat Recursive: 1 iterations
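
For anyone who wants to check a cluster file outside the GUI, the layout above can be read with a few lines of code. The following is a minimal Python sketch, not code from the NetMining library: the names `ClusterFile` and `parse_cluster_file` are hypothetical, and it assumes that blank lines can be skipped, that no cluster is empty, and that everything after the last cluster is `Meta` text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterFile:
    """Hypothetical container for the fields described in this thread."""
    data_type: str                 # "Points", "DistanceMatrix" or "Graph"
    data_file: str                 # e.g. "iris.txt"
    clusters: List[List[int]]      # one list of item indices per cluster
    meta: List[str] = field(default_factory=list)


def parse_cluster_file(text: str) -> ClusterFile:
    # Assumption: blank lines carry no meaning, so they are dropped up front.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Line 1: "[Points, DistanceMatrix, Graph] [File Name]"
    data_type, data_file = lines[0].split(maxsplit=1)
    if data_type not in ("Points", "DistanceMatrix", "Graph"):
        raise ValueError(f"unknown data type: {data_type!r}")

    # Line 2: "Clusters [number of clusters]"
    keyword, count = lines[1].split()
    if keyword != "Clusters":
        raise ValueError(f"expected 'Clusters', got {keyword!r}")

    # Then, per cluster: one size line followed by one index line.
    clusters = []
    pos = 2
    for _ in range(int(count)):
        size = int(lines[pos])
        members = [int(tok) for tok in lines[pos + 1].split()]
        if len(members) != size:
            raise ValueError(
                f"cluster {len(clusters)} declares {size} items "
                f"but lists {len(members)}"
            )
        clusters.append(members)
        pos += 2

    # Remaining "Meta ..." lines are kept as plain text; other lines are ignored.
    meta = [ln[len("Meta"):].strip() for ln in lines[pos:] if ln.startswith("Meta")]
    return ClusterFile(data_type, data_file, clusters, meta)
```

Calling `parse_cluster_file(open("result.txt").read())`, where `result.txt` is a placeholder file name, returns the data file name and one index list per cluster. The size check catches a cluster whose declared size does not match the number of indices listed.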

Here iris.txt is just a list of the 150 data points from the iris flower dataset, tab-delimited with one point per line:

5.1 3.5 1.4 0.2
4.9 3   1.4 0.2
4.7 3.2 1.3 0.2
4.6 3.1 1.5 0.2
5   3.6 1.4 0.2
5.4 3.9 1.7 0.4
...
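
To connect the two files, the item indices in a cluster file can be treated as 0-based row numbers in the points file, which is how the iris example above reads (indices 0 to 149 for 150 points). The Python sketch below is an illustration under that assumption and is not part of the project. `parse_points` and `clusters_to_labels` are hypothetical helper names, and leaving unassigned points at -1 is a choice made here, not something the format specifies.

```python
from typing import List


def parse_points(text: str) -> List[List[float]]:
    """Parse one point per line; values separated by tabs or spaces."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        points.append([float(tok) for tok in line.split()])
    return points


def clusters_to_labels(clusters: List[List[int]], n_points: int) -> List[int]:
    """Turn per-cluster index lists into one label per point.

    Points that appear in no cluster keep the label -1 (an assumption
    of this sketch). An out-of-range or repeated index raises ValueError.
    """
    labels = [-1] * n_points
    for cluster_id, members in enumerate(clusters):
        for idx in members:
            if not 0 <= idx < n_points:
                raise ValueError(f"index {idx} is outside 0..{n_points - 1}")
            if labels[idx] != -1:
                raise ValueError(f"index {idx} appears in more than one cluster")
            labels[idx] = cluster_id
    return labels


points = parse_points("5.1\t3.5\t1.4\t0.2\n4.9\t3\t1.4\t0.2\n")
labels = clusters_to_labels([[1], [0]], len(points))
print(labels)  # [1, 0]
```

A flat label list like this is the form most external evaluation code expects, so it is a convenient way to sanity-check a hand-written cluster file before loading it.
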
khizer-hayat commented 8 years ago

That's all fine, but there should really be a sample input file attached to the project, so that a user only has to load it to get the cluster evaluation results for the chosen index. Because your code uses so many classes, code files, and data structures (I've read it), a newcomer can easily be put off by not knowing how to prepare an input for these indices. Anyhow, I hope you will respond further on issue #1, which we discussed recently.