Parallel, loosely coupled DBSCAN algorithm using the disjoint-set data structure, from Patwary, M. M. A., et al. The makefile, the compile-time errors, and a run-time error that numbered clusters incorrectly have all been fixed.
Link to paper: http://users.ece.northwestern.edu/~choudhar/Publications/ANewScalableParallelDBSCANAlgorithmUsingDisjointSetDataStructure.pdf
To download the original, buggy code: http://cucis.ece.northwestern.edu/projects/Clustering/download_code_dbscan.html

A Disjoint-Set Data Structure based Parallel DBSCAN clustering implementation (MPI version)
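To make the disjoint-set idea concrete, here is a minimal single-process sketch in Python. It roughly follows the sequential scheme described in the paper: every core point (one with at least minpts points within distance epsilon; in this sketch the point itself counts) is unioned with its neighbouring core points, and a non-core neighbour is attached to the first core point that reaches it. It is an illustration, not the MPI code: the neighbour search is brute force, the names `DisjointSet` and `dsu_dbscan` are hypothetical, and labelling noise as -1 is a convention of this sketch only.

```python
import math
from typing import List, Sequence


class DisjointSet:
    """Union-find with path halving and union by rank (hypothetical helper)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Path halving: point every other node on the path at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def dsu_dbscan(points: Sequence[Sequence[float]], eps: float, minpts: int) -> List[int]:
    """Sequential sketch: one label per point, -1 marks noise (sketch convention)."""
    n = len(points)
    # Brute-force eps-neighbourhoods, O(n^2); each point is its own neighbour.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= minpts for nb in neighbours]
    ds = DisjointSet(n)
    assigned = [False] * n  # has this point joined some cluster yet?
    for i in range(n):
        if not is_core[i]:
            continue
        assigned[i] = True
        for j in neighbours[i]:
            if is_core[j]:
                ds.union(i, j)  # core-core: merge the two trees
            elif not assigned[j]:
                ds.union(i, j)  # border point joins the first core point reaching it
                assigned[j] = True
    # Relabel tree roots as consecutive cluster ids 0, 1, 2, ...
    labels: List[int] = []
    root_to_id = {}
    for i in range(n):
        if not assigned[i]:
            labels.append(-1)
            continue
        root = ds.find(i)
        labels.append(root_to_id.setdefault(root, len(root_to_id)))
    return labels
```

For example, `dsu_dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], eps=1.5, minpts=3)` returns `[0, 0, 0, 1, 1, 1, -1]`: two three-point clusters and one noise point. Because clusters are just trees in the disjoint-set forest, merging two clusters is a single union rather than a relabelling pass, which is what makes the structure attractive when partial clusters found by different processes later have to be joined.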

How to run the tool:

  1. Compile the source files using the following command:

    make

  2. Run using the following command:

    mpiexec -n number_of_process ./mpi_dbscan -i filename -b -m minpts -e epsilon -o output[optional]

    For example:

    mpiexec -n 8 ./mpi_dbscan -i clus50k.bin -b -m 5 -e 25 -o clus50k_clusters.nc

    Run the following to get a detailed description of the program arguments:

    ./mpi_dbscan ?

  3. Input file format:

    Binary file: the number of points, N, and the number of dimensions, D (4 bytes each), followed by the point coordinates (N x D floating-point numbers).

  4. Output file format (optional; statistics about the clustering solution are available without writing the clusters to a file):

    netCDF file: the coordinates are stored as columns named position_col_X1, position_col_X2, ..., followed by one additional column, cluster_id, giving the id of the cluster each point belongs to.
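The file formats in items 3 and 4 can be exercised from Python. The sketch below writes and reads the input layout and pulls the cluster_id column out of an output file. It assumes N and D are 32-bit integers and each coordinate is a 32-bit float, stored point by point in little-endian byte order; the README fixes only the 4-byte size of N and D, so check these assumptions against the source before relying on them. Reading the output assumes the third-party netCDF4 package. The function names are hypothetical helpers, not part of the tool.

```python
import struct
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def write_points(path: str, points: Sequence[Sequence[float]]) -> None:
    """Write int32 N, int32 D, then N*D float32 coordinates (assumed layout)."""
    n = len(points)
    d = len(points[0]) if n else 0
    with open(path, "wb") as f:
        # "<" = little-endian with standard sizes: "i" and "f" are 4 bytes each.
        f.write(struct.pack("<ii", n, d))
        for p in points:
            if len(p) != d:
                raise ValueError("all points must have the same number of dimensions")
            f.write(struct.pack(f"<{d}f", *p))


def read_points(path: str) -> Tuple[int, int, List[Tuple[float, ...]]]:
    """Read a file in the same assumed layout back into (N, D, points)."""
    with open(path, "rb") as f:
        n, d = struct.unpack("<ii", f.read(8))
        values = struct.unpack(f"<{n * d}f", f.read(4 * n * d))
    # Coordinates are stored point by point (row-major).
    return n, d, [values[i * d:(i + 1) * d] for i in range(n)]


def load_cluster_ids(path: str):
    """Return the cluster_id column of an output file (requires netCDF4)."""
    from netCDF4 import Dataset  # lazy import: the other helpers work without netCDF4
    with Dataset(path) as ds:
        return ds.variables["cluster_id"][:]


def cluster_sizes(cluster_ids: Iterable[int]) -> Dict[int, int]:
    """Count how many points carry each cluster id."""
    return dict(Counter(int(c) for c in cluster_ids))
```

For example, if the assumed layout matches, `write_points("points.bin", pts)` produces a file to pass with `-i points.bin -b`, and `cluster_sizes(load_cluster_ids("clus50k_clusters.nc"))` tallies the points per cluster after the example run above. Which id, if any, marks noise is not stated in this README, so compare against the tool's own statistics before interpreting the counts.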