FrederikFritsch / Image-clustering-project

MIT License


Software for Image Clustering

Authors: Frederik Fritsch, Lu Chen, Vladislav Bertilsson

⚙️ Requirements:

Step-by-step guide:

  1. Choose method for feature extraction (Traditional/DNN)
  2. Choose clustering method (K-Means/DBSCAN/HDBSCAN)
  3. Evaluate

We have provided scripts to run all modules on Alvis, but the Python files can be run on any computer.
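The three steps can be chained from a small driver script. The sketch below only reuses the example commands given later in this README (the folder name Test12 and the file KMeansResults.csv come from those examples); the helper `run_pipeline` is a hypothetical name and not part of the repo. By default it only prints the commands instead of executing them.

```python
import shlex
import subprocess

# Example commands copied from this README, in pipeline order:
# feature extraction -> clustering -> evaluation.
PIPELINE = [
    "python3 TraditionalFeatureExtraction.py Image_Data/ Test12 640 350 Lanczos 1 0 0 1 0",
    "python3 KMeansClustering.py Test12/Test12.csv Test12 Normalize 0.8 10 20",
    "python3 Evaluation.py Test12/KMeansResults.csv Test12/ 15",
]


def run_pipeline(pipeline=PIPELINE, dry_run=True):
    """Run (or, with dry_run=True, just print) each step in order.

    Hypothetical helper: returns the argument lists that were built.
    """
    commands = [shlex.split(line) for line in pipeline]
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)
    return commands


run_pipeline()  # dry run: prints the three commands
```

Calling `run_pipeline(dry_run=False)` from the repo root would execute the steps one after another.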

🚀 Modules

This repo is structured using modules that can be executed independently. Each module stores the results in the /Results folder after execution.


Feature Extraction

Generate a dataframe and save it as a .csv with the following options:

Command to Run The File
python3 TraditionalFeatureExtraction.py $DATA_PATH $CSV_FOLDER_NAME $WIDTH $HEIGHT $RESIZE_METHOD $COLORFEATURES $ROICOLORFEATURES $EDGEFEATURES $LBPFEATURES $ORBFEATURES

Where the inputs are the following:

Example on how to run this file:
python3 TraditionalFeatureExtraction.py Image_Data/ Test12 640 350 Lanczos 1 0 0 1 0

This extracts features from the images in the folder Image_Data/ and writes the results to the folder Test12. The images are resized to 640x350 using the Lanczos algorithm, and color features and LBP features are extracted.
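To make the argument order easier to read, here is a minimal sketch of how the ten positional arguments map to named options. The class `TraditionalOptions` and the function `parse_traditional_args` are hypothetical and only illustrate the mapping; the script's own parsing may differ. The sketch assumes that 1 switches a feature group on and 0 switches it off, which matches the example above.

```python
from dataclasses import dataclass


@dataclass
class TraditionalOptions:
    # Field order follows the command line shown above.
    data_path: str
    csv_folder_name: str
    width: int
    height: int
    resize_method: str
    color_features: bool
    roi_color_features: bool
    edge_features: bool
    lbp_features: bool
    orb_features: bool


def parse_traditional_args(argv):
    """Map the ten positional arguments onto named options (assumes 1 = on, 0 = off)."""
    if len(argv) != 10:
        raise ValueError(f"expected 10 arguments, got {len(argv)}")
    data_path, folder, width, height, resize_method = argv[:5]
    flags = [bool(int(value)) for value in argv[5:]]
    return TraditionalOptions(
        data_path, folder, int(width), int(height), resize_method, *flags
    )


opts = parse_traditional_args("Image_Data/ Test12 640 350 Lanczos 1 0 0 1 0".split())
print(opts)
```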


Neural Network Feature Extraction

This module extracts features from images with pre-trained neural networks.

The following models are available:

Command to Run The File
python3 NeuralNetworkFeatureExtraction.py $DATA_PATH $CSV_FOLDER_NAME $MODEL_TYPE

Where the inputs are the following:

Example on how to run this file: python3 NeuralNetworkFeatureExtraction.py /Image_Data/ DNNTest VGG16


Clustering

K-Means Clustering

This module performs clustering through K-Means.

Command to Run The File
python3 KMeansClustering.py $DATA_FILE_PATH $RESULTS_PATH $NORMALIZATION_METHOD $PCA_VARIANCE $MIN_K $MAX_K

Where the inputs are the following:

Example on how to run this file: python3 KMeansClustering.py Test12/Test12.csv Test12 Normalize 0.8 10 20
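As a rough illustration of the last three arguments, the sketch below assumes that $PCA_VARIANCE is the fraction of total variance the PCA step should retain and that K-Means is tried for every K from $MIN_K to $MAX_K inclusive. Both readings are assumptions, and the function names are hypothetical; the eigenvalues are made-up illustration values, not results from this project.

```python
def components_for_variance(eigenvalues, target):
    """Smallest number of leading components whose cumulative
    explained-variance ratio reaches `target` (e.g. 0.8)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count
    return len(ordered)


def candidate_ks(min_k, max_k):
    """Values of K to try, assuming both bounds are inclusive."""
    return list(range(min_k, max_k + 1))


# Made-up eigenvalues: cumulative ratios are 0.4, 0.7, 0.9, 1.0.
print(components_for_variance([4.0, 3.0, 2.0, 1.0], 0.8))
print(candidate_ks(10, 20))
```

With these illustration values, three components are needed to pass 0.8, and eleven values of K (10 through 20) would be evaluated.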

DBSCAN Clustering

This module performs clustering through DBSCAN.

Command to Run The File
python3 DBSCANCluster.py $DATA_FILE_PATH $RESULTS_PATH $NORMALIZATION_METHOD $PCA_VARIANCE $min_epsilon $max_epsilon $min_samples $max_samples

Where the inputs are the following:

Example on how to run this file: python3 DBSCANCluster.py Test12/Test12.csv Test12 Standardizing 0.9 8 12 3 10
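To show what the epsilon and min_samples arguments control, here is a small pure-Python DBSCAN. It is a teaching sketch, not the code used by DBSCANCluster.py, which may well rely on a library implementation. The sketch assumes Euclidean distance and counts a point itself among its neighbors; the sample points are made up.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

A larger epsilon merges nearby groups into one cluster, while a larger min_samples turns sparse groups into noise, which is presumably why the script accepts a range for each.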

HDBSCAN clustering

This module performs clustering through HDBSCAN.

Command to Run The File
python3 HDBSCANCluster.py $DATA_FILE_PATH $RESULTS_PATH $NORMALIZATION_METHOD $PCA_VARIANCE $min_cluster_size $max_cluster_size

Where the inputs are the following:

Example on how to run this file: python3 HDBSCANCluster.py Test12/Test12.csv Test12 Standardizing 0.9 3 10


Evaluation and Plotting Module

This module plots merged images for visual inspection and evaluation. It needs the output of a clustering algorithm as input. The last argument sets how many images from each cluster are plotted.

Example on how to run this file: python3 Evaluation.py Test12/KMeansResults.csv Test12/ 15
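The effect of the last argument can be sketched as follows: group the image names by cluster label and keep at most N per cluster. The column names `Name` and `Cluster` and the function `images_per_cluster` are hypothetical; check the header of your clustering result file for the actual column names.

```python
import csv
import io
from collections import defaultdict


def images_per_cluster(csv_file, per_cluster, name_col="Name", cluster_col="Cluster"):
    """Return {cluster label: first `per_cluster` image names} from a result CSV.

    Column names are assumptions, not taken from the repo.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        members = groups[row[cluster_col]]
        if len(members) < per_cluster:
            members.append(row[name_col])
    return dict(groups)


# Tiny in-memory stand-in for a clustering result file.
sample = io.StringIO("Name,Cluster\na.jpg,0\nb.jpg,1\nc.jpg,0\nd.jpg,0\n")
print(images_per_cluster(sample, per_cluster=2))
# {'0': ['a.jpg', 'c.jpg'], '1': ['b.jpg']}
```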


📝 License

Licensed under the MIT license.