Issue with Twin Contrastive Clustering Algorithm Implementation on Custom Dataset: Zero NMI and ARI, but 97% Accuracy

1amrutesh commented 1 year ago

I have implemented the Twin Contrastive Clustering Algorithm on my custom dataset using the instructions provided on the PyTorch website for ImageFolder (https://pytorch.org/vision/stable/generated/torchvision.datasets.ImageFolder.html). However, I am encountering a problem with my results.

The issue is that my Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI) are both zero, while the accuracy is reported as 97%. This is unexpected since NMI and ARI are standard evaluation metrics for clustering algorithms, and their values should not be zero if the clustering algorithm is working correctly.

I have considered various possible reasons for this issue, and I suspect that the folder structure of my dataset might be a potential cause. Specifically, the names of the images in class one and class two are different. While the images are still similar in nature, their filenames are not identical. This could lead to problems in the clustering algorithm's performance, as it relies on matching images based on their filenames. The directory structure is as follows:

images\
├── class1\
│   ├── 00001.png
│   ├── 00002.png
│   ├── ...
└── class2\
    ├── 00001_br(1).png
    ├── 00002_br(1).png
    └── ...

To resolve this issue, I have attempted to rename the image filenames in the dataset to match each other in both classes. However, this did not improve the NMI and ARI metrics. I am now unsure of how to proceed with troubleshooting this problem.

I would appreciate any guidance or advice on what could be causing this issue and how to resolve it. Thank you.

Yunfan-Li commented 1 year ago

Hi, if my understanding is right, you need to place all images that belong to the same class under the same folder. It doesn't matter what the image filenames are, but which folders images are placed in. I suggest manually checking the labels given by ImageFolder to see if they are what you expected. Hope this answers your question.
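A minimal sketch of the label check suggested above. It assumes the dataset object exposes `class_to_idx` and `targets`, as torchvision's ImageFolder does; the helper name `summarize_labels` and the `"images"` path are hypothetical and only for illustration.

```python
from collections import Counter


def summarize_labels(dataset):
    """Return {class_name: sample_count} for an ImageFolder-style dataset.

    Assumes `dataset.class_to_idx` maps folder names to label indices and
    `dataset.targets` holds one label index per sample.
    """
    idx_to_class = {idx: name for name, idx in dataset.class_to_idx.items()}
    counts = Counter(dataset.targets)
    # Report every known class, including those with zero samples.
    return {idx_to_class[idx]: counts.get(idx, 0) for idx in sorted(idx_to_class)}


# Usage with torchvision (the "images" path is a placeholder):
#   from torchvision.datasets import ImageFolder
#   dataset = ImageFolder("images")
#   print(summarize_labels(dataset))
#   print(dataset.samples[:5])  # (file path, label index) pairs
```

If one class shows a count of zero, or a custom dataset class returns a different label than `dataset.targets` for the same index, the ground-truth labels fed to the metrics are likely not what was intended.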

1amrutesh commented 1 year ago

I would like to express my gratitude for your prompt response, and at the same time, apologize for my delayed reply. I am writing to inform you that I have resolved the issue I previously mentioned by modifying the dataset class. Consequently, I can now acquire all three metrics, and the train and boost scripts are functioning as expected.

However, I have one inquiry regarding the clustering process. Specifically, I am unsure about how to identify which image belongs to which cluster after the boosting stage. I would be immensely appreciative if you could provide me with the necessary information on this matter.

Thank you in advance for your assistance.

Yunfan-Li commented 1 year ago

The operation to get clustering assignments is the same before and after the boosting stage. Specifically, the assignments can be obtained by applying argmax to the output of the cluster contrastive head (whose dimension equals the number of clusters).
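The argmax step described above can be sketched as follows. This is an illustration on plain Python lists, not the repository's code: it assumes the cluster head outputs have already been collected as one row of scores per image, and the helper names `assign_clusters` and `group_by_cluster` are hypothetical.

```python
from collections import defaultdict


def assign_clusters(cluster_outputs):
    """Return the argmax index of each row of per-image cluster scores."""
    return [max(range(len(row)), key=row.__getitem__) for row in cluster_outputs]


def group_by_cluster(paths, assignments):
    """Map each cluster index to the image paths assigned to it."""
    groups = defaultdict(list)
    for path, cluster in zip(paths, assignments):
        groups[cluster].append(path)
    return dict(groups)


# Made-up scores for three images and two clusters.
outputs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
paths = ["a.png", "b.png", "c.png"]
assignments = assign_clusters(outputs)
groups = group_by_cluster(paths, assignments)
```

With PyTorch tensors of shape (num_images, num_clusters), the equivalent of `assign_clusters` is `outputs.argmax(dim=1)`. To pair assignments with file names, the outputs need to be collected in the same order as the paths, for example by iterating the dataset without shuffling.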

Yunfan-Li / Twin-Contrastive-Learning, issue #11