Yunfan-Li / Contrastive-Clustering

Code for the paper "Contrastive Clustering" (AAAI 2021)
MIT License

About the application in other fields #30

Closed 4fee8fea closed 2 years ago

4fee8fea commented 2 years ago

Dear Yunfan,

Thanks for your outstanding work! I have learned a lot from your paper, codes, and GitHub Issues.

In my research area, there are studies that build on your work. However, according to my replication results, the clustering accuracy kept oscillating between high and very low values during training. I attribute this to the changes those studies made to your original loss function.

After that disappointment, I applied your work directly to my dataset and was able to get good results. I was very surprised and grateful.

However, on my dataset, the loss shows a decreasing trend as training proceeds, but the accuracy keeps dropping from the very high values it reaches at the beginning.

My data is characterized by a small image size, roughly 11x11, and a large number of channels, so I simply designed a three-layer CNN.

The accuracy curve shows the following pattern:

[image "ACC": accuracy curve during training]

Could you help me figure out the reason for this?

Thanks again.

Best wishes.

Yunfan-Li commented 2 years ago

Could you provide the values of the instance-level and cluster-level losses separately over the course of training? I wonder whether they are both decreasing stably.

4fee8fea commented 2 years ago

Thanks for the reply!

The following charts show the results of my latest run.

[image "run": training curves from the latest run]

The loss function is `loss = loss_instance + loss_cluster`, and it is computed following the implementation in this repository (a rough sketch of my training step follows the table below).

Parts of the hyperparameter settings are as follows:

| key | value |
| --- | --- |
| instance_temperature | 0.5 |
| cluster_temperature | 1.0 |
| learning rate | 3e-4 |
| weight_decay | 0. |
| batch_size | 512 |
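For context, here is a minimal sketch of my training step; `model`, `criterion_instance`, `criterion_cluster`, and the batch of augmented views are placeholders following the structure of this repository, not its exact code:

```python
# x_i, x_j: two augmented views of the same batch of patches
z_i, z_j, c_i, c_j = model(x_i, x_j)          # instance features and soft cluster assignments
loss_instance = criterion_instance(z_i, z_j)  # instance-level contrastive loss
loss_cluster = criterion_cluster(c_i, c_j)    # cluster-level contrastive loss
loss = loss_instance + loss_cluster           # total loss, as described above

optimizer.zero_grad()
loss.backward()
optimizer.step()
```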
4fee8fea commented 2 years ago

I added a cosine annealing scheduler via `scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=args.epochs)`.

I think the gradually decreasing learning rate is the key reason the accuracy converges.
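For completeness, a minimal sketch of how the scheduler is stepped once per epoch; `model`, `loader`, `optimizer`, and `train_one_epoch` are placeholders for my own training loop:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=0.)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=args.epochs)

for epoch in range(args.epochs):
    train_one_epoch(model, loader, optimizer)  # the usual contrastive-clustering training epoch
    scheduler.step()                           # cosine-decay the learning rate once per epoch
```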

Yunfan-Li commented 2 years ago

Thanks for your prompt reply. Honestly, I am also confused by the oscillatory ACC since both losses decrease stably. However, I notice that the NMI metric steadily increases during training. Is your custom dataset class-balanced? The ACC metric might oscillate when the dataset is imbalanced. Also, did you evaluate the model with `net.eval()` to disable layers such as Batch Normalization and Dropout?

4fee8fea commented 2 years ago

Thank you so much for your willingness to help!

The `net.eval()` mode has been set for evaluation.

As you said, my dataset indeed has a serious category imbalance problem.

Concretely, I am working on a pixel-wise classification problem, in which patches centered on each pixel have to be generated by a sliding window.

If the sliding-window step size is too small, neighboring patches overlap too much, which generates a large number of false-negative examples.

If the sliding-window step size is too large, some classes of pixels may be skipped entirely.
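To make the trade-off concrete, here is a rough sketch of the patch generation I described; `extract_patches` is a hypothetical helper, and the `stride` argument is the sliding-window step size:

```python
import torch

def extract_patches(image: torch.Tensor, patch_size: int = 11, stride: int = 1) -> torch.Tensor:
    """Slide a patch_size x patch_size window over a (C, H, W) image.

    A small stride produces heavily overlapping neighboring patches (many
    near-duplicate pairs, i.e. potential false negatives); a large stride may
    skip the pixels of rare classes entirely.
    """
    patches = image.unfold(1, patch_size, stride).unfold(2, patch_size, stride)  # (C, nH, nW, ph, pw)
    c, n_h, n_w, ph, pw = patches.shape
    return patches.permute(1, 2, 0, 3, 4).reshape(n_h * n_w, c, ph, pw)          # (nH*nW, C, ph, pw)
```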

Yunfan-Li commented 2 years ago

I see. If the dataset has a serious category imbalance problem, I recommend slightly lowering the weight of (or, if necessary, modifying) the cluster entropy term in the cluster-level loss. That term is used to prevent trivial solutions by encouraging the model to produce more cluster-balanced assignments.
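As an illustration, a minimal sketch of what down-weighting could look like; `cluster_contrastive`, `c_i`, `c_j`, and `entropy_weight` are placeholder names, and the entropy term is written only up to an additive constant:

```python
import torch

def assignment_entropy_term(c: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative entropy of the mean soft assignment (batch_size x num_clusters).

    Minimizing it pushes the batch-average assignment toward a uniform,
    cluster-balanced distribution, which prevents trivial solutions but can
    hurt on heavily imbalanced datasets.
    """
    p = c.mean(dim=0)
    return (p * torch.log(p + eps)).sum()

entropy_weight = 0.5  # hypothetical value: 1.0 keeps the original strength, 0.0 removes the term
loss_cluster = cluster_contrastive(c_i, c_j) + entropy_weight * (
    assignment_entropy_term(c_i) + assignment_entropy_term(c_j)
)
```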

4fee8fea commented 2 years ago

Many thanks!

The imbalanced dataset problem has been discussed in #25.

I simply removed the entropy term and got a more reasonable curve.

[image "new": accuracy curve after removing the entropy term]

But one pattern in the curve is still slightly puzzling: the network achieves its best results after the first one or two epochs, then rapidly drops to very low performance, and in the subsequent epochs it slowly rises again as expected.

Yunfan-Li commented 2 years ago

You may try different random seeds to see whether this happens every time. If so, it seems that the model converges to a local optimum. Perhaps a lower learning rate or a warm-up schedule (especially on the cluster-level loss) would help.
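For example, a minimal sketch of a linear warm-up on the cluster-level loss; `warmup_epochs` and the weighting are hypothetical modifications to the training loop, not part of this repository:

```python
warmup_epochs = 10  # hypothetical value; tune on your data

for epoch in range(args.epochs):
    # ramp the cluster-level loss weight up to 1.0 over the first warm-up epochs,
    # so the instance-level representation stabilizes before clustering pressure kicks in
    cluster_weight = min(1.0, (epoch + 1) / warmup_epochs)
    for x_i, x_j in loader:  # two augmented views per batch
        z_i, z_j, c_i, c_j = model(x_i, x_j)
        loss = criterion_instance(z_i, z_j) + cluster_weight * criterion_cluster(c_i, c_j)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```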

4fee8fea commented 2 years ago

Thank you for so many valuable comments tonight! I've learned a lot.

I wish you all the best in your research and life!

Yunfan-Li commented 2 years ago

Thank you! The same to you~

4fee8fea commented 2 years ago

Dear Yunfan,

For unbalanced datasets, a contrastive learning framework that uses negative samples (e.g., SimCLR) may be more beneficial for classes with few samples; however, classes with many samples will be harmed by too many false-negative pairs.

I wonder whether it is possible to improve the performance of contrastive clustering on unbalanced datasets by using a contrastive learning framework that relies only on positive samples.

Yunfan-Li commented 2 years ago

Indeed. Negative-free contrastive learning methods such as BYOL have been shown to perform much better than frameworks like SimCLR and MoCo (https://openreview.net/forum?id=JZrETJlgyq).

4fee8fea commented 2 years ago

Thanks! I want to try replacing the SimCLR framework with BYOL, hoping to achieve better clustering performance on my unbalanced dataset.

Besides, did you use Microsoft PowerPoint to draw the flow chart of your model? Very clean and beautiful!

Yunfan-Li commented 2 years ago

I used draw.io to draw the figures.