Could you provide the value for the instance-level and cluster-level loss w.r.t. the training process separately? I wonder if they are both stably decreasing.
Thanks for the reply!
The following charts show the results of my latest run.
The loss function is `loss = loss_instance + loss_cluster`, and its computation follows the implementation in this repository.
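Roughly, the objective I am optimizing can be sketched like this (a minimal NT-Xent-style sketch following the spirit of this repo, not its exact code; the function names here are my own):

```python
import torch
import torch.nn.functional as F

def nt_xent(a, b, temperature):
    """Minimal NT-Xent: (a_i, b_i) are positives; all other rows act as negatives."""
    z = F.normalize(torch.cat([a, b], dim=0), dim=1)   # (2N, d)
    sim = z @ z.t() / temperature                      # pairwise similarities
    n = a.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(mask, float('-inf'))         # drop self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

def total_loss(z_i, z_j, c_i, c_j, t_inst=0.5, t_clu=1.0):
    # Instance-level loss on per-sample features; cluster-level loss on the
    # transposed soft-assignment matrices (rows = clusters), plus a
    # negative-entropy term that discourages collapsing into one cluster.
    loss_instance = nt_xent(z_i, z_j, t_inst)
    loss_cluster = nt_xent(c_i.t(), c_j.t(), t_clu)
    p_i, p_j = c_i.mean(dim=0), c_j.mean(dim=0)
    entropy = (p_i * p_i.clamp_min(1e-8).log()).sum() + \
              (p_j * p_j.clamp_min(1e-8).log()).sum()
    return loss_instance + loss_cluster + entropy
```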
Parts of the hyperparameter settings are as follows:

| key | value |
|---|---|
| instance_temperature | 0.5 |
| cluster_temperature | 1.0 |
| learning rate | 3e-4 |
| weight_decay | 0. |
| batch_size | 512 |
I added the Cosine Annealing scheduler via `scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=args.epochs)`.
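For reference, a minimal sketch of how I wired the scheduler (the model and epoch count below are stand-ins for my actual setup):

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the actual network
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=0.0)
epochs = 100                     # args.epochs in my run
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... run one training epoch here ...
    scheduler.step()             # anneal once per epoch

# After T_max steps the learning rate has decayed to eta_min (0 by default).
```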
I think the gradually decreasing learning rate is the key reason for the convergence of accuracy.
Thanks for your prompt reply. Honestly, I am also confused by the oscillating ACC, since both losses decrease stably. However, I notice that the NMI metric steadily increases during training. Is your custom dataset class-balanced? The ACC metric might oscillate when the dataset is imbalanced. Also, did you evaluate the model with net.eval() to disable layers such as Batch Normalization and Dropout?
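For clarity, by ACC I mean the standard best-match clustering accuracy, which can be sketched as follows (a brute-force mapping that is fine for a handful of clusters; the Hungarian algorithm scales better for many clusters):

```python
from itertools import permutations
import numpy as np

def clustering_accuracy(y_true, y_pred):
    """Best-match ACC: try every mapping from cluster IDs to class labels
    and keep the best agreement (brute force, small label sets only)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels = range(max(y_true.max(), y_pred.max()) + 1)
    best = 0.0
    for perm in permutations(labels):
        mapped = np.array([perm[p] for p in y_pred])
        best = max(best, (mapped == y_true).mean())
    return best
```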
Thank you so much for your willingness to help!
The `net.eval()` option has been set.
As you said, my dataset indeed has a serious class-imbalance problem.
Concretely, I am considering a pixel-by-pixel classification problem, in which patches centered on each pixel are generated by a sliding window.
If the sliding-window stride is too small, neighboring patches overlap too much, which generates a large number of false-negative examples. If the stride is too large, some classes may be skipped entirely.
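A minimal sketch of the patch generation I describe (the function name and shapes are my own, assuming a `(C, H, W)` image tensor):

```python
import torch

def extract_patches(image, patch_size=11, stride=5):
    """Slide a window over a (C, H, W) image and return one patch per step.

    The stride controls the trade-off discussed above: a small stride makes
    neighboring patches overlap heavily (more potential false negatives),
    while a large stride may skip some pixels (and classes) entirely.
    """
    patches = image.unfold(1, patch_size, stride).unfold(2, patch_size, stride)
    c, nh, nw, _, _ = patches.shape
    return patches.permute(1, 2, 0, 3, 4).reshape(nh * nw, c, patch_size, patch_size)
```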
I see. If the dataset has a serious class-imbalance problem, I recommend slightly lowering the weight of (or modifying, if necessary) the cluster entropy term in the cluster-level loss, which prevents trivial solutions by encouraging the model to produce more balanced cluster assignments.
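To make the suggestion concrete, here is a hedged sketch of a down-weighted entropy term (the `weight` knob is hypothetical and not part of the original loss; `weight=0` removes the term entirely):

```python
import torch

def cluster_entropy_penalty(c_i, c_j, weight=0.5):
    """Negative-entropy penalty on the mean cluster assignments of the two
    augmented views; it pushes the model toward balanced clusters, so for
    imbalanced data it can be scaled down via the (hypothetical) weight."""
    eps = 1e-8
    p_i = c_i.mean(dim=0)   # average soft assignment per cluster, view i
    p_j = c_j.mean(dim=0)   # average soft assignment per cluster, view j
    ne = (p_i * (p_i + eps).log()).sum() + (p_j * (p_j + eps).log()).sum()
    return weight * ne
```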
Many thanks!
The imbalanced dataset problem has been discussed in #25.
I simply removed the entropy term and got a more reasonable curve. But one pattern in the training is still slightly uncanny: the network achieves its best results within the first one or two epochs, then rapidly drops to very low performance, and only slowly recovers in the subsequent epochs.
You may try different random seeds to see if this happens every time. If so, the model is likely converging to a local optimum. Perhaps a lower learning rate or a warm-up schedule (especially on the cluster-level loss) would help.
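A sketch of such a warm-up (linear warm-up followed by cosine decay; the numbers are placeholders, and the same ramp factor could equally be applied to the cluster-level loss weight instead of the learning rate):

```python
import math
import torch

model = torch.nn.Linear(4, 4)    # stand-in network
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
warmup_epochs, total_epochs = 10, 100

def lr_lambda(epoch):
    # Ramp the LR multiplier from 1/warmup_epochs up to 1, then cosine-decay it.
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```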
Thank you for so many valuable comments tonight! I've learned a lot.
I wish you all the best in your research and life!
Thank you! The same to you~
Dear Yunfan,
For imbalanced datasets, a contrastive learning framework that uses negative samples (e.g., SimCLR) may benefit classes with few samples; however, classes with many samples will be harmed by too many false-negative samples.
I wonder whether it is possible to improve the performance of contrastive clustering on imbalanced datasets by using a contrastive learning framework that relies only on positive samples?
Indeed. Non-negative contrastive learning methods such as BYOL have been shown to perform much better than frameworks like SimCLR and MoCo. (https://openreview.net/forum?id=JZrETJlgyq)
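For anyone curious, the core ingredient that lets BYOL avoid negative pairs is the EMA target network, which can be sketched as follows (a toy encoder, not the full BYOL pipeline):

```python
import copy
import torch

online = torch.nn.Sequential(torch.nn.Linear(32, 16))  # toy online encoder
target = copy.deepcopy(online)                          # target starts as a copy
for p in target.parameters():
    p.requires_grad = False                             # target is not trained by SGD

@torch.no_grad()
def ema_update(online, target, tau=0.996):
    """BYOL-style update: target weights are an exponential moving average
    of the online weights, so no negative pairs are needed."""
    for po, pt in zip(online.parameters(), target.parameters()):
        pt.mul_(tau).add_(po, alpha=1 - tau)
```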
Thanks! I want to try replacing the SimCLR framework with BYOL, hoping to achieve better clustering performance on my imbalanced dataset.
Besides, did you use Microsoft PowerPoint to draw the flow chart of your model? Very clean and beautiful!
I used draw.io to draw the figures.
Dear Yunfan,
Thanks for your outstanding work! I have learned a lot from your paper, codes, and GitHub Issues.
In my research area, there are studies based on your work. However, according to my replication results, the clustering accuracy curve kept oscillating between high and very low values during the training process. I attribute this to the changes they made to your original loss function.
Disappointed with that, I applied your work directly to my dataset and was able to get good results. I was very surprised and grateful.
However, on my dataset, the loss shows a decreasing trend as the training proceeds, but the accuracy curve keeps decreasing from the very high values at the beginning.
My data is characterized by a small image size, roughly 11×11, and a high number of channels, so I simply designed a three-layer CNN.
The accuracy curves show the following pattern:
Can you help me to see what is the reason for this?
Thanks again.
Best wishes.