czczup / ViT-Adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
https://arxiv.org/abs/2205.08534
Apache License 2.0

Issues with Semantic Segmentation on custom datasets #175

Open HarshitSheoran opened 5 months ago

HarshitSheoran commented 5 months ago

In EVA or MMSegmentation I use 4 classes (1 background + 3 foreground), set reduce_zero_label = False, and pass ignore_index=0 to every loss function (CE and Dice in this case). With those steps my training works in other libraries. Is this approach generally wrong? I cannot train properly in ViT-Adapter with it: I see no errors, just extremely bad predictions on a very easy task.
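To make the effect of ignore_index=0 concrete, here is a minimal pure-Python sketch of a masked cross-entropy. It is not the ViT-Adapter or MMSegmentation implementation; `masked_cross_entropy`, the toy labels, and the toy logits are all hypothetical and only illustrate the general idea that pixels whose label equals the ignored index are dropped from the loss.

```python
import math

def masked_cross_entropy(logits, labels, ignore_index=None):
    """Mean cross-entropy over pixels whose label != ignore_index.

    logits: list of per-pixel score lists (one score per class)
    labels: list of per-pixel integer class ids
    Returns (mean_loss, number_of_pixels_used).
    """
    total, used = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == ignore_index:
            continue  # ignored pixels contribute no loss and no gradient
        # numerically stable log-sum-exp over the class scores
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[label]
        used += 1
    return (total / used if used else 0.0), used

# 4 classes: 0 = background, 1..3 = foreground (the setup described above)
labels = [0, 0, 0, 1, 2, 3]
# A hypothetical model that predicts class 1 for every pixel
logits = [[0.0, 5.0, 0.0, 0.0] for _ in labels]

loss_all, n_all = masked_cross_entropy(logits, labels)
loss_ign, n_ign = masked_cross_entropy(logits, labels, ignore_index=0)
print(n_all, n_ign)  # 6 3
```

In this toy case the three background pixels are wrongly predicted as class 1, yet with ignore_index=0 they are excluded, so the loss never penalizes those mistakes. Whether that matters depends on how each library's head treats the remaining classes, which this sketch does not model.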

I am also confused about how stuff + things classes work: when I set 1 stuff class and 3 things classes, num_classes becomes 5 instead of 4. Why?
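One possible reading of the 5, offered as an assumption rather than a confirmed answer: Mask2Former-style heads commonly compute the class count as things + stuff and then give the query classifier one extra "no object" output. The sketch below only does that arithmetic; `mask2former_class_logits` is a hypothetical helper, not a function from ViT-Adapter.

```python
def mask2former_class_logits(num_things_classes, num_stuff_classes):
    """Return (num_classes, num_logits) under the assumed convention:
    num_classes = things + stuff, and the classifier adds one extra
    "no object" slot for queries that match nothing."""
    num_classes = num_things_classes + num_stuff_classes
    num_logits = num_classes + 1  # assumed extra "no object" output
    return num_classes, num_logits

# 3 things + 1 stuff, as in the question above
num_classes, num_logits = mask2former_class_logits(3, 1)
print(num_classes, num_logits)  # 4 5
```

If the 5 shows up as the size of the classification output or of the class-weight list, this convention would explain it; if it shows up elsewhere, the cause is likely different.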

@czczup Please help

lgwplay commented 5 months ago

Have you solved this problem? I have a similar problem with semantic segmentation: training works correctly on deit, but with beit+mask2former the prediction for every category is close to 0.

HarshitSheoran commented 5 months ago

Nah, I never figured it out