NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

Huge negative loss while fine-tuning SAM #409

Closed: martintomov closed this issue 5 months ago

martintomov commented 5 months ago

Hey @NielsRogge, I need your help with the fine-tune SAM tutorial notebook you provided on GitHub. I successfully replicated your entire notebook using the nielsr/breast-cancer dataset, and it works great. However, when I use it with my own dataset, training produces a huge negative mean loss that I've been unable to resolve all day. Could the problem be related to how my dataset is prepared? My images are RGB and my masks are grayscale.

I followed the guide to upload the dataset to the Hub, but I suspect there might be an issue with it. The dataset comprises 733 images and 733 segmentation masks. If possible, could you please take a look and help me troubleshoot this issue?

Loss while training:

100%|██████████| 367/367 [02:07<00:00,  2.87it/s]
EPOCH: 0
Mean loss: -869916.3045110469
100%|██████████| 367/367 [02:05<00:00,  2.92it/s]
EPOCH: 1
Mean loss: -3227902.420640327
100%|██████████| 367/367 [02:05<00:00,  2.92it/s]
EPOCH: 2
Mean loss: -7128018.555177112
100%|██████████| 367/367 [02:06<00:00,  2.90it/s]
EPOCH: 3
Mean loss: -13922864.103542235
100%|██████████| 367/367 [02:07<00:00,  2.88it/s]
EPOCH: 4
Mean loss: -25008151.683923706
100%|██████████| 367/367 [02:07<00:00,  2.88it/s]
EPOCH: 5
Mean loss: -41779737.38964578
100%|██████████| 367/367 [02:06<00:00,  2.91it/s]
EPOCH: 6
Mean loss: -65599452.35967302
100%|██████████| 367/367 [02:06<00:00,  2.91it/s]
EPOCH: 7
Mean loss: -97740789.5040872
100%|██████████| 367/367 [02:05<00:00,  2.92it/s]
EPOCH: 8
Mean loss: -139604027.53133515
100%|██████████| 367/367 [02:06<00:00,  2.91it/s]
EPOCH: 9
Mean loss: -192096536.71934605
100%|██████████| 367/367 [02:05<00:00,  2.92it/s]
EPOCH: 10
Mean loss: -256580268.73024523
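
For context, here's the kind of sanity check I've been running on the masks (a minimal sketch; the repo id and the `label` column name are placeholders following the notebook's conventions). With a Dice/BCE-style loss, mask values outside [0, 1] can push the loss strongly negative:

```python
from datasets import load_dataset
import numpy as np

# Load the dataset from the Hub (replace with your own repo id).
dataset = load_dataset("martintomov/my-segmentation-dataset", split="train")

# SAM fine-tuning expects single-channel binary masks: 0 for background,
# 1 for the object of interest. Values such as 0..255, or inverted labels,
# are common culprits for a diverging or negative loss.
mask = np.array(dataset[0]["label"])
print("shape:", mask.shape)               # should be (H, W), not (H, W, 3)
print("dtype:", mask.dtype)
print("unique values:", np.unique(mask))  # should be {0, 1}
```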

NielsRogge commented 5 months ago

Hi @martintmv-git, thanks for your interest in my notebook. Looking at your dataset, it looks like the labels might need to be inverted, i.e. the object of interest should be marked with white pixels and the background with black pixels.
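
Something along these lines should do it (an untested sketch; assumes single-channel uint8 masks, with illustrative file paths):

```python
import numpy as np
from PIL import Image

# Invert a grayscale mask so the object of interest becomes white (255)
# and the background black (0).
mask = np.array(Image.open("mask.png").convert("L"))
Image.fromarray(255 - mask).save("mask_inverted.png")
```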

martintomov commented 5 months ago

Even after inverting the colour of the object of interest, the training issue persisted. Upon further investigation, it turned out the problem also stemmed from the format and encoding of the masks (labels).

All labels must be in the following format:

TIFF image data, little-endian, direntries=10, height=256, bps=32, compression=none, PhotometricIntepretation=BlackIsZero, width=256
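
For anyone hitting the same issue, here's a rough sketch of the conversion (paths and the binarization threshold are illustrative); PIL's "F" mode writes an uncompressed 32-bits-per-sample grayscale (BlackIsZero) TIFF:

```python
import numpy as np
from PIL import Image

# Convert an arbitrary mask to a 256x256, single-channel, 32-bit float,
# uncompressed TIFF, matching the format described above.
mask = Image.open("mask.png").convert("L").resize((256, 256), Image.NEAREST)
binary = (np.array(mask) > 127).astype(np.float32)  # 0.0 = background, 1.0 = object
Image.fromarray(binary, mode="F").save("mask.tif")
```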