MathieuNlp / Sam_LoRA

Segment Your Ring (SYR) - Segment Anything model adapted with LoRA to segment rings.
MIT License

Training Loss Increasing After Initial Epochs - Need Guidance #6

Open Berlin000000 opened 4 months ago

Berlin000000 commented 4 months ago

Hello, thank you very much for sharing!

I successfully ran the code and fine-tuned it on my own dataset. However, during the training process, I noticed that the loss was decreasing only during the first two epochs, but then it started to increase over the subsequent epochs. What should I do?

MathieuNlp commented 4 months ago

Hi,

For how many epochs did you train and how big is your dataset ?

Berlin000000 commented 4 months ago

Thank you very much for your response! I would be very grateful if I could get some insights from your experience.

Here is the issue I'm encountering: I have a training dataset of approximately 110,000 image-mask pairs, and I'm training with a BATCH_SIZE of 2 for 7 epochs. During the first two epochs, the loss decreased, and the models obtained after these two epochs performed better on the test set compared to the baseline in terms of loss. However, in the subsequent epochs, the loss started to increase, and the model obtained after the last epoch (epoch 6) performed worse on the test set than the baseline. My learning rate is set to LEARNING_RATE=0.0001.

Have you experienced similar issues when training the models? Could you provide any suggestions to help improve the training performance of my model? Thank you very much!

MathieuNlp commented 4 months ago

If we suppose that the code is working, the issue might come from the batch size. With a low batch size (for example 2), the loss is computed over only 2 images at each step, so the gradient estimate is noisy. Hence, if you have a large dataset (110K pairs) with varied images and a low batch size, I think you may need far more than 7 epochs to learn. Maybe you can launch a training run for 50 epochs and see how it goes. Alternatively, you can increase the batch size, perhaps reducing the image resolution to save memory. You can also try increasing the learning rate so your test runs don't take too long. This is what I think.
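The suggestions above might look something like this in a training config. This is only a sketch: the variable names mirror the ones mentioned in this thread, but the specific values are assumptions for illustration, not tested settings from the repo.

```python
# Hypothetical adjustments following the advice above (values are assumptions):
BATCH_SIZE = 8          # larger batch -> less noisy loss estimate per step
LEARNING_RATE = 3e-4    # slightly higher LR so test runs converge faster
NUM_EPOCHS = 50         # train much longer on the ~110K-pair dataset
IMAGE_SIZE = 512        # lower resolution to fit the bigger batch in memory
```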

Berlin000000 commented 4 months ago

Thank you very much for your suggestions and help. I will try the solution you provided and discuss it with you afterwards.

luobendewugong commented 4 months ago

> Hello, thank you very much for sharing!
>
> I successfully ran the code and fine-tuned it on my own dataset. However, during the training process, I noticed that the loss was decreasing only during the first two epochs, but then it started to increase over the subsequent epochs. What should I do?

Hello, I see from your description that you managed to set BATCH_SIZE to 2. How did you achieve this? I would like to compare implementation methods, since the original code reports an error when BATCH_SIZE is set to 2. Could I consult with you about this?

Nomination-NRB commented 4 months ago

> Hello, thank you very much for sharing! I successfully ran the code and fine-tuned it on my own dataset. However, during the training process, I noticed that the loss was decreasing only during the first two epochs, but then it started to increase over the subsequent epochs. What should I do?
>
> Hello, I see from your description that you managed to set BATCH_SIZE to 2. How did you achieve this? I would like to compare implementation methods, since the original code reports an error when BATCH_SIZE is set to 2. Could I consult with you about this?

You need to transform all the images to the same size.
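The batching error likely comes from the default DataLoader collate step, which can only stack tensors of identical shape. A minimal sketch of the fix, assuming a plain PyTorch setup (the function names here are illustrative, not the repo's actual code): resize every image to one fixed size before stacking, e.g. with `torch.nn.functional.interpolate`.

```python
import torch
import torch.nn.functional as F

def to_uniform_size(image, size=(1024, 1024)):
    # image: (C, H, W) float tensor -> (C, size[0], size[1]).
    # interpolate expects a batch dim, so add and remove one.
    # For masks, use mode="nearest" instead to keep labels discrete.
    return F.interpolate(image.unsqueeze(0), size=size,
                         mode="bilinear", align_corners=False).squeeze(0)

# Images of different original sizes now stack cleanly into a batch,
# so a DataLoader with BATCH_SIZE > 1 no longer errors out.
a = torch.rand(3, 600, 800)
b = torch.rand(3, 480, 640)
batch = torch.stack([to_uniform_size(a), to_uniform_size(b)])
print(tuple(batch.shape))  # (2, 3, 1024, 1024)
```

The same resize (or an equivalent `collate_fn` passed to the DataLoader) has to be applied to the masks as well, or the loss computation will hit the same shape mismatch.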