Closed · smsver2 closed 6 months ago
I have been attempting to fine-tune the model GN_W1.3_S1_ArcFace_epoch46.h5 on my custom dataset. I followed the steps to download, extract, and convert the faces_emore dataset as instructed, and then added 30 new people with 5000 images, so the dataset is essentially the original one plus these additions.
I set the batch size to 200 and used GN_W1.3_S1_ArcFace_epoch46.h5 as the starting-point model. Initially, the accuracy starts from zero, which is strange: when fine-tuning a pretrained model on essentially the same dataset, I would expect the accuracy to start from a sensible value (at least above 0.9), so I suspect something is already wrong at this stage.
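For reference, my training call looks roughly like this (a sketch only; the exact paths and some of the train.Train keyword names may differ from my actual script):

```python
import train, losses

# Assumed paths: the converted faces_emore folders plus my 30 added identities.
data_path = "faces_emore_112x112_folders"
eval_paths = ["faces_emore/lfw.bin", "faces_emore/cfp_fp.bin", "faces_emore/agedb_30.bin"]

tt = train.Train(
    data_path,
    save_path="finetune_GN_W1.3_S1.h5",
    eval_paths=eval_paths,
    basic_model="GN_W1.3_S1_ArcFace_epoch46.h5",  # pretrained starting point
    batch_size=200,
)
sch = [{"loss": losses.ArcfaceLoss(), "epoch": 10}]
tt.train(sch, 0)  # second argument is the initial epoch
```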
Strangely enough, the accuracy started to grow after some iterations, and everything seemed to be progressing well during the first epoch, with the accuracy steadily increasing and reaching 0.8.
However, when the first epoch completed and the second epoch began, I encountered the following error message: Error: Invalid loss, terminating training.
What could be the cause of this error?
Hi,
This termination is triggered from the script "myCallbacks.py" at line 30. To address the issue, I suggest exploring alternative optimization strategies, such as using AdamW or SGDW (decoupled weight decay) instead of an L2 regularizer. Additionally, lowering the learning rate and using a scheduler whose first stage is something like {"loss": losses.ArcfaceLoss(scale=16), "epoch": 1, "optimizer": optimizer} may help.
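For example, something along these lines (a rough sketch assuming this repo's train.Train interface and tensorflow_addons for AdamW; the learning rate, weight decay, and epoch counts are placeholders to tune):

```python
import tensorflow_addons as tfa
import train, losses

data_path = "faces_emore_112x112_folders"   # your converted dataset folder
eval_paths = ["faces_emore/lfw.bin"]        # your evaluation .bin files

# Decoupled weight decay instead of an L2 regularizer,
# with a smaller learning rate for fine-tuning.
optimizer = tfa.optimizers.AdamW(learning_rate=1e-4, weight_decay=5e-5)

tt = train.Train(
    data_path,
    save_path="finetune_GN_W1.3_S1.h5",
    eval_paths=eval_paths,
    basic_model="GN_W1.3_S1_ArcFace_epoch46.h5",
    batch_size=200,
)

# Warm up with a low ArcFace scale before moving back to the usual scale=64.
sch = [
    {"loss": losses.ArcfaceLoss(scale=16), "epoch": 1, "optimizer": optimizer},
    {"loss": losses.ArcfaceLoss(scale=32), "epoch": 1},
    {"loss": losses.ArcfaceLoss(scale=64), "epoch": 10},
]
tt.train(sch, 0)
```

The low scale=16 stage keeps the ArcFace logits small at the start of fine-tuning, which makes an early NaN/inf loss less likely; once training is stable you can move to higher scales.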