Alibaba-MIIL / TResNet

Official Pytorch Implementation of "TResNet: High-Performance GPU-Dedicated Architecture" (WACV 2021)
Apache License 2.0

Loss falls into "NaN" #5

Closed YuNie24 closed 4 years ago

YuNie24 commented 4 years ago

I trained my own model and dataset using your TResNet feature extractor.

BTW, my loss falls into NaN after a few hundred iterations. If I lower the learning rate, the loss falls into NaN later. (Until it falls into NaN, my loss looks to be converging well.)

Can I ask what your optimizer and initial learning rate (and scheduler) are? Or do you have any idea about the NaN loss?

mrT23 commented 4 years ago

@ykim1024 your description is very general and lacks details: what dataset? What is the loss function? Mixed precision or fp32? What is the lr?

we trained tens of datasets with TResNet in mixed precision, and it was highly stable. My strong guess is that it's not an architecture issue. Try replacing TResNet with resnet50. Does the loss stop becoming NaN?

YuNie24 commented 4 years ago

Yes, as you mentioned, I also agree that it is not a problem with the TResNet architecture. But when I used a feature extractor with a general ResNet50, there was any problem.

I used nn.CrossEntropyLoss() for my OCR dataset. My goal was just to extract image features well, so I only used the statement "x = self.body(x)" and did not use the last embedding (nn.Linear) layer. My entire model combines the feature extractor with an RNN (LSTM or GRU).
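A rough sketch of that kind of setup, with all names hypothetical (none come from the TResNet repo): a CNN used only through `self.body(x)`, its feature map pooled over height and fed as a sequence over width into a GRU, CRNN-style.

```python
import torch
import torch.nn as nn

class FeatureRNN(nn.Module):
    """Hypothetical CNN-feature-extractor + RNN model for OCR."""
    def __init__(self, body, feat_channels=2048, hidden=256, num_classes=37):
        super().__init__()
        self.body = body  # any CNN backbone used purely as a feature extractor
        self.rnn = nn.GRU(feat_channels, hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        f = self.body(x)          # (B, C, H, W) feature map
        f = f.mean(dim=2)         # pool away height -> (B, C, W)
        f = f.permute(0, 2, 1)    # treat width as the sequence -> (B, W, C)
        out, _ = self.rnn(f)      # (B, W, 2 * hidden)
        return self.head(out)     # per-timestep class logits

# tiny stand-in body for illustration: one strided conv producing 2048 channels
body = nn.Conv2d(3, 2048, kernel_size=4, stride=4)
model = FeatureRNN(body)
logits = model(torch.randn(2, 3, 32, 128))
print(tuple(logits.shape))  # (batch, width_steps, num_classes)
```

The per-timestep logits would then typically feed a CTC or attention decoder; that part is omitted here.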

Anyway, I thought several of the TResNet layers could be applied to my model to improve performance.

mrT23 commented 4 years ago

"with General ResNet50, there was any problem"? So is there a problem with resnet50, or does resnet50 work ok?

anyway, some general tips:

  1. make sure you use the correct normalization, i.e. just divide the input by 255. TResNet does not use ImageNet-statistics normalization

  2. if you use mixed precision, try without it

  3. make sure you freeze the backbone correctly:

# freeze every backbone parameter so no gradients update them
for name, child in model.named_children():
    for name2, params in child.named_parameters():
        params.requires_grad = False
model.eval()  # also puts BatchNorm/Dropout layers in inference mode
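Tip 1 as code, a minimal sketch (the tensor names are illustrative): scale the input to [0, 1] by dividing by 255, without subtracting the ImageNet mean/std.

```python
import torch

# simulated uint8 image batch, as it would come from a decoder
img_uint8 = torch.randint(0, 256, (2, 3, 224, 224), dtype=torch.uint8)

# correct for TResNet: just scale to [0, 1]
x = img_uint8.float() / 255.0

# NOT: x = (x - imagenet_mean) / imagenet_std
print(float(x.min()), float(x.max()))
```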

all the best

YuNie24 commented 4 years ago

I meant there was no problem when I used ResNet50. OK, I'll try the tips :) thanks