advimman / lama

🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
https://advimman.github.io/lama-project/
Apache License 2.0

Issue with training big-lama with custom dataset (started from pretrained weights) #266

Closed jmunshi1990 closed 1 year ago

jmunshi1990 commented 1 year ago

I am able to load the checkpoint from big-lama-with-discr, following the resolution from a previous issue. The weights load successfully and the trainer starts with the inpainting evaluation step. However, during the FIDScore calculation it fails with "[2023-10-13 15:43:11,238][main][CRITICAL] - Training failed due to Imaginary component 1.0299959193217622e+115"

When I check the sample dir in the outputs, the inpainting results are already pretty good (as expected, since I am starting from the pretrained model), but they still need some fine-tuning, which is why I want to fine-tune in the first place. Can you please tell me why the imaginary component occurs? Is it because the model is not able to predict properly, or is it something else?

Here are some details:

python bin/train.py -cn pretrain-config +location=wli_dataset +trainer.kwargs.resume_from_checkpoint=$HOME_DIR/wli_ai_defect/inpaint-anything/lama/pretrained_model/big-lama-with-discr/models/best_stripped.ckpt

Logs and configs are attached: train.log
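
For reference, this error seems to come from the Fréchet-distance step of the FID metric. Below is a minimal pytorch-fid-style sketch of that computation (an illustration only, not necessarily this repo's exact code) showing where the "Imaginary component ..." ValueError is raised:

import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    """Frechet distance between two Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; this step becomes
    # unstable when the covariances are rank-deficient (too few samples).
    covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)
    if not np.isfinite(covmean).all():
        offset = np.eye(sigma1.shape[0]) * eps
        covmean = linalg.sqrtm((sigma1 + offset).dot(sigma2 + offset))
    if np.iscomplexobj(covmean):
        # If the imaginary part is not negligible, the metric aborts with
        # the exact error seen in the training log above.
        if not np.allclose(np.diagonal(covmean).imag, 0, atol=1e-3):
            raise ValueError('Imaginary component {}'.format(np.max(np.abs(covmean.imag))))
        covmean = covmean.real
    return diff.dot(diff) + np.trace(sigma1) + np.trace(sigma2) - 2 * np.trace(covmean)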

amangupta2303 commented 1 year ago

@jmunshi1990 I have also encountered the same error. I started the training by commenting out the lines that were causing this issue. I don't know if this is the best way to handle it or not, but with this change my training started successfully without any errors. I am not sure why the imaginary component occurs, but the error happens during the FIDScore calculation, and FIDScore is just an evaluation metric, so I don't think commenting it out will affect model training or performance.
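
To spell out what that workaround amounts to without deleting code: a softer variant (just a sketch; compute_fid here is a hypothetical stand-in, not a function from this repo) is to catch the failure and report NaN for that evaluation step so training keeps going:

import math

def safe_fid(compute_fid, *args, **kwargs):
    """Call a FID routine but swallow the 'Imaginary component' failure,
    returning NaN instead of crashing the whole training run."""
    try:
        return compute_fid(*args, **kwargs)
    except ValueError as err:
        if 'Imaginary component' in str(err):
            return math.nan  # metric unavailable for this eval step
        raise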

jmunshi1990 commented 1 year ago

@amangupta2303 thanks for your response

Did you see things improving once your training converged? The FID score is an important metric for evaluating LaMa, right?

amangupta2303 commented 1 year ago

@jmunshi1990, I am about to train the model using my custom dataset and will update you on its convergence. While the FID score is indeed a crucial metric for evaluating the model, I believe it won't impact the training process. Its primary purpose is to assess the model's performance after training is completed.

jmunshi1990 commented 1 year ago

@amangupta2303 Thanks for responding. I have fine-tuned big-lama on my specific problem dataset, and I see things improving visually, but the FID score metric still shows the complex-number issue, which I am not sure how to get rid of. For now I am training and testing everything without the FID score.

amangupta2303 commented 1 year ago

@jmunshi1990 Good to hear that. I am training my model from scratch on my custom dataset without the FID score, and here too I can see things improving visually, but the FID score shows complex numbers. I am also not sure how to get rid of this, but I will try to work on it; maybe we can solve this issue.

Soongja commented 1 year ago

@jmunshi1990 @amangupta2303

Hi, I encountered the same problem, and I think it is related to the issues below.

According to those issues, FID should be computed on at least 2,048 samples or it might not work, and the authors of the FID paper recommend 10,000 or more samples.

In my case, I got the same error when my evaluation set had 1,000 samples, and increasing the size to 10,000 solved the error. What is the size of the evaluation set of your custom dataset?
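
A quick way to see why the sample count matters: InceptionV3 pooled features are 2048-dimensional, so with fewer than 2048 samples the sample covariance is rank-deficient, and the matrix square root inside FID can come back with a noticeable imaginary part. A rough numpy check (feature dimension shrunk to 512 here so it runs quickly; this only illustrates the effect, it is not the repo's evaluation code):

import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
d = 512  # stand-in for InceptionV3's 2048-dim pooled features

def max_imag_of_cov_sqrt(n_samples):
    """Max |imaginary part| of sqrtm(sigma1 @ sigma2) when both covariances
    are estimated from n_samples random feature vectors."""
    x = rng.standard_normal((n_samples, d))
    y = rng.standard_normal((n_samples, d))
    sigma1 = np.cov(x, rowvar=False)  # rank <= n_samples - 1 when n_samples < d
    sigma2 = np.cov(y, rowvar=False)
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    return float(np.abs(covmean.imag).max()) if np.iscomplexobj(covmean) else 0.0

print('n=200  (rank-deficient):  ', max_imag_of_cov_sqrt(200))   # typically goes complex
print('n=5000 (well-conditioned):', max_imag_of_cov_sqrt(5000))  # essentially real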

amangupta2303 commented 1 year ago

@Soongja Thank you. Got your point.

senya-ashukha commented 1 year ago

Huge thanks!