Training time - Githubissues

wuw2135 commented 8 months ago

Hi, thank you for your response last time. However, I have a new question. I was training a model on a PC equipped with an RTX 4080, but it is taking an excessive amount of time, almost half a year, which seems abnormal. Are you conducting the training in a distributed manner?

AbnerVictor commented 8 months ago

It shouldn’t take that long to train SAMM. I use a single 3090 GPU for training with a small batch size of 2. That said, the default number of training iterations is far larger than necessary (it may take forever to finish), so you can stop much earlier, once the training loss has converged; usually that only takes hours. If the time per iteration is reasonable, you are fine; if not, it might be related to an environment issue. Also, the cycle_align parameter should be lower than 3 (1 or 2 is fine), otherwise it may also cost extra training time. Please let me know if you have any questions.
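To judge whether the time per iteration is reasonable, a rough projection can help. The sketch below is only an illustration: `time_iterations` and `estimate_hours` are hypothetical helper names, not part of this repository, and `step_fn` stands in for whatever runs one training iteration in your setup.

```python
import time


def estimate_hours(seconds_per_iter: float, total_iters: int) -> float:
    """Project the wall-clock hours needed for total_iters iterations."""
    return seconds_per_iter * total_iters / 3600.0


def time_iterations(step_fn, n_iters: int = 20, warmup: int = 3) -> float:
    """Return the average seconds per call of step_fn.

    The first `warmup` calls are not timed, since early iterations are
    often slower. On a GPU, step_fn should synchronize before returning
    (for example with torch.cuda.synchronize()), otherwise the measured
    time can be too small.
    """
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_iters):
        step_fn()
    return (time.perf_counter() - start) / n_iters
```

With made-up numbers, 0.5 seconds per iteration over 100,000 iterations comes to about 13.9 hours. If the projection for the configured iteration count lands in months, the iteration count (or the environment) is the first thing to look at.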

wuw2135 commented 8 months ago

Maybe I need to check some details with you, as I encountered errors during my training attempts.

1. FFHQ dataset image count: I placed 5,000 images in the location you specified.

2. KeyError: 'num_style_feat': The 'E4E_Face.yml' file does not specify the 'num_style_feat' parameter, leading to an error. I set it to 512, but I'm not sure if this is correct.

  File "...\venv\Lib\site-packages\basicsr\models\stylegan2_model.py", line 35, in __init__
    self.num_style_feat = opt['network_g']['num_style_feat']
                          ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
KeyError: 'num_style_feat'

3. cannot access local variable 'idx' where it is not associated with a value: The code in "src\models\OOD_faceGAN_model.py", line 919, produced this error during training.

AbnerVictor commented 8 months ago

Q1: It should be fine. Does it load properly?

Q2: 'num_style_feat' should be 18 by default for the FFHQ-pretrained StyleGAN2 model. From your traceback, though, I notice that you may have installed basicsr via pip. Please try installing basicsr from https://github.com/AbnerVictor/OOD-GAN-inversion/tree/main/BasicSR instead. A simple way to do so is:

cd OOD-GAN-inversion/BasicSR
pip install -e ./

Please note that this will replace your previously installed basicsr.

Q3: Thank you for reporting the issue. I have fixed the problem in a new commit.
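Regarding Q2, one way to confirm which copy of basicsr Python actually imports is to look up where the package is located. This is a minimal sketch using only the standard library; `package_origin` is a hypothetical helper name, not something from this repository.

```python
import importlib.util
from typing import Optional


def package_origin(name: str) -> Optional[str]:
    """Return the file Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # A path under site-packages suggests the pip release; a path inside
    # the cloned repository suggests the editable install.
    print(package_origin("basicsr"))
```

If the printed path still points into site-packages after the editable install, the old copy is probably still being picked up.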

wuw2135 commented 8 months ago

  File "D:\OOD-GAN-inversion-main\src\models\OOD_faceGAN_model.py", line 923, in nondist_validation
    self.metric_results[metric] /= val_cnt
ZeroDivisionError: division by zero

It seems like it didn't go through the for loop. Did I miss something?

AbnerVictor commented 8 months ago

Hi, it seems that something is wrong with the training config, so no metrics are calculated. Can you tell me which config you’re using?

wuw2135 commented 7 months ago

I was using the 'E4E_face' just as you provided. I didn't change anything else.

AbnerVictor commented 7 months ago

I checked the code again and I suppose that it might be an issue related to the validation dataset.

In 'E4E_face.yml', line 39, did you put in the path to a validation dataset? It can be a folder containing face images from datasets such as CelebA-HQ or FFHQ; a few images are enough.

wuw2135 commented 7 months ago

It seems great now, thank you🙏. I'm still in training, and I don't know when to stop. Do you have any experience with this?🤔

AbnerVictor commented 7 months ago

Sounds great. Maybe you can check the visualization or the loss curve (via tensorboard) to decide.
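If you prefer a number to eyeballing the curve, a simple plateau check over logged loss values is one option. This is only a heuristic sketch: `has_plateaued`, the window size, and the tolerance are all hypothetical choices, and GAN losses can oscillate, so the visualizations remain the more reliable signal.

```python
from statistics import fmean


def has_plateaued(losses, window: int = 100, rel_tol: float = 0.01) -> bool:
    """Heuristic: True if the mean loss of the last `window` values improved
    by less than `rel_tol` (relative) over the window before it."""
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    prev = fmean(losses[-2 * window:-window])
    last = fmean(losses[-window:])
    denom = max(abs(prev), 1e-12)  # avoid dividing by zero
    return (prev - last) / denom < rel_tol
```

For example, 200 identical values count as a plateau, while a loss that is still dropping steadily does not.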

wuw2135 commented 7 months ago

Thanks, it has been going well so far, but I still have a question. In the paper, you use optical flow to predict the ID regions ($g_{i}'$). Is there any way to get this feature for each image individually?

AbnerVictor commented 7 months ago

Sure, the flow feature is in the SAMM module, specifically on line 168 of this file:

src/ops/SAMM/helpers.py

AbnerVictor / OOD-GAN-inversion

Training time #5