Open wuw2135 opened 8 months ago
It shouldn’t take that long to train SAMM. I use a 3090 (single GPU) for training with a small batch size of 2. That said, the default number of training iterations is far more than needed (it may take forever to finish); you can stop much earlier, once the training loss has converged, which usually takes only hours. If the time per iteration is reasonable, you are fine; if not, it might be related to an environment issue. Also, the cycle_align parameter should be lower than 3 (1 or 2 is fine), otherwise it may also cost extra training time. Please let me know if you have any questions.
Maybe I need to check some details with you, as I encountered errors during my training attempts.
- FFHQ Dataset Image Count
  I placed 5,000 images in the location you specified.
- KeyError: 'num_style_feat'
  The 'E4E_Face.yml' file does not specify the 'num_style_feat' parameter, leading to an error. I set it to 512, but I'm not sure if this is correct.
    File "...\venv\Lib\site-packages\basicsr\models\stylegan2_model.py", line 35, in __init__
      self.num_style_feat = opt['network_g']['num_style_feat']
                            ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
    KeyError: 'num_style_feat'
- cannot access local variable 'idx' where it is not associated with a value
  The code in "src\models\OOD_faceGAN_model.py", line 919, produced this error during training.
Q1: It should be fine. Does it load properly?
Q2: 'num_style_feat' should be 18 by default for the FFHQ-pretrained StyleGAN2 model. From your error, I notice that you may have installed basicsr via pip. Please try installing basicsr from https://github.com/AbnerVictor/OOD-GAN-inversion/tree/main/BasicSR instead. A simple way to do so is:
cd OOD-GAN-inversion/BasicSR
pip install -e ./
Please note that this will replace your previously installed basicsr.
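To confirm which copy of basicsr Python will actually pick up after reinstalling, a small check like the following may help. It is only a sketch: the helper name `installed_location` is hypothetical and not part of basicsr or this repository, and it relies solely on the standard library's `importlib.util.find_spec`.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def installed_location(package: str) -> Optional[Path]:
    """Return the directory a top-level package would be imported from.

    Returns None when the package cannot be found (or has no file origin).
    The package itself is not imported.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


# Prints None if basicsr is not installed, otherwise its directory.
print(installed_location("basicsr"))
```

If the printed path still points into `site-packages` rather than the `BasicSR` folder of the cloned repository, the pip-installed copy is probably still the one in use.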
Q3: Thank you for reporting the issue. I have fixed the problem in a new submission.
File "D:\OOD-GAN-inversion-main\src\models\OOD_faceGAN_model.py", line 923, in nondist_validation
self.metric_results[metric] /= val_cnt
ZeroDivisionError: division by zero
It seems like it didn't go through the for loop. Did I miss something?
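Both this error and the earlier 'idx' error are what one would expect when a validation loop runs zero times. The following is a minimal sketch of that pattern with an explicit guard; it is not the code from OOD_faceGAN_model.py, and the function name `average_metrics` and its input format are assumptions made for illustration.

```python
from typing import Dict, Iterable


def average_metrics(per_image_metrics: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Average metric values over all validation samples.

    Raises a descriptive error instead of dividing by zero when the
    validation set yields no samples.
    """
    totals: Dict[str, float] = {}
    val_cnt = 0
    for metrics in per_image_metrics:
        for name, value in metrics.items():
            totals[name] = totals.get(name, 0.0) + value
        val_cnt += 1

    if val_cnt == 0:
        # An empty loop is the situation that leads to "division by zero".
        raise ValueError(
            "validation loader yielded no samples; "
            "check the validation dataset path in the config"
        )
    return {name: total / val_cnt for name, total in totals.items()}
```

With a guard like this, an empty validation folder would surface as a message about the dataset path rather than a ZeroDivisionError.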
Hi, it seems that something is wrong with the training config, so no metrics are calculated. Can you tell me which config you’re using?
I was using the 'E4E_face' just as you provided. I didn't change anything else.
I checked the code again and I suppose that it might be an issue related to the validation dataset.
In 'E4E_face.yml', line 39, did you put in the path to a validation dataset? It can be a folder containing face images from datasets like CelebA-HQ or FFHQ; a few images are enough.
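A quick way to check that a validation folder really contains images before starting training could look like this. It is a sketch only: `list_images` is a hypothetical helper, the set of file extensions is an assumption, and the repository's own dataset loader may apply different rules.

```python
from pathlib import Path
from typing import List

# Assumed set of image extensions; adjust to what the dataset actually uses.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def list_images(folder: str) -> List[Path]:
    """Return all image files found under `folder`, searched recursively."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {folder}")
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Calling `len(list_images(path))` on the path written in the config should give a number greater than zero; an empty result would explain a validation loop that never runs.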
It seems great now, thank you🙏. I'm still in training, and I don't know when to stop. Do you have any experience with this?🤔
Sounds great. Maybe you can check the visualization or the loss curve (via tensorboard) to decide.
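Besides eyeballing the curve, a rough plateau check can be run on logged loss values. The sketch below compares the mean loss of the most recent window with the window before it; the function name `has_plateaued`, the window size, and the 1% tolerance are arbitrary assumptions, not settings from this repository.

```python
from typing import Sequence


def has_plateaued(losses: Sequence[float], window: int = 100, rel_tol: float = 0.01) -> bool:
    """Return True when the mean loss has stopped improving.

    Compares the mean of the last `window` values with the mean of the
    `window` values before them. The loss counts as plateaued when the
    improvement is at most `rel_tol` times the earlier mean.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    improvement = previous - recent
    return improvement <= rel_tol * abs(previous)
```

GAN losses are often noisy, so a check like this is best treated as a hint to look at the visualizations, not as a stopping rule on its own.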
Thanks, it has been great so far, but I still have a question. In the paper, you use optical flow to predict the ID regions ($g_{i}'$). Is there any way to get this feature for each image individually?
Sure, the flow feature is in the SAMM module, specifically on line 168 of this file:
Hi, thank you for your response last time. However, I have a new question. I was training a model on a PC equipped with an RTX 4080, but it is taking an excessive amount of time, almost half a year, which seems abnormal. Are you conducting the training in a distributed manner?