Open davebobobo opened 2 years ago
Same problem. Have you solved it?
This definitely is not a "real" solution, but I had the same thing with stylegan3-t or stylegan3-r as the config, so I just went back to the stylegan2 config.
I'm also having the same issue when setting gpus=2. When set to 1 GPU, I can produce my first fake image from tick 0; however, it then gets stuck on calculating metrics. Setting metrics=none doesn't fix this; it just stops at fake00000. Were you able to get yours working yet?
I only got it working by installing Ubuntu.
Just noticed it isn't stuck on metrics; it just takes a very, very long time with a single GPU -.-
I'm training the model in Colab and I'm having problems too. The training gets stuck after tick 0.
@domef Hey, did you figure out where it was getting stuck? It seems that training this model requires a lot of resources.
@xiaomao19970819 I didn't train anymore but probably it was just very slow (I was training on colab).
I'm having the same issue with a single 3070 gpu (8gb). The interesting part is, it did work for 120k iterations, but I had to shut down my pc and now I'm getting this error while trying to resume the latest snapshot. I did change the snap from 5 to 20 (because the metrics report takes like 30 minutes and it's happening way too often) and I also changed the number of workers from the default value (I think 2?) to 8, since I have a 12 core processor.
At the point where it says "Training for 25000 kimg...", my ram fills up to the max (64gb) and my pc becomes very unresponsive. I've let this run for over 30 minutes without anything happening. Also tried different batch sizes, that didn't change anything.
Update: Resuming with my original settings (batch=4 and workers=default) resulted in no issues. My RAM is almost full though, at ~60/64 GB.
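A rough sketch of why raising `--snap` cuts down on the metric runs. It assumes each tick is 4 kimg (what I believe is the default for `--tick`, so check your own setting) and that a snapshot plus metrics report happens every `snap` ticks. The helper names `kimg_per_snapshot` and `metric_runs` are made up for illustration and are not part of the repo.

```python
def kimg_per_snapshot(snap: int, tick_kimg: int = 4) -> int:
    """Return the kimg trained between two snapshots (and metric runs).

    Assumption: tick_kimg=4 matches the default --tick; adjust if yours differs.
    """
    if snap <= 0 or tick_kimg <= 0:
        raise ValueError("snap and tick_kimg must be positive")
    return snap * tick_kimg


def metric_runs(total_kimg: int, snap: int, tick_kimg: int = 4) -> int:
    """Approximate number of metric runs over a full training run.

    Ignores the extra runs at tick 0 and at the very end of training.
    """
    return total_kimg // kimg_per_snapshot(snap, tick_kimg)


# Compare the two --snap values mentioned above for a 25000 kimg run.
for snap in (5, 20):
    print(snap, kimg_per_snapshot(snap), metric_runs(25000, snap))
```

Under those assumptions, snap=5 means a metrics report every 20 kimg (about 1250 over 25000 kimg), while snap=20 means one every 80 kimg (about 312).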
Hi @felkoh, can you say how long it was stuck for, or whether there is any alternative solution?
Thank you
Hey guys,
complete beginner here. Thanks to some tutorials I made it this far: I created a dataset (200 pictures) and am trying to train a new network. It then gets stuck without ever reaching tick 0. fakes_init.png is created properly. I am running on 2x 3080 10 GB (which don't show any load).
(environment) PS C:\Users\david\stylegan3> python train.py --outdir E:\LL\Training --data E:\LL\Dest\Dest.zip --cfg=stylegan3-t --gpus=2 --batch=32 --gamma=8 --batch-gpu=8 --snap=20 --kimg=1000
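For anyone checking their own flags, here is a minimal Python sketch of how `--batch`, `--gpus` and `--batch-gpu` in the command above fit together. It assumes, as I understand train.py, that `--batch` is the total batch across all GPUs, that `--batch-gpu` caps how many samples each GPU processes in one pass, and that the remainder is covered by gradient accumulation. The function name `accumulation_rounds` is hypothetical and not part of the repo.

```python
def accumulation_rounds(batch: int, gpus: int, batch_gpu: int) -> int:
    """Return how many passes each GPU makes per optimizer step.

    Assumption: --batch is the total batch size across all GPUs and
    --batch-gpu is the number of samples one GPU processes per pass.
    """
    if batch <= 0 or gpus <= 0 or batch_gpu <= 0:
        raise ValueError("all arguments must be positive")
    per_pass = gpus * batch_gpu  # samples handled by all GPUs in one pass
    if batch % per_pass != 0:
        raise ValueError("batch must be a multiple of gpus * batch_gpu")
    return batch // per_pass


# Values from the command above: --batch=32 --gpus=2 --batch-gpu=8
print(accumulation_rounds(batch=32, gpus=2, batch_gpu=8))
```

Under those assumptions the flags are consistent with each other (two passes of 8 images per GPU per step), so an invalid batch combination is probably not what causes the hang here.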
What I tried: Changing the batch size. Didn't help.
Desktop:
OS: Windows 10
PyTorch: 1.9.1+cu111
CUDA toolkit version: CUDA 11.1
NVIDIA driver version: 472.47
GPU: 2x RTX 3080 (10 GB)
Docker: No
Any hints on what I could try? Thanks
EDIT: After 30 minutes it gave me this error message