universewill opened 2 years ago
Do you have the training code?
You need to train using the original guided-diffusion code base.
To get the same commit as we used, do the following:
git clone https://github.com/openai/guided-diffusion.git
cd guided-diffusion
git checkout 912d577
For training we use 256×256 crops with a batch size of 3 on each of 4×V100 GPUs.
Does that work for you?
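For reference, here is a minimal sketch of what a training invocation against that commit could look like, using guided-diffusion's scripts/image_train.py entry point. Only the 256×256 crop size and the batch size of 3 per GPU come from the comment above; the dataset path, model size, and optimizer flags are placeholder assumptions to adjust to your own config:
# assumes 4 GPUs via MPI; most flag values below are guesses, not the authors' settings
mpiexec -n 4 python scripts/image_train.py --data_dir /path/to/face_crops \
    --image_size 256 --batch_size 3 --lr 1e-4 \
    --num_channels 256 --num_res_blocks 2 --attention_resolutions 16,8 \
    --learn_sigma True --diffusion_steps 1000 --noise_schedule linear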
@universewill Can you use your own dataset to train the model?
Could you provide the training parameters? For example, the parameters used for imagenet256?
I did try to reproduce the training details on CelebA-HQ, without success. I updated the guided-diffusion params using the face_example.yml file. Are those the params used for training? The loss stays at NaN for a very large number of steps. With the default config, it is NaN only for a few steps and then improves. What kind of behavior is expected?
@jbnjvc10000 did you try it? When I set num_channels to 256, I often get NaN in the loss, which skips the training step, and it does not go away.
@MaugrimEP Sorry, I didn't run into your problem. Maybe you should make sure your dataset and parameters are the same as the paper provides?
@jbnjvc10000 I solved my issue. It was system-related: I ran the same code under Linux and Windows; Windows produced NaN and Linux was perfect. :> classic
@MaugrimEP By the way, it may be that a file could not be found because Windows uses different paths.
@MaugrimEP Hello, what type of GPU did you use for training?
I used a single 12GB GeForce RTX 4070 to train with guided-diffusion on my own dataset of 120,000 high-quality human facial images (each 256×256 pixels, RGB). However, if I set num_channels to 256, then no matter what batch_size I set, the training command fails with: RuntimeError: CUDA out of memory. I think this means my GPU memory isn't enough, so I am curious which GPU you used for training.
I will be grateful if you reply!
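For reference, guided-diffusion's image_train.py exposes a few flags that trade compute for memory, which may help on a 12GB card; the values below are assumptions, not settings confirmed in this thread:
# half precision, gradient checkpointing, and a microbatch of 1 to reduce peak memory
python scripts/image_train.py --data_dir /path/to/faces \
    --image_size 256 --num_channels 256 --num_res_blocks 2 \
    --batch_size 1 --microbatch 1 --use_fp16 True --use_checkpoint True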
How do I train or fine-tune on a custom dataset?
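For reference, one way to fine-tune with guided-diffusion is to start image_train.py from an existing checkpoint via --resume_checkpoint and point --data_dir at a folder of your own images. A sketch with placeholder paths and assumed hyperparameters follows; the model flags must match the architecture of the checkpoint you load:
python scripts/image_train.py --data_dir /path/to/custom_images \
    --resume_checkpoint /path/to/pretrained_model.pt \
    --image_size 256 --num_channels 256 --num_res_blocks 2 \
    --learn_sigma True --batch_size 3 --lr 1e-5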