It's quite complicated to explain everything in one thread for a newbie, so let's go through the questions one by one.
Once I download a set of high-quality jpg images of random people, do I have to downscale them manually in order to train with them?
The answer is yes and no. In VSR we have a data loader. If you choose the default data parser (a parser defines how input data is pre-processed), then VSR generates the LR images inside the data loader (bicubically). However, if you choose another parser (e.g. custom_pairs), you need to prepare the LR images yourself.
Which method to choose depends on your actual task:
- default parser: LR is acquired by Bicubic(HR, scale)
- custom_pairs: you provide the LR images yourself (see the sketch after this list)
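If you go the custom_pairs route, here is a minimal pre-processing sketch (not part of VSR; the "hr"/"lr" folder names are assumptions matching the layout described later in this thread):

# bicubic_downscale.py: pre-generate 4x bicubic LR images for custom_pairs
from pathlib import Path
from PIL import Image

scale = 4
hr_dir, lr_dir = Path("hr"), Path("lr")
lr_dir.mkdir(exist_ok=True)

for p in hr_dir.glob("*.jpg"):
    with Image.open(p) as im:
        # downscale by the SR factor with bicubic interpolation
        lr = im.resize((im.width // scale, im.height // scale), Image.BICUBIC)
        lr.save(lr_dir / p.name)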
If I put the images in this directory: then what would I need to change in datasets.yaml for it to be recognized by the program?
Just edit datasets.yaml. I think it's easy to understand. Note that there is a root entry at the top of the file, and every path is relative to this root.
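For reference, a hedged sketch of what such an entry might look like; the Root/Path/Datasets key names follow the entries mentioned in this thread rather than a verified schema, and the PEOPLE names are placeholders:

# hypothetical datasets.yaml excerpt; key names and layout are assumptions
Root: /usr/local/VideoSuperResolution/Data
Path:
  PEOPLE-HR: people_dataset/train/hr   # relative to Root
  PEOPLE-LR: people_dataset/train/lr
Datasets:
  PEOPLE:
    train:
      hr: PEOPLE-HR
      lr: PEOPLE-LR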
Is ESRGAN the best model for improving face quality, or is there a better one? (e.g. should I be using a CNN, GAN, AE, or VAE type of model?)
If you JUST want to improve face quality, you'd better retrain the model. The provided pre-trained weights of ESRGAN are for general purposes. To choose a proper model, just remember: the more parameters it has, the more accurate its output, and the harder it is to train.
How many epochs should I run to train on a dataset of about 5000-10000 photos?
Basically, you need 10k+ iterations. The default number of steps per epoch in VSR is 200, so I suggest training for more than 500 epochs for basic usage (200 steps × 500 epochs = 100k iterations). In fact, I usually train 1000-2000 epochs for fine-tuning.
How are the training files saved, and how do I resume training from one of these files?
VSR automatically saves checkpoints in --save_dir; by default this is ../Results/<model-name>/save. VSR also resumes training from --save_dir. For VSRTorch/train.py, you can even pass --pth to point at a specific weights file to resume from.
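For example, a resume command might look like the following (hedged: the checkpoint filename is hypothetical, following the rrdb_epxxxx.pth naming pattern mentioned later in this thread):

# resume ESRGAN training from a specific checkpoint (filename hypothetical)
python train.py esrgan --cuda --dataset=people --pth=../Results/esrgan/save/rrdb_ep0500.pth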
Thanks for the extended reply! First, I downloaded the first 12,000 images of CelebA and split them up into folders for train, test, and val, each with "hr" and "lr" folders inside (lr is 4x downscaled from hr). Then, I created a dataset named PEOPLE and tried to use the custom parser. However, there was an error when I ran python train.py esrgan --cuda --dataset PEOPLE --epochs 500 --pth /usr/local/VideoSuperResolution/Results/RRDB_GAN.pth (see attached)
error.txt
I'm not sure if the missing-weights error is normal; I seem to get it whenever I load RRDB_GAN.pth with esrgan. However, there is a new error at the bottom that I just started getting when using the PEOPLE dataset:
Traceback (most recent call last):
  File "train.py", line 124, in <module>
    main()
  File "train.py", line 118, in main
    t.fit([tloader, vloader], train_config)
  File "/usr/local/VideoSuperResolution/VSRTorch/Framework/Trainer.py", line 114, in fit
    train_iter = v.train_loader.make_one_shot_iterator(mem, shuffle=True)
  File "/usr/local/VideoSuperResolution/VSR/DataLoader/Loader.py", line 378, in make_one_shot_iterator
    shuffle=shuffle)
  File "/usr/local/VideoSuperResolution/VSR/DataLoader/Loader.py", line 201, in _generate_crop_grid
    y = np.random.randint(0, _h - _ph + 1, size=amount)
  File "mtrand.pyx", line 992, in mtrand.RandomState.randint
ValueError: Range cannot be empty (low >= high) unless no samples are taken
Here is a copy of my datasets.yaml:
Also, is starting my training from RRDB_GAN.pth a good idea, or should I start the training over completely?
Thanks for helping
I think this is caused by some images in CelebA being smaller than 128x128. In esrgan.yml the patch_size is 128: the loader randomly crops 128x128 patches from HR images and 32x32 patches from LR.
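One minimal way to deal with this (a sketch, not part of VSR; the dataset path is a placeholder for your own layout) is to drop HR images that are too small to crop:

# filter_small.py: remove HR images smaller than the 128x128 patch_size
from pathlib import Path
from PIL import Image

patch_size = 128  # patch_size from esrgan.yml

for p in Path("people_dataset/train/hr").glob("*.jpg"):
    with Image.open(p) as im:
        w, h = im.size
    if w < patch_size or h < patch_size:
        p.unlink()  # too small to crop a 128x128 HR patch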
The import error is OK because the provided RRDB_GAN.pth doesn't contain weights for the discriminator. Hence fine-tuning from RRDB_GAN.pth is not a good idea. However, you can start from RRDB_PSNR.pth, and you can choose whether to train a discriminator by setting its weight to zero or non-zero.
For example:
# PSNR-oriented training, no DNET
python train.py esrgan --cuda --weights=[0.01,1,0] --epochs=100 --dataset=people
# SRGAN-oriented training, with GAN weights 0.001
python train.py esrgan --cuda --weights=[0.01,1,0.001] --epochs=100 --dataset=people
These weights override the values in esrgan.yml.
You could find weights at: https://drive.google.com/drive/u/0/folders/17VYV_SoZZesU6mbxz2dMAIccSSlqLecY or https://pan.baidu.com/s/1-Lh6ma-wXzfH8NqeBtPaFQ?errno=0&errmsg=Auth%20Login%20Sucess&&bduss=&ssnerror=0&traceid=#list/path=%2F
Thanks; so should I start with RRDB_PSNR.pth and move on to GAN training with 0.001 for best results? (Or should I begin training with --weights=[0.01,1,0] for PSNR, go for a couple hundred epochs, then run training again with --weights=[0.01,1,0.001]?)
And if I want to resume training from the saved file, then I don't have to pass --weights=[0.01,1,0.001] again, right?
To train continuously, the best approach is to edit the YML file directly (a sketch follows below).
For those who are new to deep learning, I suggest not touching the GAN part. Its hyper-parameters depend highly on the dataset, and tuning them requires expert experience.
You are fine starting with ESRGAN and setting the last weight to zero.
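For example, the relevant line in esrgan.yml might look like this (hedged: the key name and the [pixel, perceptual, GAN] ordering are assumptions inferred from the [0.01, 1, 5.0e-3] default mentioned later in this thread):

# hypothetical esrgan.yml excerpt; ordering assumed as [pixel, perceptual, GAN]
weights: [0.01, 1, 0]          # PSNR-oriented phase: GAN weight zero, no D-net
# later, switch to GAN-oriented training by editing it to e.g.:
# weights: [0.01, 1, 1.0e-3]   # small non-zero GAN weight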
"to start with ESRGAN" you mean starting from RRDB_PSNR.pth right?
Yes
I notice that the author provided RRDB_PSNRx4 and RRDB_PSNRx4_old_arch; maybe you should try the old one.
Would PSNR training provide a good result, though? Or should I switch to another model that is easier to train?
Actually, that depends on what you want. You could first use RRDB_PSNRx4 to generate SR faces and check the quality.
In my view, a PSNR-oriented model is good enough to beat traditional methods. You could set the GAN weight to a reasonably small value (e.g. 1e-4) and see what happens compared to RRDB_PSNRx4.
Thanks, I will try the PSNR model first :)
Hello; I ended up going with the old_arch weights because they could load the RRDB parameters (and training started out with a lower image loss). Here are the logs: Old Arch.txt, New Arch.txt
Now, I was just wondering what the four .pth files in the ESRGAN save folder are for, and which one I should load for use with eval.py or for continuing training (or should I interpolate the models with net_interp.py from the ESRGAN GitHub?). (The names are dnet_ep0001.pth, optd.pth, optg.pth, and rrdb_ep0001.pth.)
(The weights are automatically [0.01, 1, 5.0e-3] in the .yml file.) If I change the .yml and start with PSNR-oriented training (weight 0) for a few hundred epochs, can I then stop training, change the weight in the .yml to 1e-3, and continue with GAN-oriented training?
As I expected, you should stick to the old-arch weights.
If you do not provide --pth, then eval will automatically restore weights from the save folder. Among the multiple PTH files, you only need to load rrdb_epxxxx.pth for inference, and you don't need to set --pth explicitly to resume training.
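For instance, a hedged inference command (assuming eval.py accepts the same --cuda/--pth flags as train.py; the epoch number is a placeholder, and any input-image flags are omitted here):

# explicitly load a generator checkpoint for inference (epoch number is a placeholder)
python eval.py esrgan --cuda --pth=../Results/esrgan/save/rrdb_ep0400.pth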
For the last question: sure, you can do that. The only thing you will notice is that you may need to find a proper weight by trial and error.
Hello again!
Approximately what epoch would be a good point to switch from PSNR training to GAN training by changing the weights in the yml? Is there a certain average loss per epoch, epoch number, or other indication? Also, how is it possible to find the proper weight by trial and error when you have to train the model for a long time to see a weight's effects? (Or is this not the case?)
Some info: right now I am at 300 epochs (I started at epoch 0 with RRDB_PSNRx4_old_arch.pth). The average loss has gone down from about 1.23 to about 1.18 over the last 250 epochs, and the PSNR has fluctuated anywhere from 26 to 32 (usually about 29-30). I have been using 14,000 full images from the CelebA dataset (not cropped to faces) for training, after deleting all the images that were under 400 pixels in either width or height.
On a side note, I've noticed that the eyes and teeth sometimes appear pixelated or warped. I wonder what I could do to fix this (or would just training the GAN help)? Thank you for your help!
Here is an example using a test image I found on the internet (attached: low-res input, model output, and the HR original).
Yeah, it's totally empirical...
AFAIK, you have to do trial and error once per model...
Empirically, if the GAN weight is non-zero, the evaluation PSNR drops by some delta P, and I monitor that P: the higher the weight, the larger the drop. The best range for P may be 2-5 dB, which guides me in setting a proper weight value.
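In other words (a minimal sketch of the check described above; the PSNR values are hypothetical):

# delta-P check: compare eval PSNR before and after enabling the GAN weight
psnr_psnr_oriented = 29.8  # eval PSNR with weights [0.01, 1, 0]
psnr_gan_oriented = 26.5   # eval PSNR with weights [0.01, 1, 5.0e-3]

delta_p = psnr_psnr_oriented - psnr_gan_oriented
# a drop of roughly 2-5 dB suggests a reasonable GAN weight
print(f"delta P = {delta_p:.1f} dB, in range: {2 <= delta_p <= 5}")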
Thanks! So about what epoch should I be switching to GAN?
I think you could switch right now.
Cool thanks
You mean calculate the approximate difference in PSNR between the PSNR-oriented training and the newly weighted GAN training, right? Edit: I turned the weight up to 5.0e-3 and the PSNR is still at 29.
The evaluation PSNR has relatively large fluctuations; you could observe the training loss instead, for example. Another thing that matters is keeping the D-net loss away from zero.
So should I aim for a higher D-net loss, or for the lowest D-net loss possible without it being zero (i.e. something like 0.00001)? (And the D-net loss is loss_d, right?)
Without being zero. It looks normal so far.
I am at epoch 400, and my loss_d keeps going to 0.00000. What should I do?
Hey LoSealL! I am a relative beginner and got a bit lost while reading the readme files about adding datasets. Essentially, I want to use one of the models to change my photos from medium quality (some are blurry, others are a bit low-resolution) to very high quality (they are pictures of the upper halves of people, generally around 400 by 400 pixels). I had a few questions about getting started:
Once I download a set of high-quality jpg images of random people, do I have to downscale them manually in order to train with them? (Or does the program automatically lower the resolution, run it through the network, and adjust weights automatically?) (Also, if possible, do you know any good face datasets with high-quality images for this?)
If I put the images in this directory:
C:\ProgramData\Anaconda3\envs\py37\VSR\VideoSuperResolution\Data\people_dataset
then what would I need to change in datasets.yaml for it to be recognized by the program? E.g., would I create a new entry under Path: such as
people_dataset: C:\ProgramData\Anaconda3\envs\py37\VSR\VideoSuperResolution\Data\people_dataset
and a new entry under Datasets:?
Also, Google Colab seems to be a cool thing to check out, because you can use a Tesla T4 GPU for free and train with multiple accounts at once. I made a very basic notebook for VideoSuperResolution here: https://colab.research.google.com/drive/1gTh78EfpZyQczUsUogxxmDPCmoGggoCG
I greatly appreciate the assistance!