LoSealL / VideoSuperResolution

A collection of state-of-the-art video and single-image super-resolution architectures, reimplemented in TensorFlow.
MIT License

Custom Data Set #75

Closed. davrocks closed this issue 5 years ago.

davrocks commented 5 years ago

Hey LoSealL! I am a relative beginner and got a bit lost reading the readme files about adding datasets. Essentially, I want to use one of the models to change my photos from medium quality (some are blurry, others a bit low-resolution) to very high quality (they are pictures of the upper halves of people, generally around 400x400 pixels). I had a few questions about getting started:

  1. Once I download a set of high-quality jpg images of random people, do I have to downscale them manually in order to train with them? (Or does the program automatically lower the resolution, run it through the network, and adjust the weights?) (Also, if possible, do you know any good face datasets with high-quality images for this?)

  2. If I put the images in this directory: C:\ProgramData\Anaconda3\envs\py37\VSR\VideoSuperResolution\Data\people_dataset then what would I need to change in datasets.yaml for it to be recognized by the program?

e.g. would I create a new entry under Path: such as

people_dataset: C:\ProgramData\Anaconda3\envs\py37\VSR\VideoSuperResolution\Data\people_dataset

and a new entry under Datasets: like

people_dataset: 
    train: people_dataset
    val: people_dataset  
    test: people_dataset
  3. Is ESRGAN the best model for improving face quality, or is there a better one? (e.g. should I be using a CNN, GAN, AE, or VAE type of model?)
  4. How many epochs should I run to train on a dataset of about 5000-10000 photos?
  5. How are the training files saved, and how do I resume training from one of these files?
  6. Do you have any other suggestions?

Also, Google Colab seems to be a cool thing to check out because you can use a Tesla T4 GPU for free and train with multiple accounts at once. I made a very basic notebook for VideoSuperResolution here: https://colab.research.google.com/drive/1gTh78EfpZyQczUsUogxxmDPCmoGggoCG

I greatly appreciate the assistance!

LoSealL commented 5 years ago

It's quite complicated to explain everything in one thread for a newcomer, so let's go through the questions one by one.

1.

Once I download a set of high-quality jpg images of random people, do I have to downscale them manually in order to train with them?

The answer is yes and no. VSR has a data loader with pluggable parsers (a parser defines how the input data is pre-processed). If you choose the default parser, VSR generates the LR images inside the data loader (by bicubic downscaling), so you only need HR images. However, if you choose another parser (e.g. custom_pairs), you need to prepare your own LR images.

Which method to choose depends on your actual task.
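If you go the custom_pairs route, here is a minimal sketch of pre-generating 4x bicubic LR images with Pillow (the folder layout and scale factor are assumptions for illustration):

import os
from PIL import Image

HR_DIR = "Data/people_dataset/hr"   # hypothetical layout
LR_DIR = "Data/people_dataset/lr"
SCALE = 4

os.makedirs(LR_DIR, exist_ok=True)
for name in os.listdir(HR_DIR):
    img = Image.open(os.path.join(HR_DIR, name))
    # bicubic downscale, matching what the default parser would do
    lr = img.resize((img.width // SCALE, img.height // SCALE), Image.BICUBIC)
    lr.save(os.path.join(LR_DIR, name))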

LoSealL commented 5 years ago

2.

If I put the images in this directory, then what would I need to change in datasets.yaml for it to be recognized by the program?

Just edit datasets.yaml; I think it's easy to follow. Note there is a root entry at the top of the file, and every path is relative to that root.
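For example, a sketch along the lines of what you proposed (check the key names and layout against the shipped datasets.yaml; the people_dataset paths here are assumptions):

Root: C:/ProgramData/Anaconda3/envs/py37/VSR/VideoSuperResolution/Data
Path:
    people_dataset: people_dataset    # relative to Root
Datasets:
    PEOPLE:
        train: people_dataset
        val: people_dataset
        test: people_dataset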

LoSealL commented 5 years ago

3.

Is ESRGAN the best model for improving face quality, or is there a better one? (e.g. should I be using a CNN, GAN, AE, or VAE type of model?)

If you JUST want to improve face quality, you'd better retrain the model: the provided pre-trained ESRGAN weights are general-purpose. To choose a proper model, just remember that the more parameters it has, the more accurate its results, and the harder it is to train.

LoSealL commented 5 years ago

4.

How many epochs should I run to train on a dataset of about 5000-10000 photos?

Basically, you need 10k+ iterations. The default steps-per-epoch in the VSR settings is 200, so I suggest training for more than 500 epochs for basic usage (500 epochs x 200 steps = 100k iterations). In fact, I usually train for 1000-2000 epochs when fine-tuning.

LoSealL commented 5 years ago

5.

How are the training files saved, and how do I resume training from one of these files?

VSR automatically saves checkpoints to --save_dir; by default this is ../Results/<model-name>/save. VSR also resumes training from --save_dir.

For VSRTorch/train.py, you can even pass --pth with a specific weights file to resume from.
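For instance (the flags mirror the train.py examples in this thread; the checkpoint filename below is hypothetical, so use whatever epoch file is in your save folder):

# resume from the latest checkpoint in the default save dir
python train.py esrgan --cuda --dataset=people --epochs=500
# or resume from an explicit weights file
python train.py esrgan --cuda --dataset=people --epochs=500 --pth=../Results/esrgan/save/rrdb_ep0300.pth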

LoSealL commented 5 years ago

6.

Do you have any other suggestions?

Read the documents and the wiki :)

davrocks commented 5 years ago

Thanks for the extended reply! First, I downloaded the first 12,000 images of CelebA and split them into train, test, and val folders, each with "hr" and "lr" subfolders (lr is 4x downscaled from hr). Then I created a dataset named PEOPLE and tried to use the custom parser. However, I got an error when I ran python train.py esrgan --cuda --dataset PEOPLE --epochs 500 --pth /usr/local/VideoSuperResolution/Results/RRDB_GAN.pth (see attached error.txt)

I'm not sure if the missing-weights error is normal, but I seem to get it whenever I load RRDB_GAN.pth with esrgan. However, there is a new error at the bottom that I only started getting with the PEOPLE dataset:

Traceback (most recent call last):
  File "train.py", line 124, in <module>
    main()
  File "train.py", line 118, in main
    t.fit([tloader, vloader], train_config)
  File "/usr/local/VideoSuperResolution/VSRTorch/Framework/Trainer.py", line 114, in fit
    train_iter = v.train_loader.make_one_shot_iterator(mem, shuffle=True)
  File "/usr/local/VideoSuperResolution/VSR/DataLoader/Loader.py", line 378, in make_one_shot_iterator
    shuffle=shuffle)
  File "/usr/local/VideoSuperResolution/VSR/DataLoader/Loader.py", line 201, in _generate_crop_grid
    y = np.random.randint(0, _h - _ph + 1, size=amount)
  File "mtrand.pyx", line 992, in mtrand.RandomState.randint
ValueError: Range cannot be empty (low >= high) unless no samples are taken

Here is a copy of my datasets.yaml:

datasets.txt

Also, is starting my training from RRDB_GAN.pth a good idea, or should I start the training over completely?

Thanks for helping

LoSealL commented 5 years ago

I think this is caused by some images in CelebA being smaller than 128x128. In esrgan.yml the patch_size is 128, so the loader randomly crops 128x128 patches from the HR images and 32x32 patches from the LR ones.
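A minimal sketch of weeding out undersized images before training (the directory path is an assumption; the threshold must cover the patch_size in esrgan.yml):

import os
from PIL import Image

HR_DIR = "Data/people_dataset/hr"  # hypothetical location of your HR images
MIN_SIZE = 128                     # must be >= patch_size in esrgan.yml

for name in os.listdir(HR_DIR):
    path = os.path.join(HR_DIR, name)
    with Image.open(path) as img:
        w, h = img.size
    if w < MIN_SIZE or h < MIN_SIZE:
        os.remove(path)            # too small to crop a 128x128 patch from
        print("removed", name, (w, h))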

The import error is OK because the provided RRDB_GAN.pth doesn't contain weights for the discriminator. For the same reason, fine-tuning from RRDB_GAN.pth is not a good idea. However, you can start from RRDB_PSNR.pth, and you can choose whether to train a discriminator by setting the GAN weight to zero or non-zero.

For example:

# PSNR-oriented training, no DNET
python train.py esrgan --cuda --weights=[0.01,1,0] --epochs=100 --dataset=people
# SRGAN-oriented training, with GAN weights 0.001
python train.py esrgan --cuda --weights=[0.01,1,0.001] --epochs=100 --dataset=people

These --weights values override the ones in esrgan.yml.

You could find weights at: https://drive.google.com/drive/u/0/folders/17VYV_SoZZesU6mbxz2dMAIccSSlqLecY or https://pan.baidu.com/s/1-Lh6ma-wXzfH8NqeBtPaFQ?errno=0&errmsg=Auth%20Login%20Sucess&&bduss=&ssnerror=0&traceid=#list/path=%2F

davrocks commented 5 years ago

Thanks; so I should start with RRDB_PSNR.pth and move to GAN training with 0.001 for best results? (Or should I begin training with --weights=[0.01,1,0] for PSNR, go for a couple hundred epochs, then run training again with --weights=[0.01,1,0.001]?) And if I want to resume training from the saved file, I don't have to pass --weights=[0.01,1,0.001] again, right?

LoSealL commented 5 years ago

If you want to train continuously, the best approach is to edit the YML file directly.

For those who are new to deep learning, I suggest not touching GANs. Their hyper-parameters depend heavily on the dataset, and tuning them takes real experience.

You are fine starting with ESRGAN and setting the last weight to zero.
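Concretely, that means editing the weights entry in esrgan.yml rather than passing --weights on the command line. A sketch (the exact key layout may differ in the shipped file; the order [pixel, perceptual, GAN] is my reading of the examples above):

weights: [0.01, 1, 0]          # last entry is the GAN term: off
# later, to switch on GAN training, raise the last entry, e.g.
# weights: [0.01, 1, 1.0e-3]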

davrocks commented 5 years ago

"to start with ESRGAN" you mean starting from RRDB_PSNR.pth right?

LoSealL commented 5 years ago

Yes

LoSealL commented 5 years ago

I noticed that the author provides both RRDB_PSNRx4 and RRDB_PSNRx4_old_arch; maybe you should try the old one.

davrocks commented 5 years ago

Would PSNR training give a good result, though? Or should I switch to another model that is easier to train?

LoSealL commented 5 years ago

Actually, that depends on what you want. You could first use RRDB_PSNRx4 to generate SR faces and check the quality.

In my view, a PSNR-oriented model is good enough to beat traditional methods. You could set the GAN weight to a reasonably small value (e.g. 1e-4) and see what happens compared to RRDB_PSNRx4.

davrocks commented 5 years ago

Thanks, I will try the psnr first :)

davrocks commented 5 years ago

Hello; I ended up going with the old arch because it could load the RRDB parameters (and it started out with a lower image loss). Here are the logs: Old Arch.txt New Arch.txt

Now I was just wondering what the four .pth files in the ESRGAN save folder are for, and which one I should load for eval.py or for continuing training (or should I interpolate the models with net_interp.py from the ESRGAN GitHub repo?). (The names are dnet_ep0001.pth, optd.pth, optg.pth, and rrdb_ep0001.pth.)

(The weights default to [0.01, 1, 5.0e-3] in the .yml file.) If I change the .yml and start with PSNR-oriented training (GAN weight 0) for a few hundred epochs, can I then stop training, change the weight in the .yml to 1e-3, and continue with GAN-oriented training?

LoSealL commented 5 years ago

As I expected, you should stick with the old-arch weights.

If you do not provide --pth, eval will automatically restore weights from the save folder. Among the multiple PTH files, you only need to load rrdb_epxxxx.pth for inference; and don't set --pth explicitly when you resume training.
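For example (a sketch only: the flags mirror the train.py examples above, the checkpoint filename is hypothetical, and how eval.py picks its test images is omitted here):

# restore automatically from ../Results/esrgan/save
python eval.py esrgan --cuda
# or point at a specific generator checkpoint
python eval.py esrgan --cuda --pth=../Results/esrgan/save/rrdb_ep0300.pth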

For the last question: sure, you can do that. The only thing to note is that you may need to search for a proper weight by trial and error.

davrocks commented 5 years ago

Hello again!

Approximately when (at what epoch) is a good time to switch from PSNR training to GAN training by changing the weights in the yml? Is there a certain average loss per epoch, epoch count, or other indicator? Also, how is trial and error practical for finding the proper weight when you have to train the model for a long time to see the weight's effect? (Or is that not the case?)

Some info: right now I am at 300 epochs (I started from epoch 0 with RRDB_PSNRx4_old_arch.pth). The average loss has gone down from about 1.23 to about 1.18 over the last 250 epochs, and the PSNR has fluctuated anywhere from 26 to 32 (usually about 29-30). I have been training on 14,000 full CelebA images (not cropped to faces), after deleting all images under 400 pixels in either width or height.

On a side note, I've noticed that the eyes and teeth sometimes appear pixelated or warped. I wonder what I could do to fix this (or would training the GAN help)? Thank you for your help!

Here is an example using a test image I found on the internet (attached images: low-res input, model output, and the HR original).

LoSealL commented 5 years ago

Yeah, it's totally empirical... AFAIK, you have to do trial and error once per model... Empirically, if the GAN weight is non-zero, the evaluation PSNR drops by some delta P, and I monitor that P: the higher the weight, the more it drops. The best range for P is maybe 2-5 dB, which guides me toward a proper weight value.

davrocks commented 5 years ago

Thanks! So at about what epoch should I switch to GAN?

LoSealL commented 5 years ago

I think you could switch right now.

davrocks commented 5 years ago

Cool thanks

davrocks commented 5 years ago

You mean calculating the approximate difference in PSNR between the PSNR-oriented training and the newly weighted GAN training, right? Edit: I turned the weight up to 5.0e-3 and the PSNR is still at 29.

LoSealL commented 5 years ago

The evaluation PSNR fluctuates quite a bit; you could watch the training loss instead, for example. Another thing that matters: keep the D-net loss away from zero.

davrocks commented 5 years ago

So should I aim for a higher D-net loss, or for the lowest D-net loss possible without it being zero (i.e. something like 0.00001)? (And the D-net loss is loss_d, right?)

LoSealL commented 5 years ago

Just keep it from hitting zero. It looks normal so far.

davrocks commented 5 years ago

I am at epoch 400, and my loss_d keeps going to 0.00000. What should I do?

LoSealL commented 5 years ago
  1. Decrease the learning rate (see the sketch after this list).
  2. Review your dataset: is it too easy for the discriminator and too hard for the generator?
  3. Replay with the default ESRGAN and the DIV2K dataset, and see if you can train a vanilla model from scratch.
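For the first point, a sketch of what lowering the learning rate might look like in esrgan.yml (the key name and values here are hypothetical; check the actual file for the real learning-rate entry):

lr: 1.0e-5    # e.g. lowered from 1.0e-4 to keep the D-net loss from collapsing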