csxmli2016 / textbsr

This is a simple text image blind super-resolution model, using BSRGAN

Training/Finetuning questions #7

Closed: kynthesis closed this issue 5 months ago

kynthesis commented 6 months ago

Hi Xiaoming Li,

Thank you for your outstanding research. How can we train or finetune the bsrgan_text_256.pth model, and which datasets should we use to train or finetune it further? Could you share more details about this project? I would like to do research based on your excellent textbsr.

Best regards, Khoa D. Vo

csxmli2016 commented 6 months ago

Hi Khoa D. Vo, For the network and training code, you can follow BSRGAN or Real-ESRGAN. The main difference is the synthetic training data. For that, you can refer to "A simple demo for synthesizing the training data" from our work MARCONet.

kynthesis commented 5 months ago

Thank you very much for your quick response!

kynthesis commented 5 months ago

Hello Xiaoming Li,

Please allow me to ask you a few questions!

  • How many data samples did you use to train bsrgan_text_256? I trained with a dataset of 100,000 samples, but it doesn't seem to be enough. Now I'm training with 1,000,000 samples.
  • Can you give more details about the batch size, number of iterations, number of GPUs, or related settings that may affect the training progress? I'm using a training config with batch size 12 and 400,000 iterations.

At the moment, I'm training with only 1 RTX 4090 with batch size 12, which may not be enough for this project.

Best regards, Khoa D. Vo

csxmli2016 commented 5 months ago

Your GPU is sufficient for training this project. Since this work does not need long sentences as input, you can synthesize inputs at a resolution of 128×128 or 128×256 for training (that is, only one or two characters per image). This project is very easy to train: it takes roughly one to two days on a single GPU to get a stable result.
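To put the numbers from this exchange side by side, the following is a small arithmetic sketch (not part of the original reply). It only uses the figures quoted in the question above: batch size 12, 400,000 iterations, and datasets of 100,000 or 1,000,000 samples. The helper name `passes_over_dataset` is made up for illustration.

```python
def passes_over_dataset(iterations: int, batch_size: int, dataset_size: int) -> float:
    """Average number of times each sample is seen during training."""
    return iterations * batch_size / dataset_size


# Figures quoted in the question; nothing here comes from the authors' config.
BATCH_SIZE = 12
ITERATIONS = 400_000

for dataset_size in (100_000, 1_000_000):
    passes = passes_over_dataset(ITERATIONS, BATCH_SIZE, dataset_size)
    print(f"{dataset_size:>9,} samples: {passes:.1f} passes over the data")
```

With these settings the run sees 4.8 million samples in total, which is 48 passes over the 100,000-sample set and 4.8 passes over the 1,000,000-sample set.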

kynthesis commented 5 months ago

BSRGAN_TEXT_256 (your pre-trained model): [image: z5231953897576_76a6dcc9b08264f0698a9144af1158b7_BSRGANText]

Real-ESRGAN (my trained model): [image: z5231953897576_76a6dcc9b08264f0698a9144af1158b7_BSRGANText]

I tried many settings but still failed to train a stable model like yours. Can you please give me some directions?

csxmli2016 commented 5 months ago

How is the training process going? You can use TensorBoard to visualize the training images, including the LR input, the SR result, and the ground truth.
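A minimal sketch of what such logging could look like, assuming a PyTorch training loop and the standard `torch.utils.tensorboard.SummaryWriter`. The function name `log_triplet`, the tag names, the log directory, and the tensor sizes in `demo` are illustrative assumptions, not taken from the textbsr, BSRGAN, or Real-ESRGAN code.

```python
def log_triplet(writer, step, lr, sr, gt, tag="train"):
    """Log LR input, SR result, and ground truth under sibling tags.

    `writer` is expected to behave like torch.utils.tensorboard.SummaryWriter.
    Each image is a CHW array or tensor with values in [0, 1].
    """
    images = (("lr_input", lr), ("sr_result", sr), ("ground_truth", gt))
    for name, img in images:
        writer.add_image(f"{tag}/{name}", img, global_step=step, dataformats="CHW")


def demo(log_dir="runs/textbsr_debug"):
    """Write one random triplet; requires torch and tensorboard to be installed."""
    import torch
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir)  # hypothetical log directory
    lr = torch.rand(3, 32, 64)       # placeholder LR patch (assumed x4 scale)
    sr = torch.rand(3, 128, 256)     # placeholder network output
    gt = torch.rand(3, 128, 256)     # placeholder ground truth
    log_triplet(writer, 0, lr, sr, gt)
    writer.close()
```

Calling `log_triplet` every few hundred iterations and then running `tensorboard --logdir runs` makes it possible to compare the three images at the same step, which helps separate a data synthesis problem (odd-looking LR or ground truth) from a training problem (reasonable inputs but a degraded SR result).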