NVlabs / SPADE

Semantic Image Synthesis with SPADE
https://nvlabs.github.io/SPADE/

Image translation with SPADE? #112

Open kex243 opened 4 years ago

kex243 commented 4 years ago

Is there a way to skip the label image and do plain image-to-image translation with SPADE? Pix2pixHD has the option --label_nc 0, but it is not available here and causes an error. I mean a paired, aligned dataset. Thanks.

Williamlizl commented 3 years ago

Ha, I face the same question. With --label_nc 0 it fails with:

IndexError: index 0 is out of bounds for dimension 0 with size 0
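For context, the one-hot encoding step in models/pix2pix_model.py (preprocess_input) looks roughly like the sketch below. With --label_nc 0 (and no dontcare label) the one-hot tensor ends up with zero channels, so the scatter has nowhere to write; the exact failing line can vary, but this is the likely source of the error:

```python
# models/pix2pix_model.py, preprocess_input() (paraphrased):
# the integer label map is one-hot encoded into label_nc channels
label_map = data['label']
bs, _, h, w = label_map.size()
nc = self.opt.label_nc + 1 if self.opt.contain_dontcare_label \
    else self.opt.label_nc
input_label = self.FloatTensor(bs, nc, h, w).zero_()
# with --label_nc 0, nc == 0, so there is no channel to scatter into
input_semantics = input_label.scatter_(1, label_map, 1.0)
```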

kex243 commented 3 years ago

I've been interested in this since I first heard about SPADE. I've now had some time with the code and can report a few things:

1. It is possible to run the code on RGB-to-RGB image pairs. It runs on Windows in a clean conda env with Python 3.7, PyTorch 1.8, and the requirements from the txt file; it also asked me to download the sync_batchnorm folder into the networks folder before running, even with a single GPU. If anyone needs more details on the software and hardware setup, I'll provide them.
2. Training options: --batchSize 1 or 2 for two GPUs; it works on Windows with --gpu_ids (0, 1, or both), plus --contain_dontcare_label and --label_nc 512. Without the last two parameters it won't run. Why 512? I don't know; I was just trying to fit the model into 8 GB of memory and it started to work. It doesn't work for me with 256, 1024, 0, or 1, and I didn't try other values. I also crop images to 256; it doesn't fit the memory requirements otherwise. It works with a custom dataset whose folders live outside the working folder, but the COCO-style setup (replacing the files in the dataset folders) works too. It also works with or without instance maps if --no_instance is chosen; as I remember, instances were picked up even on the small dataset and visibly influenced the output image.
3. On a tiny dataset of only 3 images it showed it could overfit and produce the correct response. Now I'm trying to feed it my main dataset of 250k images. For now it has some color issues in the first epoch, but that reminds me of my first attempts to train the original pix2pix, which had the same color issues in the first epochs, so I hope it will generalize. At least the small dataset had no issues with color or shapes after 200 epochs. The initialization seems to help, which pix2pixHD lacks. There are also some issues with the input image in the results HTML folder, probably because of how the output image is converted or how the original input files are tagged in the code; I'm still looking into how to fix it, but it has no effect on the result. The generator weights are 400 MB; I'm not sure whether input size affects that the way it does in pix2pixHD. Memory consumption on both cards is about 7 GB. The bottleneck is my HDD; an SSD for the dataset would be better, at least for batch > 1.
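Putting those flags together, a training command along the lines of the README's custom-dataset example would look something like this (the experiment name and paths are placeholders of mine, not from the post above):

```
python train.py --name rgb2rgb_test --dataset_mode custom \
    --label_dir ./datasets/mydata/input_rgb \
    --image_dir ./datasets/mydata/target_rgb \
    --label_nc 512 --contain_dontcare_label --no_instance \
    --crop_size 256 --load_size 256 --batchSize 1 --gpu_ids 0
```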

DWCTOD commented 2 years ago

> [quoting point 3 of @kex243's write-up above]

Thanks for sharing the method! It works, but the input label is abnormal. [screenshot attached in the original comment]

Macleodsolutions commented 2 years ago

So I took this approach trying to get https://github.com/tamarott/ASAPNet working with RGB-to-RGB translation. It works, but it is painfully slow.

My training tests run on a lowly RTX 3060, which on a 512x512 image set took 1.35 min per 100 iterations using 10 GB of VRAM, also known as "forever" in scalability terms.

These are the optimizations that let me train successfully on a limited dataset:

Arguments

- No changes from the above recommendations: 1.35 min / 10 GB
- --label_nc 2 and no_one_hots: 0.43 min / 5 GB (also corrects the input image previews)
- --batchSize 4: 0.37 min / 9.3 GB (the obvious one, but now that my VRAM wasn't maxed out I could actually do this on a 3060)
- --ndf 32: 0.27 min / 8.6 GB

If you have data throughput issues, it can also help to set --nThreads > 0. A combined command sketch follows below.
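Combining the flags above into one invocation (a sketch only: ASAPNet inherits SPADE-style options, the no_one_hots flag is spelled as in the list above and I have not verified it against the ASAPNet codebase, and the name/dataset arguments are placeholders):

```
python train.py --name asap_rgb2rgb --dataset_mode custom \
    --label_nc 2 --no_one_hots --no_instance \
    --batchSize 4 --ndf 32 --nThreads 4
```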

File Changes

train.py: set torch.backends.cudnn.benchmark = True (0.23 min / 8.8 GB)
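For anyone unfamiliar with the flag, a minimal sketch of the placement (anywhere before the training loop works):

```python
# train.py: let cuDNN benchmark and cache the fastest convolution
# algorithms; this pays off when input sizes stay fixed across iterations
import torch
torch.backends.cudnn.benchmark = True
```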

pix2pix_model.py (requires Apex): swap the Adam optimizers for Apex's fused implementation (0.225 min / 8.8 GB):

```python
import apex  # NVIDIA Apex must be installed

# fused CUDA Adam kernels in place of torch.optim.Adam
optimizer_G = apex.optimizers.FusedAdam(G_params, lr=G_lr, betas=(beta1, beta2))
optimizer_D = apex.optimizers.FusedAdam(D_params, lr=D_lr, betas=(beta1, beta2))
```

Mixed Precision

Simply following the standard Apex AMP changes (helpfully provided here: https://github.com/bholeshwar/SPADE-with-AMP) results in 0.178 min / 8.9 GB (or 0.214 min / 9 GB at --ndf 64).

However, in testing I found I had to lower my learning rate (in my case down to 0.0001) to avoid gradient-overflow crashes.

Update: division-by-zero crashes continued to occur on my full training set even with the reduced learning rate. My solution was to set the AMP opt level to O2. There is a slight loss in training speed, but it is still much improved, and training is now totally stable for my task at the default learning rate.
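For reference, a minimal sketch of the Apex wiring being described (following the linked fork's approach in spirit; the variable names are illustrative, not lifted from that repo):

```python
from apex import amp

# wrap the models and optimizers once, after they are built;
# opt_level='O2' was the stable choice here ('O1' hit overflows)
[netG, netD], [optimizer_G, optimizer_D] = amp.initialize(
    [netG, netD], [optimizer_G, optimizer_D], opt_level='O2')

# in the training loop, scale each loss before calling backward()
with amp.scale_loss(g_loss, optimizer_G) as scaled_loss:
    scaled_loss.backward()
```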

Conclusion

So: from 1.35 min/100 iters down to, in my case, 0.214 min/100 iters, making it actually feasible to test on my 3060 before deploying the full dataset to a cloud GPU.

Full disclosure: I'm a Technical Artist, not a Data Scientist, so it's very possible that (Gob voice) "I've made a huge mistake." But on the limited dataset I used (1000 pairs), the training quality after 300 epochs was as expected.

Hope this saves someone some time and money!

AugustLee93 commented 2 years ago

> [quoting @kex243's full RGB-to-RGB write-up above]

Hi @kex243, I'm trying to translate RGB to RGB without an instance map or segmentation map. Can you give me more details on how to train it, i.e. the parameter settings? For instance, which value should I set for --label_nc? In my case the dataset mode should be custom, right? Any help will be appreciated. Thanks!

cyprian commented 2 years ago

> [quoting @Macleodsolutions' ASAPNet optimization write-up above]

@Macleodsolutions Did you have any issues with artifacts (random color pixels) when training ASAPNet?

iamrishab commented 2 years ago

@AugustLee93 To make SPADE work for image-to-image translation like pix2pix, we need changes in the dataloader and in some of the network architecture parameters as well. I have already made those modifications and will raise a PR. The workaround suggested by @kex243 will not work.
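Roughly, one way to do this (a hypothetical sketch, not necessarily what the PR does): skip the one-hot encoding in preprocess_input when label_nc == 0 and feed the source RGB image directly as the generator's conditioning input, with opt.semantic_nc set to 3 when the networks are built.

```python
# models/pix2pix_model.py (hypothetical sketch, not the actual PR):
# when label_nc == 0, treat data['label'] as a 3-channel RGB source
# image and skip the one-hot scatter entirely
def preprocess_input(self, data):
    if self.use_gpu():
        data['label'] = data['label'].cuda()
        data['image'] = data['image'].cuda()
    if self.opt.label_nc == 0:
        # RGB-to-RGB: the "semantic" input is just the source image,
        # so opt.semantic_nc must be set to 3 when the networks are built
        return data['label'], data['image']
    # ... original one-hot encoding path for integer label maps ...
```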

AugustLee93 commented 2 years ago

@iamrishab When will you share your code? It would be much appreciated!

iamrishab commented 2 years ago

@AugustLee93 PR raised.