cswry / SeeSR

[CVPR2024] SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution
Apache License 2.0

argument about '--use_ram_encoder' #24

Closed striveAgain closed 3 months ago

striveAgain commented 3 months ago

Hello, thank you for sharing the code of SeeSR! While reading it, I found that it does not seem to perform cross attention between `ram_encoder_hidden_states` and the ResNet output during training. Please give me some advice.
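To make the question concrete, here is a rough sketch of the kind of cross attention I expected between the two feature sets. Only the name `ram_encoder_hidden_states` comes from the repo; the module, shapes, and dimensions below are my own illustration:

```python
import torch
import torch.nn as nn

class ImageCrossAttnSketch(nn.Module):
    """Illustrative only: ResNet features attend to RAM encoder features."""

    def __init__(self, dim: int, ram_dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(
            embed_dim=dim, kdim=ram_dim, vdim=ram_dim,
            num_heads=num_heads, batch_first=True,
        )

    def forward(self, resnet_out, ram_encoder_hidden_states):
        # resnet_out: (B, L, dim), the flattened ResNet block output
        # ram_encoder_hidden_states: (B, S, ram_dim), RAM image features
        attn_out, _ = self.attn(
            self.norm(resnet_out),
            ram_encoder_hidden_states,
            ram_encoder_hidden_states,
        )
        return resnet_out + attn_out  # residual connection
```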

In your training command:

```bash
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7," accelerate launch train_seesr.py \
    --pretrained_model_name_or_path="preset/models/stable-diffusion-2-base" \
    --output_dir="./experience/seesr" \
    --root_folders 'preset/datasets/training_datasets' \
    --ram_ft_path 'preset/models/DAPE.pth' \
    --enable_xformers_memory_efficient_attention \
    --mixed_precision="fp16" \
    --resolution=512 \
    --learning_rate=5e-5 \
    --train_batch_size=2 \
    --gradient_accumulation_steps=2 \
    --null_text_ratio=0.5 \
    --dataloader_num_workers=0 \
    --checkpointing_steps=10000
```

there is no `--use_ram_encoder`.

Thus, `use_image_cross_attention` will be `False`.

During the forward pass, it will fall through to the `else` branch, omitting `image_encoder_hidden_states`.
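In other words, the gating seems to behave like this sketch; only `use_image_cross_attention` and `image_encoder_hidden_states` are names taken from the code, the rest is my own simplification:

```python
import torch
import torch.nn as nn

class GatedBlockSketch(nn.Module):
    """Illustrative only: the image branch runs just when the flag is set."""

    def __init__(self, dim: int, use_image_cross_attention: bool):
        super().__init__()
        self.use_image_cross_attention = use_image_cross_attention
        self.attn_image = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, hidden_states, image_encoder_hidden_states=None):
        if self.use_image_cross_attention:
            # Cross-attend to the RAM image features.
            out, _ = self.attn_image(
                hidden_states,
                image_encoder_hidden_states,
                image_encoder_hidden_states,
            )
            hidden_states = hidden_states + out
        # else: the `else` path is taken and image_encoder_hidden_states
        # is never used, which is what worried me.
        return hidden_states
```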

Please give me some instructions, thanks!

cswry commented 3 months ago

Hello, the param `use_image_cross_attention` is set to `True` by default.
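For anyone else confused by this: a declaration along the following lines keeps the path enabled even when no flag is passed. This is just a sketch of the pattern, not the exact line in train_seesr.py:

```python
import argparse

parser = argparse.ArgumentParser()
# With default=True, omitting the flag on the command line still yields
# use_image_cross_attention=True, so the RAM cross-attention path runs.
parser.add_argument(
    "--use_image_cross_attention", action="store_true", default=True
)

args = parser.parse_args([])  # no flags passed at all
assert args.use_image_cross_attention is True
```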

I have deleted the redundant param `use_ram_encoder`; please refer to the latest version.

Thanks for the reminder!

striveAgain commented 3 months ago

OK, I have updated to the latest code. Thank you!