callsys / GenPromp

[ICCV 2023] Generative Prompt Model for Weakly Supervised Object Localization
Apache License 2.0

Questions about evaluation #6

Open mengmeng18 opened 1 year ago

mengmeng18 commented 1 year ago

Thanks for your great work! There is an issue during testing. When running `python main.py --function test --config configs/cub_stage2.yml --opt "{'test': {'load_token_path': 'ckpts/cub983/tokens/', 'load_unet_path': 'ckpts/cub983/unet/', 'save_log_path': 'ckpts/cub983/log.txt'}}"` for evaluation, I found that self.step_store, self.attention_store, and self.attention_maps are all empty. Could you tell me where the problem is? Looking forward to your reply!

callsys commented 1 year ago

The most likely reason is that the register_attention_control function at line 67 of attn.py is not working properly. At line 115 of attn.py, we replace the get_attention_scores method of every CrossAttention module in the unet. A different version of diffusers may mean the CrossAttention module no longer contains a get_attention_scores method. The problem you describe is most likely caused by an incorrect diffusers version.
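For reference, the hooking pattern described above generally looks like the sketch below. This is an illustration of the technique, not the repo's actual attn.py: the import path matches the diffusers 0.13.x layout, and the shape-based is_cross test is an assumption.

```python
from diffusers.models.cross_attention import CrossAttention  # diffusers 0.13.x layout

def register_attention_control(controller, unet):
    """Wrap get_attention_scores on every CrossAttention module so the
    attention probabilities pass through the controller before returning."""
    def patch(module, place_in_unet):
        orig = module.get_attention_scores  # keep the original bound method
        def get_attention_scores(query, key, attention_mask=None):
            attention_probs = orig(query, key, attention_mask)
            # Assumption: cross-attention keys come from the text tokens, so
            # their sequence length differs from the query's spatial tokens.
            is_cross = key.shape[1] != query.shape[1]
            return controller(attention_probs, is_cross, place_in_unet)
        module.get_attention_scores = get_attention_scores

    for name, module in unet.named_modules():
        if isinstance(module, CrossAttention):
            place = ("down" if "down_blocks" in name
                     else "up" if "up_blocks" in name else "mid")
            patch(module, place)
```

If get_attention_scores disappears or changes signature in a newer diffusers release, the wrapper is never installed and the stores stay empty.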

mengmeng18 commented 1 year ago

Thanks a lot! Would you please tell me how to fix this error?

callsys commented 1 year ago

1. Try `pip install --upgrade diffusers[torch]==0.13.1`, which is the version we use.
2. Check whether the code runs through the get_attention_scores method at line 71 of attn.py (see the check sketched below). This method adds attention maps to self.step_store, self.attention_store, and self.attention_maps.
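A quick, illustrative way to verify step 2 without a debugger: an unpatched module still exposes the bound method CrossAttention.get_attention_scores, while a patched one exposes the replacement function, which has a different `__qualname__` (this assumes the replacement is an ordinary function defined outside the class, as in the sketch above).

```python
from diffusers.models.cross_attention import CrossAttention

for name, module in unet.named_modules():
    if isinstance(module, CrossAttention):
        # The original bound method keeps __qualname__
        # "CrossAttention.get_attention_scores"; a patched instance does not.
        patched = "CrossAttention." not in module.get_attention_scores.__qualname__
        print(name, "patched" if patched else "NOT patched")
```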

mengmeng18 commented 1 year ago
  1. I have checked that the version is diffusers[torch]==0.13.1.
  2. The code runs AttentionStore.register_attention_control(controller, unet) at line 227 of main.py, and it does run through the get_attention_scores method at line 71 of attn.py. However, after running these lines, I find that self.step_store, self.attention_store, and self.attention_maps are still empty. Could you give me some other advice to help me fix this error?

callsys commented 1 year ago

At line 106 of attn.py, `attention_probs = controller(attention_probs, is_cross, place_in_unet)` adds the attention_probs to self.step_store in the controller. You can check whether the code goes through this line.
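For context, the controller side of that call typically follows the prompt-to-prompt AttentionStore pattern. The sketch below illustrates the idea; it is not the repo's exact class, and method names like between_steps are assumptions.

```python
class AttentionStore:
    """Collects the attention maps handed over by the patched get_attention_scores."""
    def __init__(self):
        self.step_store = {}       # maps gathered during the current diffusion step
        self.attention_store = {}  # maps aggregated over all steps

    def __call__(self, attention_probs, is_cross, place_in_unet):
        key = f"{place_in_unet}_{'cross' if is_cross else 'self'}"
        self.step_store.setdefault(key, []).append(attention_probs.detach())
        return attention_probs  # pass the probabilities through unchanged

    def between_steps(self):
        # Fold the per-step maps into the running store, then reset.
        for key, maps in self.step_store.items():
            self.attention_store.setdefault(key, []).extend(maps)
        self.step_store = {}
```

If the stores stay empty, `__call__` was never reached, i.e. line 106 never executed.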

mengmeng18 commented 1 year ago

Thanks a lot! I will check it again.

KevinLi-167 commented 7 months ago

I'm having a similar issue.

I use the CUB dataset and reduced the batch_size for the two-stage training.

For train_token I use the default float32. Since I only have 8 GB of GPU memory, I changed train_unet to float16 with batch_size=1. By default, float16 is used for inference.

After 250 steps of training, an error is reported during inference. The reason is that the CAM-like attention map contains NaN values. I traced it back to the CLIP output: the last 4 of the 6 fr embeddings here are all NaN.

(screenshots: NaN values in the CLIP output)

callsys commented 7 months ago

Since CLIP (the text encoder) is frozen the whole time, it seems there is a problem with the representative embeddings trained in stage 1. Does the model you trained in stage 1 produce NaNs?
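A quick, illustrative check is to load the stage-1 tokens and test them directly. The file name under ckpts/cub983/tokens/ and the saved format (a tensor or a dict of tensors) are assumptions; use whatever train_token actually wrote:

```python
import torch

emb = torch.load("ckpts/cub983/tokens/<your_token_file>")   # placeholder file name
tensors = list(emb.values()) if isinstance(emb, dict) else [emb]
print(any(torch.isnan(t).any().item() for t in tensors))    # True -> stage 1 produced NaNs
```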

Besides, the model requires a large batch size for stage 2 training. If your machine does not have enough memory, a large gradient-accumulation step can serve as a substitute; see the sketch below.
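A generic gradient-accumulation loop looks like this (illustrative PyTorch with hypothetical stand-ins for the real trainer, not the repo's code; with accum_steps=8 and batch_size=1 the effective batch size is 8):

```python
import torch
from torch import nn

# Hypothetical stand-ins for the real model and data.
model = nn.Linear(4, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = [(torch.randn(1, 4), torch.randn(1, 1)) for _ in range(16)]

accum_steps = 8  # effective batch size = accum_steps * batch_size
optimizer.zero_grad()
for i, (x, y) in enumerate(dataloader):
    loss = nn.functional.mse_loss(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()                      # gradients accumulate across iterations
    if (i + 1) % accum_steps == 0:
        optimizer.step()                 # update once per accum_steps micro-batches
        optimizer.zero_grad()
```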

KevinLi-167 commented 7 months ago

Thank you for your reply! I did find the problem with fr. (I also just found out that the loss is always NaN in the train_unet log, so my stage 2 may be completely invalid.) I'm already retraining. (At stage 1 I can't use float16 because the loss becomes NaN, so I still use float32.)

I'd like to confirm that fr relies only on the first-stage train_token, right? (fr is then used as frozen content by the subsequent train_unet and inference steps.)

I have one more question: z0 in the paper is encoded by a VQGAN, but a VAE is used in the code. What is the reason for changing the image encoder from VQGAN to VAE, i.e., why is it not the same as in the paper?

Thanks again for your reply. I'll read the source code carefully again and try to retrain.

callsys commented 7 months ago

1. fr relies only on train_token.

2. VQGAN is an improved version of the VAE, and they are similar in structure.