buproof opened this issue 1 year ago
Sorry that we have not organized our visualization code yet, but I can provide a core demo:
import cv2
import matplotlib.pyplot as plt
import numpy as np
import skimage.transform

def visualize_attention(image_path, alpha_weights):
    # load image and convert from OpenCV's BGR to RGB
    img = cv2.imread(image_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # get height and width of the image
    H, W = img.shape[:2]
    dH, dW = H // 4, W // 4
    # keep only the top-k weights and zero out the rest
    k = 20
    threshold = np.sort(alpha_weights.reshape(-1))[-k]  # value of the k-th largest weight
    alpha_weights = alpha_weights * (alpha_weights >= threshold)
    # resize the weights from (12, 12) to (H/4, W/4)
    alpha_weights = skimage.transform.resize(alpha_weights, (dH, dW))
    # smoothly expand the weights to the original image size
    alpha_weights = skimage.transform.pyramid_expand(alpha_weights, upscale=4, sigma=20)
    # overlay the weights on the image
    plt.figure()
    plt.imshow(img)
    plt.imshow(alpha_weights, alpha=0.75, cmap=plt.cm.gray)
    plt.axis('off')
    plt.show()
alpha_weights = np.array(
[5.0248e-03, 5.2091e-03, 5.0840e-03, 5.1059e-03, 3.2360e-02, 1.0273e-03,
2.9019e-04, 3.2377e-02, 5.1033e-03, 5.3537e-04, 5.2795e-03, 5.2804e-03,
5.0838e-03, 5.0338e-03, 5.4204e-03, 5.0996e-03, 3.2342e-02, 2.1245e-04,
1.1865e-03, 3.2370e-02, 5.1377e-03, 2.7714e-04, 3.2365e-02, 5.4530e-03,
2.1483e-03, 1.8291e-03, 1.3979e-04, 9.0597e-04, 5.1887e-03, 3.3162e-03,
5.9515e-03, 5.2063e-03, 3.7140e-03, 5.7669e-03, 5.2450e-03, 5.0991e-03,
2.2931e-03, 1.0192e-03, 1.2310e-03, 1.7673e-03, 3.2369e-02, 1.4196e-02,
2.5353e-02, 3.2365e-02, 4.2024e-04, 5.2958e-04, 1.6338e-03, 2.3828e-03,
5.4352e-03, 5.1889e-03, 2.1982e-03, 3.3123e-04, 3.2343e-02, 2.4629e-03,
2.2377e-03, 1.5513e-04, 3.1852e-04, 2.2781e-04, 1.6502e-03, 9.5750e-04,
4.8194e-04, 7.9026e-03, 9.6730e-04, 1.5098e-02, 1.7108e-03, 8.0923e-04,
1.1966e-03, 8.3894e-04, 3.7549e-03, 5.2052e-03, 1.4130e-03, 1.9779e-03,
1.5995e-03, 2.7751e-03, 5.5997e-03, 7.0124e-03, 2.1481e-03, 5.7834e-03,
1.2972e-03, 1.7500e-04, 7.7323e-03, 1.8277e-03, 1.7876e-03, 1.3740e-03,
5.2334e-03, 3.2342e-02, 8.6587e-03, 1.5491e-03, 3.2362e-02, 4.0188e-03,
4.4041e-04, 6.4261e-04, 1.4355e-03, 8.3124e-03, 5.3338e-03, 4.9598e-03,
5.4005e-03, 4.6577e-03, 2.1362e-02, 3.9373e-03, 3.2342e-02, 8.5086e-04,
6.0412e-04, 1.3558e-04, 6.1554e-03, 5.5917e-03, 5.2004e-03, 1.7581e-03,
5.1032e-03, 5.4421e-03, 3.2950e-03, 2.8823e-03, 5.1852e-03, 1.9310e-03,
8.0221e-04, 3.6786e-04, 3.7763e-03, 6.1263e-04, 5.2769e-03, 5.0333e-03,
5.1957e-03, 5.1147e-03, 1.5568e-03, 9.7842e-05, 3.2341e-02, 6.3536e-04,
7.0632e-04, 5.4808e-04, 2.7613e-03, 1.5866e-03, 5.3861e-03, 3.2329e-02,
5.5115e-03, 3.2350e-02, 1.3547e-03, 5.1975e-03, 3.2346e-02, 4.3569e-04,
2.0293e-03, 3.2360e-02, 6.9499e-04, 2.4257e-03, 3.2363e-02, 4.9784e-03]
).reshape([12, 12])
visualize_attention('./COCO_val2014_000000483108.jpg', alpha_weights)
The result should look like this: [attention overlay image]
All you need to do is obtain the attention weights (alpha_weights) that you want to visualize.
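To make the top-k filtering step in the demo concrete, here is a tiny self-contained sketch with made-up numbers (the values here are illustrative, not from the demo's data):

```python
import numpy as np

# Toy example of the top-k filtering used in visualize_attention:
# keep only the k largest weights and zero out everything else.
weights = np.array([[0.1, 0.5],
                    [0.3, 0.2]])
k = 2
threshold = np.sort(weights.reshape(-1))[-k]   # value of the k-th largest weight
kept = weights * (weights >= threshold)
print(kept)   # only 0.5 and 0.3 survive
```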
Thank you for your prompt reply! Now I have some other questions, and I hope you can help me!
Answer to 1 and 2: Yes, but not perfectly. --load_epoch and --resume are both used to re-load weights. Specifically:

--resume is the epoch number of the trained model that you want to re-load (for example, if you set it to 3, it will load the model from caption_model_3.pth). Note: --resume only loads the model weights; it does not restore the optimizer and scheduler (because I did not store their state_dicts).

--load_epoch is used to restore the learning rate. For example, suppose training was interrupted at the 3rd epoch and you want to continue. The correct way would be to re-load every detail of the interrupted checkpoint, but we only store the model weights, so --load_epoch is just a simple way to restore the learning-rate state of the interrupted checkpoint. It cannot restore all the details.

Answer to 3: I save all the attention weights of the corresponding generated words, generate a visualization image for each word, and finally use Visio to manually compose the visualization images and words to fit the paper width. I have not scripted the full process into one stage.
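For completeness, here is a minimal sketch of how one could store and restore the full training state (model, optimizer, scheduler, epoch), which would make the --load_epoch workaround unnecessary. The model, optimizer, scheduler, and file name below are illustrative, not this repo's actual training setup:

```python
import torch
import torch.nn as nn

# Illustrative model/optimizer/scheduler (not the repo's actual setup).
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

# Save the complete training state at the end of epoch 3.
torch.save({
    'epoch': 3,
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'scheduler': scheduler.state_dict(),
}, 'checkpoint_3.pth')

# Later: restore everything and continue from the next epoch.
ckpt = torch.load('checkpoint_3.pth')
model.load_state_dict(ckpt['model'])
optimizer.load_state_dict(ckpt['optimizer'])
scheduler.load_state_dict(ckpt['scheduler'])
start_epoch = ckpt['epoch'] + 1
print(start_epoch)  # 4
```

Restoring the optimizer state_dict matters for Adam in particular, since its per-parameter moment estimates are otherwise reset on resume.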
Hello, I would like to know from which inference step the attention weights should be taken, and which attention weights should be used when generating each word. Thanks!
@Lujinfu1999 hello, did u find the answer to this?
I tried taking the attention weights from the last head of the last decoder layer's cross-attention; maybe you can try that. @not-hermione
Was the attention-map visualization convincing?
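The suggestion above (taking the last head of the last decoder layer's cross-attention) can be sketched with torch.nn.MultiheadAttention. The dimensions below are assumptions for a 12x12 feature grid, not the repo's actual model:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: 12 x 12 = 144 image regions, model dim 64, 4 heads.
d_model, n_heads, n_regions = 64, 4, 144

cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

query = torch.randn(1, 1, d_model)           # current decoding step (one word)
memory = torch.randn(1, n_regions, d_model)  # encoder features of the image grid

# average_attn_weights=False keeps the per-head weights, so we can
# pick a single head (e.g. the last one) instead of the head average.
_, attn_weights = cross_attn(query, memory, memory,
                             need_weights=True,
                             average_attn_weights=False)

# attn_weights has shape (batch, n_heads, tgt_len, src_len).
alpha = attn_weights[0, -1, 0].reshape(12, 12)  # last head, reshaped to the grid
print(alpha.shape)  # torch.Size([12, 12])
```

The resulting alpha can be passed (after .detach().numpy()) straight into the visualize_attention demo above; because it is a softmax over the 144 regions, its entries sum to 1.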
Hi, author! Thanks for this great code!
I want to reproduce the visualization results, but I cannot find the corresponding code in this repo.
I read the paper, but I don't think it's easy for me to reproduce the visualization results correctly.
May I have the code that produces the visualization results, or is there something I missed?
Thank you very much!