questions about Figure 4 in FPE

They are the same image. To retain the spatial structure between the source and target images, the upper part of the figure is generated by replacing the self-attention maps in layers 4-14.
In the upper part of Figure 4, "No replacement" means that the cross-attention maps from the source image are not replaced; the self-attention maps are still replaced in order to preserve the spatial structure. In the lower part of Figure 4, "No replacement" is the same as "Direct Generation".
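To make the distinction concrete, here is a rough sketch of the two settings as a per-layer choice between a source map and a target map. This is not the FPE code: `ReplacementConfig`, `pick_map`, and the two preset functions are hypothetical names, and treating "layers 4-14" as an inclusive range is an assumption.

```python
from dataclasses import dataclass

# Hypothetical illustration only; none of these names come from the FPE code.


@dataclass(frozen=True)
class ReplacementConfig:
    # Layers whose self-attention map is taken from the source image.
    self_attn_layers: frozenset
    # Layers whose cross-attention map is taken from the source image.
    cross_attn_layers: frozenset


def upper_no_replacement() -> ReplacementConfig:
    # Upper part of Figure 4: cross-attention is untouched, but
    # self-attention is replaced in layers 4-14 (assumed inclusive).
    return ReplacementConfig(frozenset(range(4, 15)), frozenset())


def direct_generation() -> ReplacementConfig:
    # Lower part of Figure 4: nothing is replaced at all.
    return ReplacementConfig(frozenset(), frozenset())


def pick_map(kind, layer, source_map, target_map, cfg):
    """Return the attention map used at `layer` for `kind` ("self" or "cross")."""
    layers = cfg.self_attn_layers if kind == "self" else cfg.cross_attn_layers
    return source_map if layer in layers else target_map
```

Under these assumptions, the only difference between the two settings is whether the self-attention maps in layers 4-14 come from the source image; the cross-attention maps come from the target prompt in both cases.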

I have two questions:

Two subfigures in Figure 4 (marked with blue boxes) seem to be exactly identical. Why does this happen?

What is the algorithmic difference between "No replacement" and "Direct Generation"?

Thanks

alibaba / EasyNLP

questions about Figure 4 in FPE #353