Hi! I notice that in the paper, GANsformer is compared with many image-to-image translation models such as SPADE. In the PyTorch version of the code, I see the model can be fed conditioning information; I guess this works like a CGAN, since it asserts len(self.label_shape) == 1. But can this condition be something like a semantic mask? If so, could this model be used for image-to-image translation?
Thanks for your help :)
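For concreteness, here is a minimal sketch of what I mean (shapes are illustrative, using NumPy just to show the shape check): a flat one-hot label passes an assertion like len(label_shape) == 1, but a spatial semantic mask would not.

```python
import numpy as np

# CGAN-style conditioning: a flat one-hot class vector.
one_hot_label = np.zeros(10, dtype=np.float32)  # label_shape would be (10,)
one_hot_label[3] = 1.0
print(len(one_hot_label.shape) == 1)  # True -> passes the assertion

# A semantic mask is spatial (H x W), so its shape is 2-D.
semantic_mask = np.zeros((256, 256), dtype=np.int64)
print(len(semantic_mask.shape) == 1)  # False -> would trip the assertion
```

So my question is whether the conditioning path is limited to flat label vectors, or whether spatial maps are supported somewhere else in the code.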