lyxhope opened 3 years ago
I would like to follow up on this. I'm also interested in this part.
Yes, we use ResNet-18 as the image encoder to map images to latent codes. The training objective is simply MSE, and only the image encoder is optimized. Sorry, I couldn't find the full training code anymore, but the image encoder had the structure below. As I recall, PQ-Net's SVR results are not as good as those of works specifically targeting SVR (e.g., DISN).
```python
import torch.nn as nn
from torchvision.models import resnet18

class ImageEncoder(nn.Module):
    def __init__(self, z_dim=512):
        super(ImageEncoder, self).__init__()
        # ResNet-18 backbone with the final classification layer removed
        resnet = resnet18(pretrained=True)
        modules = list(resnet.children())[:-1]
        self.resnet = nn.Sequential(*modules)
        # Project the 512-d pooled feature to the latent dimension
        self.fc = nn.Sequential(nn.Linear(512, z_dim))

    def forward(self, x):
        feature = self.resnet(x)
        # flatten(1) keeps the batch dimension; a bare .squeeze() would
        # also drop it when the batch size is 1
        out = self.fc(feature.flatten(1))
        return out
```
Got it, thanks for the code. So, based on my understanding of the paper, do I feed the image feature into the decoder part of the Seq2SeqAE? (DL beginner here.)
Correct.
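To make the wiring concrete, a minimal sketch of test time: the latent code that would normally be sampled by the GAN is instead predicted from the image, then handed to the shape decoder. `ToyDecoder` and the placeholder `image_encoder` below are purely illustrative stand-ins; PQ-Net's real Seq2SeqAE decoder autoregressively emits a sequence of part codes and its interface differs.

```python
import torch
import torch.nn as nn

class ToyDecoder(nn.Module):
    """Stand-in for PQ-Net's Seq2SeqAE decoder (the real one decodes a
    sequence of part latent codes; out_dim here is arbitrary)."""
    def __init__(self, z_dim=512, out_dim=64):
        super().__init__()
        self.net = nn.Linear(z_dim, out_dim)

    def forward(self, z):
        return self.net(z)

z_dim = 512
# Placeholder for the trained ImageEncoder shown earlier in the thread.
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, z_dim))
decoder = ToyDecoder(z_dim)

image = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    z = image_encoder(image)  # image -> latent code (replaces GAN sampling)
    shape = decoder(z)        # latent code -> shape via the frozen decoder
```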
Hi, thanks for your work. May I ask a question in Chinese? [Translated:] I'd like to try the SVR task. Is it like training and testing the GAN, i.e., during training, replace the GAN training with training a ResNet, and at test time, replace the GAN's latent-code sampling with codes generated from the image by the ResNet? Could you provide the code for this part?