google-research / composed_image_retrieval

About training detail of text encoder #24

Open yzrs opened 9 months ago

yzrs commented 9 months ago

Hi, thanks for your contribution. I have one question about the training details.

In Figure 2 (left) of the paper, both the visual encoder and the text encoder are frozen during training, which I take to mean that no gradients are computed for them.

But in the get_loss_img2text function of src/trainer.py, it seems that gradients are still computed inside the get_text_features call, since it is not wrapped in torch.no_grad() the way encode_image is.

Is there something I'm misunderstanding? I would be grateful if you could answer my question.

def get_text_features(model, token_features, args):
    text = tokenize("a photo of")
    text = text.cuda(args.gpu, non_blocking=True)
    text = text.view(1, -1)
    text = text.repeat(token_features.size(0), 1)
    text_features = model.encode_text_img(text, token_features)
    return text_features

def get_loss_img2text(model, img2text, images, loss_img, loss_txt, args, memory=None):
    with torch.no_grad():
        image_features = model.encode_image(images)
    token_features = img2text(image_features)
    text_features = get_text_features(model, token_features, args)
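
One possible reading of the snippet above, offered as a sketch and not as a statement about the repository: "frozen" can mean that a module's parameters are not updated, while gradients still pass through it to reach a trainable module upstream. In the toy example below, `frozen_encoder` and `mapper` are hypothetical stand-ins for the text encoder and img2text, and it is an assumption that the text encoder's parameters have requires_grad set to False (or are simply left out of the optimizer).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins, not the repository's modules:
# `frozen_encoder` plays the role of the frozen text encoder,
# `mapper` plays the role of the trainable img2text network.
frozen_encoder = nn.Linear(8, 4)
for p in frozen_encoder.parameters():
    p.requires_grad_(False)  # assumption: "frozen" means parameters are not trained

mapper = nn.Linear(8, 8)

# Treated as constants, as if they had been computed under torch.no_grad().
image_features = torch.randn(2, 8)

# The frozen encoder is called outside no_grad, so autograd records the graph
# and gradients can flow back through it to `mapper`.
token_features = mapper(image_features)
text_features = frozen_encoder(token_features)
loss = text_features.pow(2).sum()
loss.backward()

# Frozen parameters receive no gradient; the mapper's parameters do.
frozen_has_grad = any(p.grad is not None for p in frozen_encoder.parameters())
mapper_has_grad = all(p.grad is not None for p in mapper.parameters())

# Contrast: under no_grad the output is cut off from the graph,
# so nothing upstream of it could be trained.
with torch.no_grad():
    detached = frozen_encoder(mapper(image_features))
detached_requires_grad = detached.requires_grad
```

Under these assumptions, wrapping the text encoder call in torch.no_grad() would block the gradient that img2text needs, which may be why only encode_image is wrapped.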

caoziyang1997 commented 8 months ago

Could you share the files Train_GCC-training_output.csv and Validation_GCC-1.1.0-Validation_output.csv? Thank you!
