how to extract word and spatial feature

bo-miao commented 4 months ago

Hi,

Thanks for this interesting work! Could I know how to extract word-level textual features and pixel-level spatial features?

beichenzbc commented 4 months ago

Word-level textual features can be obtained with encode_text_full, which is what we use for the SD models. We didn't use pixel-level spatial features. If you want them, you can comment out the last three lines of the ViT.forward function.
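For readers who want to see what the per-patch output might look like, here is a minimal sketch. It assumes that, once the final pooling lines are disabled, the vision tower returns a token sequence of shape (batch, 1 + grid*grid, width) with the class token first, as in the original CLIP ViT. The array is a random stand-in rather than real model output, the sizes are assumptions, and split_spatial_tokens is a hypothetical helper, not part of Long-CLIP.

```python
import numpy as np

def split_spatial_tokens(tokens, grid):
    """Split (batch, 1 + grid*grid, width) tokens into the class token and a patch grid."""
    batch, n_tokens, width = tokens.shape
    assert n_tokens == 1 + grid * grid, "expected one class token plus grid*grid patches"
    cls_token = tokens[:, 0, :]                                       # (batch, width)
    patch_grid = tokens[:, 1:, :].reshape(batch, grid, grid, width)   # row-major patch layout
    return cls_token, patch_grid

# Stand-in for the un-pooled ViT output (assumed: 14x14 patches, width 768).
tokens = np.random.default_rng(0).standard_normal((2, 1 + 14 * 14, 768))
cls_token, patch_grid = split_spatial_tokens(tokens, grid=14)
print(cls_token.shape, patch_grid.shape)
```

If the final projection is skipped as well, these patch features would presumably live in the transformer width rather than in the joint image-text embedding space.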

bo-miao commented 4 months ago

Thank you for your answer!

liuwanqingqing commented 3 months ago

Hello, may I ask if you have a method to directly load word features from a model with trained weights?

beichenzbc commented 3 months ago

Sorry, I don't understand your question. Does 'the model with trained weights' mean the pre-trained Long-CLIP model? Does 'word feature' mean the embedding of each word in a sentence?

liuwanqingqing commented 2 months ago

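Hello, sorry to bother you. What I meant to ask is whether Long-CLIP can be used like a BERT model, as in:

    outputs = bertmodel(**encoding.to(device))
    sentence_features = outputs.last_hidden_state[:, 0, :].to(device)
    word_features = outputs.last_hidden_state[:, 1:-1, :].to(device)
    text_mask = encoding['attention_mask'].to(device)

That is, can the word features be obtained by extracting specific information from the model's output? Apologies for the trouble. I look forward to your reply. Best wishes!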

beichenzbc commented 1 month ago

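In that case you can use the model.encode_text_full function. It returns a feature of shape (number of tokens, hidden dim), and you can then take the features at the token positions of the words you want.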
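A minimal sketch of that indexing step, using a small stand-in array in place of the real encode_text_full output. The sizes, the token positions, and the select_word_features helper are assumptions for illustration only. With CLIP-style tokenization, position 0 is presumably the start-of-text token, so the first word would start at position 1.

```python
import numpy as np

def select_word_features(token_features, positions):
    """Return the rows of a (num_tokens, hidden_dim) array at the given token positions."""
    return token_features[np.asarray(positions)]

# Stand-in for the per-token output (assumed sizes: 8 tokens, hidden dim 4).
token_features = np.arange(8 * 4, dtype=np.float32).reshape(8, 4)

# Hypothetical case: the word of interest was split into the tokens at positions 2 and 3.
word_features = select_word_features(token_features, [2, 3])   # (2, 4)
word_vector = word_features.mean(axis=0)                       # one possible way to pool sub-word tokens
print(word_features.shape, word_vector.shape)
```

Averaging is only one option for words that span several tokens; keeping the per-token rows is equally valid if the downstream code expects them.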

liuwanqingqing commented 1 month ago

OK, thank you for your reply. Best wishes!

liuwanqingqing commented 1 month ago

Hello, sorry to bother you again. Which function can I use to get the mask for the word features? Looking forward to your reply. Best wishes.

beichenzbc commented 1 month ago

Just multiply it by a (0, 0, ..., 1, 0, ..., 0) vector, which turns it into a vector of shape (1, hidden_dim).
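The multiplication can be sketched as a matrix product of a (1, num_tokens) one-hot row with the (num_tokens, hidden_dim) features. The second helper is an extra assumption, not something stated in this thread: it builds a BERT-style mask over the word tokens from the token ids, assuming the tokenizer follows the original CLIP convention (start token first, end-of-text id 49407, zero padding). Both helpers are hypothetical, and the features and word ids are made-up stand-ins.

```python
import numpy as np

def one_hot_row(position, num_tokens):
    """Build a (1, num_tokens) row vector with a single 1 at `position`."""
    row = np.zeros((1, num_tokens), dtype=np.float32)
    row[0, position] = 1.0
    return row

def word_token_mask(token_ids, eot_id):
    """Mark the tokens strictly between the start token (index 0) and the end-of-text token."""
    token_ids = np.asarray(token_ids)
    eot_pos = int(np.argmax(token_ids == eot_id))   # first end-of-text position
    mask = np.zeros(token_ids.shape[0], dtype=bool)
    mask[1:eot_pos] = True
    return mask

# Stand-in for the per-token output (assumed sizes: 8 tokens, hidden dim 4).
token_features = np.arange(8 * 4, dtype=np.float32).reshape(8, 4)
picked = one_hot_row(3, 8) @ token_features          # (1, 4), same values as token_features[3:4]

# Assumed ids: 49406 start, 49407 end-of-text, 0 padding; the three word ids are arbitrary.
token_ids = [49406, 320, 1929, 525, 49407, 0, 0, 0]
mask = word_token_mask(token_ids, eot_id=49407)
print(picked.shape, mask.astype(int))
```

In practice plain indexing gives the same result as the one-hot product; the product form is mainly convenient when the selection has to be batched or kept differentiable.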

beichenzbc / Long-CLIP

how to extract word and spatial feature #57