beichenzbc / Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
Apache License 2.0

how to extract word and spatial feature #57

Closed bo-miao closed 4 months ago

bo-miao commented 4 months ago

Hi,

Thanks for this interesting work! Could I know how to extract word-level textual features and pixel-level spatial features?

beichenzbc commented 4 months ago

Word-level textual features can be obtained with encode_text_full, which is used in the SD models. We didn't use pixel-level spatial features; if you want them, you may comment out the last three lines in the ViT.forward function.
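For readers new to the distinction, the difference between a pooled sentence feature and the full per-token features returned by encode_text_full can be illustrated with stand-in tensors (a minimal sketch: the shapes and EOT-pooling scheme follow standard CLIP; the positions below are illustrative, not the repo's actual code):

```python
import torch

torch.manual_seed(0)

# Toy stand-in for a text transformer's output:
# batch of 2 captions, 77 token positions, 512-dim hidden states.
hidden = torch.randn(2, 77, 512)

# Pooled sentence feature (what encode_text returns in CLIP-style models):
# take the hidden state at each caption's end-of-text (EOT) token.
eot_positions = torch.tensor([5, 9])                    # illustrative positions
sentence_feat = hidden[torch.arange(2), eot_positions]  # (2, 512)

# Full per-token features (what encode_text_full returns, per the reply above):
# keep the hidden state at every token position.
word_feats = hidden                                     # (2, 77, 512)

print(sentence_feat.shape, word_feats.shape)
```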

bo-miao commented 4 months ago

Thank you for your answer!

liuwanqingqing commented 3 months ago

Hello, may I ask whether there is a way to directly extract word features from a model with trained weights?

beichenzbc commented 3 months ago

Sorry, I don't understand your question. Does 'the model with trained weights' mean the pre-trained Long-CLIP model? Does 'word feature' mean the embedding of each word in a sentence?

liuwanqingqing commented 2 months ago

Hello, sorry to bother you. What I meant to ask is whether Long-CLIP, like a BERT model, lets you obtain the word features by extracting specific parts of the model's output, e.g.:

    outputs = bertmodel(**encoding.to(device))
    sentence_features = outputs.last_hidden_state[:, 0, :].to(device)
    word_features = outputs.last_hidden_state[:, 1:-1, :].to(device)
    text_mask = encoding['attention_mask'].to(device)

Sorry for the trouble, and I look forward to your reply. Best wishes!


beichenzbc commented 1 month ago

Then you can use the model.encode_text_full function. It returns a feature of shape (num_tokens, hidden_dim); you just take out the feature at the token position(s) of the word you want.
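The indexing step described above can be sketched with a stand-in tensor (the (77, 512) shape mirrors CLIP defaults; the token positions are hypothetical and would come from your tokenizer, not from the repo itself):

```python
import torch

torch.manual_seed(0)

# Stand-in for model.encode_text_full output on one caption:
# (num_tokens, hidden_dim).
token_feats = torch.randn(77, 512)

# Suppose the tokenizer maps the word of interest to positions 3..4
# (illustrative; a word may span several sub-word tokens).
word_feat = token_feats[3:5].mean(dim=0)  # average sub-word tokens -> (512,)

print(word_feat.shape)
```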

liuwanqingqing commented 1 month ago

Got it, thank you for your reply. Best wishes!


liuwanqingqing commented 1 month ago

Hello, sorry to bother you again. May I ask which function can be used to get the mask for the word features? Looking forward to your reply. Best wishes.


beichenzbc commented 1 month ago

You can just multiply it by a (0, 0, ..., 1, 0, ..., 0) one-hot vector, turning it into a vector of shape (1, hidden_dim).
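The one-hot multiplication suggested above can be sketched with a stand-in tensor (a toy illustration; the shapes mirror CLIP defaults and position 3 is hypothetical):

```python
import torch

torch.manual_seed(0)

num_tokens, hidden_dim = 77, 512
token_feats = torch.randn(num_tokens, hidden_dim)  # encode_text_full-style output

# One-hot row vector with a 1 at the wanted token position (here, position 3).
onehot = torch.zeros(1, num_tokens)
onehot[0, 3] = 1.0

# The matrix product picks out that token's feature as a (1, hidden_dim) vector.
selected = onehot @ token_feats
```

Note that this is equivalent to plain indexing, token_feats[3:4], but the one-hot form also works as a differentiable mask inside a larger computation graph.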