Some questions about the semantic feature decoding

xyhanHIT commented 5 months ago

Hello, when I ran "Semantic_feature_decoding.py", I ran into the following issues:

At Line 44, "X.shape[1]" raises a dimension error. In fact, X seems to have only one dimension.

I tried changing "X.shape[1]" to "X.shape[0]", and the regression result printed at Line 84 is then 0.53. Is 0.53 a normal result?

At Line 150, "decode_LDM_text_feature" needs the parameter "args.cls_token_path". I don't know how to obtain this cls file, and the "Namespace" has no such attribute.

ReedOnePeck commented 5 months ago

The value seems a bit high; the mean I reported in the paper is around 0.26. Please check that X.shape[0] is indeed the number of data points in your dataset.
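A minimal sanity check along the lines of this advice, sketched in Python with NumPy. It assumes X is meant to be laid out as (n_samples, n_features), one row per data point; the helper name `as_samples_by_features` is hypothetical and not part of the repository. A 1-D X is accepted only when its length matches the expected sample count; anything else raises instead of silently producing a result over the wrong axis.

```python
import numpy as np


def as_samples_by_features(X, n_samples):
    """Return X as a 2-D (n_samples, n_features) array or raise ValueError.

    Hypothetical helper: assumes rows index data points, so X.shape[0]
    must equal the number of samples in the dataset.
    """
    X = np.asarray(X)
    if X.ndim == 1:
        # A 1-D vector only makes sense here as one value per sample.
        if X.shape[0] != n_samples:
            raise ValueError(
                f"1-D X has length {X.shape[0]}, expected {n_samples} samples"
            )
        return X.reshape(n_samples, 1)
    if X.ndim != 2:
        raise ValueError(f"expected a 1-D or 2-D array, got {X.ndim} dimensions")
    if X.shape[0] != n_samples:
        raise ValueError(
            f"X.shape[0] is {X.shape[0]}, expected {n_samples} samples "
            "(is X transposed?)"
        )
    return X


# Example: 5 samples with 3 features each passes; the transpose would be rejected.
X_ok = as_samples_by_features(np.zeros((5, 3)), n_samples=5)
print(X_ok.shape)  # (5, 3)
```

If the check fails on your data, the loading step (not the index in Line 44) is the more likely culprit.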
After text feature extraction, each sentence's feature has shape (1, 20, 768), and the first of those 20 tokens is the cls_token. This token is exactly the same for all sentences and does not need to be decoded. In Semantic_feature_decoding.py we actually decode only the last 19 tokens, and the cls_token is merged back with the decoded tokens during the image reconstruction phase. You can extract and save the cls_token yourself after text feature extraction.
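One way to do that split, save, and merge, sketched with NumPy arrays. It assumes the extracted feature for one sentence is a (1, 20, 768) array with the cls_token at index 0 of the token axis, as described above. The function names and the `cls_token.npy` filename are hypothetical; the saved file is what you would point `args.cls_token_path` at, although the repository may expect a different file format (for example a PyTorch tensor).

```python
import numpy as np

N_TOKENS, DIM = 20, 768  # per-sentence text feature shape (1, 20, 768)


def split_cls_token(text_feature):
    """Split a (1, 20, 768) feature into the cls_token and the 19 tokens to decode."""
    assert text_feature.shape == (1, N_TOKENS, DIM), text_feature.shape
    cls_token = text_feature[:, :1, :]  # (1, 1, 768), same for every sentence
    content = text_feature[:, 1:, :]    # (1, 19, 768), the decoding target
    return cls_token, content


def save_cls_token(text_feature, path="cls_token.npy"):
    """Hypothetical helper: store the cls_token once so it can be reused at reconstruction."""
    cls_token, _ = split_cls_token(text_feature)
    np.save(path, cls_token)
    return cls_token


def merge_cls_token(cls_token, decoded_content):
    """Prepend the saved cls_token to the 19 decoded tokens, restoring (1, 20, 768)."""
    merged = np.concatenate([cls_token, decoded_content], axis=1)
    assert merged.shape == (1, N_TOKENS, DIM), merged.shape
    return merged
```

At reconstruction time you would load the token with `np.load("cls_token.npy")` and pass it to `merge_cls_token` together with the 19 decoded tokens. Slicing with `:1` rather than `0` keeps the token axis, so the concatenation along axis 1 lines up without reshaping.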

xyhanHIT commented 5 months ago

Thanks for your reply! I have another problem when running "Reconstruction.py". At Line 383, the parameter "loss_CLIP_weight" does not seem to be defined anywhere earlier. What value did you use during the iterations?

ReedOnePeck commented 5 months ago

Sorry, I forgot to change that. This weight applies to the term computed from the last-layer values of CLIP, and it should be set to 0.
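To illustrate what a zero weight does, here is a hypothetical weighted-sum objective in plain Python. The term names (`feature_loss`, `clip_last_layer_loss`) and the helper `combine_losses` are illustrative assumptions, not the actual structure of Reconstruction.py; only the rule that the CLIP last-layer weight should be 0 comes from the reply above.

```python
def combine_losses(losses, weights):
    """Weighted sum of named loss terms (hypothetical stand-in for the reconstruction objective)."""
    total = 0.0
    for name, value in losses.items():
        w = weights[name]
        if w == 0:
            # Skip zero-weight terms outright: 0 * inf would be nan and poison the total.
            continue
        total += w * value
    return total


loss_CLIP_weight = 0.0  # per the maintainer: disable the CLIP last-layer term

losses = {"feature_loss": 2.5, "clip_last_layer_loss": 7.0}
weights = {"feature_loss": 1.0, "clip_last_layer_loss": loss_CLIP_weight}
print(combine_losses(losses, weights))  # 2.5
```

If the loss terms are computed inside the iteration loop, checking the weight before computing the CLIP term would also save that extra computation.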

ReedOnePeck/MindDiffuser, issue #9