KamitaniLab / GenericObjectDecoding

Demo code for Horikawa and Kamitani (2017) Generic decoding of seen and imagined objects using hierarchical visual features. Nat Commun https://www.nature.com/articles/ncomms15037.

'improper assignment with rectangle empty matrix' #1

Closed · zzstefan closed this issue 4 years ago

zzstefan commented 7 years ago

Using the MATLAB code, after setting up the environment and the required data files, I ran analysis_FeaturePrediction.m and always get an error: 'Improper assignment with rectangular empty matrix' at line 223 of analysis_FeaturePrediction.m.

ShuntaroAoki commented 7 years ago

I couldn't reproduce the error in my environment. Could you give me more details about your environment (OS, MATLAB version) and the full output of the script (analysis_FeaturePrediction.m)?

zzstefan commented 7 years ago

My environment is macOS 10.12.6 with MATLAB R2014b; the full output is below:

analysis_FeaturePrediction
analysis_FeaturePrediction started
Loading brain data...
Loading image feature data...
Start analysis_FeaturePrediction-Subject1-V1-cnn1
Unit 1
Improper assignment with rectangular empty matrix.

Error in get_refindex (line 33)
refIndex(i) = hitIndex;

Error in get_refdata (line 18)
refIndex = get_refindex(foreignKey, refKey);

Error in analysis_FeaturePrediction (line 223)
trainFeat = get_refdata(trainFeat, trainImageIds, trainLabels);

ShuntaroAoki commented 7 years ago

I tried the code in an environment similar to yours but still couldn't reproduce the error. It seems something is wrong with the label values or image IDs in the data rather than a bug in the code. I suspect the files you downloaded from BrainLiner are somehow corrupted and contain wrong values. Could you check the SHA1 hash values of your 'ImageFeatures.mat' and 'Subject1.mat' (on a Mac, type shasum -a 1 ImageFeatures.mat in Terminal to get the value)? If the files are correct, the hash values should be:
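If you prefer to check the hash from within MATLAB instead of Terminal, a helper like the sketch below should work (this is not part of the repository; the function name sha1_of_file is just illustrative, and it relies on the Java runtime bundled with MATLAB):

function hash = sha1_of_file(filename)
    % Hypothetical helper: compute a file's SHA-1 hash via Java's MessageDigest.
    md = java.security.MessageDigest.getInstance('SHA-1');
    fid = fopen(filename, 'r');
    bytes = fread(fid, Inf, '*int8');          % raw bytes (int8 maps to Java byte)
    fclose(fid);
    md.update(bytes);
    digest = typecast(md.digest(), 'uint8');   % Java byte[] comes back as int8
    hash = lower(reshape(dec2hex(digest, 2)', 1, []));
end

Calling sha1_of_file('ImageFeatures.mat') should return the same string that shasum -a 1 prints.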

If you find the files are correct, then could you give me the exact values of trainFeat, trainImageIds, and trainLabels (type save('errordata.mat', 'trainFeat', 'trainImageIds', 'trainLabels') in MATLAB when you encounter the error, and attach the mat file to this issue)?

zzstefan commented 7 years ago

Thank you so much for your help. My ImageFeatures.mat was indeed corrupted, and I have downloaded it again. I have another question: if I want to calculate the features by myself, is the CNN-related data the imagenet-vgg-f.mat file?

ShuntaroAoki commented 7 years ago

Nice to hear your problem was solved. Thank you for the report. I'll add a script to validate the downloaded data files.

If you want to use your own features, you need to calculate the features (i.e., the responses of the units) for all images (and image categories) and save them in a mat file in a specific format (the BrainDecoderToolbox2 format); simply replacing 'ImageFeatures.mat' with 'imagenet-vgg-f.mat' will not work. I'll soon add documentation of the data structure and an example script for creating the feature data.
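In the meantime, the general idea of computing unit responses looks roughly like the sketch below, using MatConvNet and the pretrained imagenet-vgg-f.mat model. This is not the repository's feature-extraction script; the MatConvNet path, the field names (net.meta.normalization vs. net.normalization in older model files), and the exact preprocessing are assumptions that may differ in your setup:

run(fullfile('matconvnet', 'matlab', 'vl_setupnn.m'));       % assumed MatConvNet location
net = load('imagenet-vgg-f.mat');

im  = imread('example_image.jpg');                           % hypothetical RGB input image
im_ = single(im);
im_ = imresize(im_, net.meta.normalization.imageSize(1:2));

avg = net.meta.normalization.averageImage;                   % mean image to subtract
if numel(avg) == 3                                           % some versions store a 1x1x3 mean
    sz  = net.meta.normalization.imageSize(1:2);
    avg = repmat(reshape(avg, 1, 1, 3), sz(1), sz(2));
end
im_ = im_ - avg;

res   = vl_simplenn(net, im_);                               % forward pass through all layers
feat1 = res(2).x;                                            % unit responses after the first layer

You would repeat this for every image (and image category), collect the responses of the units you need for each layer, and then save them in the BrainDecoderToolbox2 format.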

zzstefan commented 7 years ago

Hi, I have run the program, but it is very slow: it took about 4-5 days for one subject on a server. Is there any way to speed up the process?

ShuntaroAoki commented 7 years ago

The analysis code inevitably takes a LONG time to complete, since it uses sparse regression, which is computationally expensive, and it runs a large number of regressions (1000 units in each CNN layer for each ROI). If you just want to see the general tendency of the results, you can use simple linear regression instead of the sparse regression, or reduce the number of units included in the analysis.
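As a rough illustration of the first option, ordinary least squares can stand in for the sparse regression when you only need a quick look (this is a simplification, not what the paper used, and trainX, trainY, and testX are illustrative placeholders rather than the actual variables in analysis_FeaturePrediction.m):

Xtr   = [trainX, ones(size(trainX, 1), 1)];    % add a bias column
W     = Xtr \ trainY;                          % least-squares weights, all units at once
predY = [testX, ones(size(testX, 1), 1)] * W;  % predicted unit responses for the test set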

You may be able to use parallel (distributed) processing for a speed-up, as we do in our lab (the scripts are in fact designed to be run in parallel on several servers). But I cannot give you a quick recipe for the parallel processing, since it depends heavily on your computing environment.

zzstefan commented 7 years ago

Due to the limitations of our lab's hardware, I think I'll use parfor in MATLAB on a single server. I will make some changes to the scripts.
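Something along the lines of the sketch below (just an illustration; it requires the Parallel Computing Toolbox, and nUnits and fit_one_unit are hypothetical placeholders, since the real loop in analysis_FeaturePrediction.m is organized differently):

if isempty(gcp('nocreate'))
    parpool;                                                % start a local pool of workers
end
pred = cell(nUnits, 1);
parfor i = 1:nUnits
    pred{i} = fit_one_unit(trainX, trainY(:, i), testX);    % independent per-unit fits
end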

zzstefan commented 7 years ago

Sorry to bother you again. I have run the code and obtained the final results, but there is some code in pwidentification.m that I cannot understand.

% Calculate correct rate
cr = (numCandidate - numIncorrect(ind)) ./ numCandidate;

This is how I understand the whole process: we have 50 predicted test perception/imagery feature vectors, which are compared against the combination of 50 test category-average feature vectors and 15322 other candidate feature vectors. By computing the correlation coefficients between them, we identify the seen or imagined object as the category with the highest score among the 15372 categories. So for one predicted feature vector the accuracy should be either 1 or 0. Why is there a correct rate?

ShuntaroAoki commented 7 years ago

Sorry for the late response.

Perhaps you're thinking that we ran identification from multiple categories (i.e., identifying a true category from all false categories). Actually, here we did pair-wise identification; we identified the true category from one false category, and repeated the identification for all true-false category pairs.

Each sample had a true category and 15371 false categories (49 other test categories + 15322 candidate categories; numCandidate). For each sample, we calculated the correlation between the predicted features and the true features (say, Rtrue) and between the predicted features and each of the false features (say, [Rfalse_1, ..., Rfalse_15371]). These correlations are given in simmat in the code. Then, we compared Rtrue with each Rfalse_n (i.e., we compared the correlations in 15371 true-false category pairs). If Rtrue > Rfalse_n, our model correctly identified the true image category against the n-th false category; otherwise, it failed to identify the category in that true-false pair. So, by taking Rtrue > [Rfalse_1, ..., Rfalse_15371], we obtained a vector of identification accuracies (1 or 0) of length 15371. Finally, we took the mean of this accuracy vector as the correct rate for the sample (a row in cr); lines 34-44 in pwidentification.m do this for all 50 samples.
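In code, the computation amounts to something like the following sketch (for illustration only; it assumes the true-category correlation sits in the first column of simmat, which may not match the actual layout in pwidentification.m):

Rtrue  = simmat(:, 1);                 % nSamples x 1, correlation with the true category
Rfalse = simmat(:, 2:end);             % nSamples x numCandidate, correlations with false categories
wins   = bsxfun(@gt, Rtrue, Rfalse);   % 1 where Rtrue > Rfalse_n, 0 otherwise
cr     = mean(wins, 2);                % correct rate for each sample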

Please take a look at the original paper (Horikawa & Kamitani, 2017, Nat Commun) for more details.

zzstefan commented 7 years ago

Hi, thanks for your detailed explanation. I now fully understand how the correct rate is calculated.

By the way, is there any chance you could provide the scripts for creating the image features with the CNN model and the other three models?

ShuntaroAoki commented 7 years ago

We are now preparing scripts for calculating the image features (CNN etc.). I'll make them publicly available in the next couple of weeks.