covartech / PRT

Pattern Recognition Toolbox for MATLAB
http://covartech.github.io/
MIT License

M-ary dataSet in feature selection #35

Closed anaritam closed 9 years ago

anaritam commented 9 years ago

Hi,

Is there a way to use feature selection with a dataSet with 3 classes (besides prtFeatSelStatic) ?

I'm using a DataSet with 7 features and 3 classes, I would like my code to choose from all 7 features, the ones that work better.

Thanks, Ana

peterTorrione commented 9 years ago

Hi Ana,

Something like the following should help:

ds1 = prtDataGenMarysSimpleSixClass;
ds2 = prtDataGenMarysSimpleSixClass;
dsTotal = catFeatures(ds1,ds2); % total of 4 features, 6 classes
nFolds = 3;

% Find the best 3 features using a KNN classifier:
knn = prtClassKnn;
featSel = prtFeatSelSfs('nFeatures',3,'evaluationMetric',@(ds)prtEvalPercentCorrect(knn,ds,nFolds));
featSel = featSel.train(dsTotal);

dsSelected = featSel.run(dsTotal);
plot(dsSelected) % Has the best 3 features!

anaritam commented 9 years ago

Hi, I can't get what you suggested to work with my data :x This is what I have:

dataSet = prtDataSetClass(features_train,labels_train);
nStdRemove = prtOutlierRemovalNStd('runMode','removeObservation');
nStdRemove = nStdRemove.train(dataSet);
dataSetNew = nStdRemove.run(dataSet);

featSel = prtFeatSelSfs; % Create a feature selection object
featSel.nFeatures = 3; % Select three features of the data
featSel = featSel.train(dataSetNew); % Train the feature selection object
outDataSet = featSel.run(dataSetNew);

features_train is a nSamples x 7 matrix and labels_train is a nSamples x 1 matrix

My code can't run the last line and says:

"Error using prtClass/determineMaryOutput (line 310)
M-ary classification is not supported by this classifier. You will need to use prtClassBinaryToMaryOneVsAll() or an equivalent M-ary emulation classifier."

peterTorrione commented 9 years ago

Hello,

I think you need to do two things:

1) Specify a classifier that can handle M-ary data (e.g., prtClassKnn)
2) Specify an evaluation that scores multi-class outputs (e.g., prtEvalPercentCorrect)

For example:

knn = prtClassKnn;
featSel = prtFeatSelSfs('nFeatures',3,'evaluationMetric',@(ds)prtEvalPercentCorrect(knn,ds,nFolds));
featSel = featSel.train(dsTotal);

anaritam commented 9 years ago

OK, I managed to do this. I used:

featSel = prtFeatSelSfs('nFeatures',nFeatures_used,'evaluationMetric',@(ds)prtEvalPercentCorrect(prtClassMap,ds));
featSel = featSel.train(dataSet);
outDataSet = featSel.run(dataSet);

My question now is: can I use this outDataSet like this

classifier_7 = prtClassMap + prtDecisionMap;
classifier_7 = classifier_7.train(outDataSet); % Train
classified_7 = run(classifier_7, dataSet_test);

in order to test the classifier?

peterTorrione commented 9 years ago

Hi,

You need to also run:

OutDataSet_test = featSel.run(dataSet_test);
[...]
classified_7 = run(classifier_7, OutDataSet_test);

This applies the feature selection to your test data set; otherwise the two data sets will have different numbers of features.

-Pete

anaritam commented 9 years ago

What I actually did was this:

selectedFeatures = featSel.selectedFeatures;
dataSet_test = retainFeatures(dataSet,selectedFeatures);

classifier_7 = prtClassMap + prtDecisionMap;
classifier_7 = classifier_7.train(outDataSet); % Train
classified_7 = run(classifier_7, dataSet_test);

It's the same thing, right?

peterTorrione commented 9 years ago

Yes, that looks right.

anaritam commented 9 years ago

I keep getting this error:

Error using prtRvMvn/logPdf (line 184)
SIGMA must be symmetric and positive definite;

Error in prtRv/runAction (line 249)
DataSet = DataSet.setObservations(Obj.logPdf(DataSet));

Error in prtAction/run (line 250)
dsOut = runAction(self, dsOut);

Error in prtClassMap/runAction (line 119)
logLikelihoods(:,iY) = getObservations(run(self.rvs(iY), ds));

Error in prtAction/run (line 250)
dsOut = runAction(self, dsOut);

Error in prtAction/crossValidate (line 369)
outputDataSetCell{uInd} = trainedAction.run(testDs);

Error in prtAction/kfolds (line 553)
[outputs{:}] = self.crossValidate(ds,keys);

Error in prtUtilEvalParseAndRun (line 35)
Results = classifier.kfolds(dataSet,nFolds);

Error in prtEvalPercentCorrect (line 58)
results = prtUtilEvalParseAndRun(classifier,dataSet,nFolds);

Error in @(ds)prtEvalPercentCorrect(prtClassMap,ds)

Error in prtFeatSelSfs/trainAction (line 149)
cPerformance(i) = Obj.evaluationMetric(tempDataSet);

Error in prtAction/train (line 221)
self = trainAction(self, ds);

This happens whenever I try to select more than 2 features with prtFeatSelSfs. I don't understand it, so I can't solve it...

Thanks for your help, Ana

peterTorrione commented 9 years ago

Hello,

This is technically a new issue, so please start a new issue for additional comments. But it sounds like your features are not linearly independent, or you have too few observations for at least one class in your data set.

prtRvMvn is trying to learn a covariance matrix from your data - e.g.,

cov(X(Y == 1,:))

And the result of this needs to be positive definite (i.e., full rank, not just positive semi-definite), or it's impossible to evaluate the density of a multivariate normal random variable...
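One way to check both possible causes is to look at each class's covariance matrix directly. The sketch below uses only base MATLAB and synthetic data: X and Y are hypothetical stand-ins for features_train and labels_train, and the fourth column is deliberately built as a linear combination of two others so that the problem shows up. A class whose covariance rank is below the number of features, or that has no more observations than features, is a likely culprit.

```matlab
% Synthetic stand-ins for features_train / labels_train (assumed, not Ana's data)
rng(0);
nSamples = 30;
X = randn(nSamples, 3);
X(:,4) = X(:,1) + X(:,2);             % linearly dependent feature
Y = repmat((1:3)', nSamples/3, 1);    % three classes, 10 observations each

classes = unique(Y);
nFeatures = size(X, 2);
for iClass = 1:numel(classes)
    Xc = X(Y == classes(iClass), :);  % observations belonging to this class
    S = cov(Xc);                      % per-class sample covariance
    % The sample covariance has rank at most (observations - 1), so a class
    % also needs more observations than features to reach full rank.
    fprintf('class %d: %d observations, covariance rank %d of %d\n', ...
        classes(iClass), size(Xc, 1), rank(S), nFeatures);
end
```

With real data, replace X and Y with your own feature matrix and label vector, restricted to the feature subset that triggers the error.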

You might try using a simpler classifier - e.g., KNN, which does not require a full-rank covariance matrix...
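Putting the pieces of this thread together, a KNN-based version of the whole workflow might look like the sketch below. It assumes the PRT is on the MATLAB path and uses only calls that already appear above. The data is generated as in the earlier example, purely for illustration; dsTrain and dsTest are hypothetical names, and nFolds = 3 is an arbitrary choice.

```matlab
% Illustrative data: 4 features, 6 classes (same construction as earlier in the thread)
dsTrain = catFeatures(prtDataGenMarysSimpleSixClass, prtDataGenMarysSimpleSixClass);
dsTest  = catFeatures(prtDataGenMarysSimpleSixClass, prtDataGenMarysSimpleSixClass);
nFolds = 3;

% Feature selection scored with KNN, which needs no covariance estimate
knn = prtClassKnn;
featSel = prtFeatSelSfs('nFeatures',3,'evaluationMetric',@(ds)prtEvalPercentCorrect(knn,ds,nFolds));
featSel = featSel.train(dsTrain);

% Apply the same trained selection to both training and test data
dsTrainSel = featSel.run(dsTrain);
dsTestSel  = featSel.run(dsTest);

% Train on the selected training features, then classify the test set
classifier = prtClassKnn + prtDecisionMap;
classifier = classifier.train(dsTrainSel);
classified = run(classifier, dsTestSel);
```

Running featSel on both data sets keeps the number of features consistent between training and testing.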