how to use dbn - Githubissues

lyzhangjm commented 8 years ago

Sebelino commented 8 years ago

I haven't really implemented feature extraction using DBN, so you may want to add some code of your own for that. I use a DBN to extract meta-features from an existing feature space which I then use in hope of increasing the classification accuracy. If you are interested you can find the (still incomplete) details on page 23 of my thesis draft. The details are a little hairy since I also perform a form of feature selection, but it would basically look like this:

>> score('load 200002 | segment 3 | extract | organize dbn 20 3 | partition 3:1 | svm linear | eval')
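To make the stages of that command easier to see, here is a minimal sketch that splits the string on the pipe character and prints each stage. It only illustrates the structure of the command; it is not necessarily how score parses its argument internally. Based on the rest of this thread, `organize dbn 20 3` corresponds to a DBN with hidden layers of 20 and 3 nodes, and `partition 3:1` appears to be the training-to-testing ratio.

```matlab
% Illustration only: list the stages of the pipeline string.
% This is NOT necessarily how score() handles the string internally.
cmd = 'load 200002 | segment 3 | extract | organize dbn 20 3 | partition 3:1 | svm linear | eval';

% Split on '|' and trim the surrounding whitespace of each stage
stages = strtrim(strsplit(cmd, '|'));

for k = 1:numel(stages)
    fprintf('%d: %s\n', k, stages{k});
end
```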

If you'd like to incorporate regular feature extraction with DBN into the program, I can give a few pointers on where to look:

Segment.m, lines 14-22: Here is where the current seven features are defined.
dbnify.m: Here is where I use a DBN library to extract meta-features from an existing feature space. You'd probably want to copy some code from here.
lib/DBNToolbox/lib/: This is the library made by Wulsin that I use for working with DBNs. Depending on exactly what you are trying to do, you may want to study its source and documentation carefully. I will admit that I am not very familiar with it myself.

lyzhangjm commented 8 years ago

Thanks, score('load 200002 | segment 3 | extract | organize dbn 20 3 | partition 3:1 | svm linear | eval') runs fine. But the classification accuracy is low too.

ans =
        trainingset: [2428x1 LabeledFeaturevector]
         testingset: [809x1 LabeledFeaturevector]
                svm: [1x1 SVM]
       predictedset: [809x1 LabeledFeaturevector]
           accuracy: 0.6527
    confusionmatrix: [5x5 double]
     confusionorder: [5x1 char]

If I want to improve the classification accuracy, what should I do? Increase the number of training samples?
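Before changing the training set, it may help to look at the confusionmatrix field of the result, since a single accuracy figure hides which of the five classes are being confused. The sketch below computes overall accuracy and per-class recall from a confusion matrix. The matrix values are made up for illustration, and the orientation (rows = true class, columns = predicted class) is an assumption that should be checked against the program's own convention.

```matlab
% Hypothetical 5x5 confusion matrix; the numbers are invented for illustration.
% ASSUMPTION: rows are true classes, columns are predicted classes.
% Replace C with ans.confusionmatrix from the real result.
C = [50  5  3  1  1;
      4 30 10  2  4;
      2  8 60  5  5;
      1  2  6 20  1;
      3  5  4  2 36];

% Overall accuracy: correctly classified samples over all samples
accuracy = trace(C) / sum(C(:));

% Per-class recall: correct predictions for each true class over its row total
recall = diag(C) ./ sum(C, 2);

disp(accuracy);
disp(recall');
```

A class with much lower recall than the others is a natural place to start, for example by checking whether it is underrepresented in the training set.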

lyzhangjm commented 8 years ago

I modified your code to increase the number of training samples. The classification accuracy, however, is very low.

ans =
        trainingset: [4867x1 LabeledFeaturevector]
         testingset: [1622x1 LabeledFeaturevector]
                svm: [1x1 SVM]
       predictedset: [1622x1 LabeledFeaturevector]
           accuracy: 0.4877
    confusionmatrix: [5x5 double]
     confusionorder: [5x1 char]

The changed code:

    function [record,eeg,labels] = readrecord(spec)
    % Reads the record specified by the supplied parameter.
        datadir = 'data/';
        records = {
            'slp01a/slp01a'
            'shhs/shhs1-200001'
            'shhs/shhs1-200002'
            'shhs/shhs1-200003'
    %         'shhs/shhs1-200004'
    %         'shhs/shhs1-200005'
    %         'shhs/shhs1-200006'
    %         'shhs/shhs1-200007'
    %         'shhs/shhs1-200008'
    %         'shhs/shhs1-200009'
    %         'shhs/shhs1-200010'
        };
        % TODO cache this data
        matches = strfind(records,spec);
        matchindices = find(cellfun(@(y)~isempty(y),matches));
        record = records{1};
        if length(matchindices) > 0
            record = records{matchindices(1)};
        else
            error(['Found no record that matches input "',spec,'".'])
        end
        cachepath = cachepath(record);
        if exist(cachepath)
            disp(['Reading ',cachepath,'...'])
            data = load(cachepath,'eeg','labels');
            eeg = data.eeg;
            labels = data.labels;
        else
            % recordpath = [datadir,record];
            eeg = [];
            labels = [];
            for i=2:length(matchindices)
                record = records{i};
                recordpath = [datadir,record];
                disp(['Reading ',recordpath,'...'])
                [eg,lbl] = readsignal(recordpath);
                if i == 2
                    eeg = eg;
                    labels = lbl;
                else
                    eeg.Graph = [eeg.Graph;eg.Graph ]; % 200001-200003 are used to train model
                    labels = [labels;lbl];
                end
            end
            disp(['Reading ',recordpath,'...'])
            % [eeg,labels] = readsignal(recordpath);
            save(cachepath,'eeg','labels');
        end
    end
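One detail worth checking in the loop of the posted code: the counter runs over 2:length(matchindices) and indexes records directly, so which files get read depends on how many entries matched rather than on which entries matched. A minimal sketch of iterating over the matched indices themselves is shown below. To keep it runnable on its own, readsignal is replaced by a stub, and the struct layout (a single Graph field) is an assumption taken from the posted code, not from the real function.

```matlab
% Sketch only. ASSUMPTIONS: readsignal is stubbed out, and the EEG struct
% is assumed to have a Graph field, as in the code posted above.
records = {'slp01a/slp01a'; 'shhs/shhs1-200001'; ...
           'shhs/shhs1-200002'; 'shhs/shhs1-200003'};
readsignal = @(path) deal(struct('Graph', ones(4,1)), repmat('W', 4, 1));

spec = 'shhs';
matchindices = find(~cellfun(@isempty, strfind(records, spec)));

eeg = [];
labels = [];
for k = 1:numel(matchindices)
    % Index through matchindices so that exactly the matching records are read
    recordpath = ['data/', records{matchindices(k)}];
    [eg, lbl] = readsignal(recordpath);
    if isempty(eeg)
        eeg = eg;                            % first record: take the struct as-is
    else
        eeg.Graph = [eeg.Graph; eg.Graph];   % later records: append the samples
    end
    labels = [labels; lbl];
end
```

With the spec 'shhs', this reads all three matching SHHS records and skips slp01a.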

Maybe the seven features are not enough. I don't understand your code completely. Reference [8] of your thesis used 28 features. Could you expand the feature space?

Sebelino commented 8 years ago

Yeah, the added meta-features generated from the DBN unfortunately do not increase the accuracy significantly. The whole research question of my thesis is about investigating if it does, and I concluded that it does not. Then again, I only tested this approach with a very specific DBN setup (a 7-20-3 layer topology), so my results are pretty inconclusive. Here are some things you could try to increase the accuracy:

Try a different DBN topology, e.g. with more layers and/or different number of nodes.
My organize dbn 20 3 filter increases the dimensionality of the feature space from 7 to 10 (because three meta-features are extracted which are then appended to each feature vector). This may not be the best approach. Perhaps it is better to discard the original features instead, i.e., reduce the dimensionality from 7 to 3 in this case.
Try different initial biases of the hidden units by editing line 498 in DBNToolbox/lib/NNLayer.m. I set them to -4 for my evaluations by changing that line to obj.hidBiases = -4*ones(1,obj.numHid);. Längkvist did the same.
Try feature selection with restricted search. Although slow, it increased the accuracy by 10 % or so. The syntax is a bit different; you'd need to type something like select restricted svm linear | eval. There is an example in evaler.m, which I used for generating my results.
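Two of the suggestions above can be made concrete with a short sketch. The first part contrasts appending the three meta-features (7 to 10 dimensions) with replacing the original features (7 to 3 dimensions). The second part shows what a hidden bias of -4 implies, assuming the hidden units use the standard logistic activation (an assumption about the toolbox, not something confirmed in this thread): with zero input, each unit switches on with probability 1/(1+exp(4)), so the hidden layer starts out mostly inactive. The matrices are random placeholders, not output of dbnify.m.

```matlab
% --- Append vs. replace (placeholder data, not real features) ---
n = 5;                  % number of feature vectors in this toy example
X = rand(n, 7);         % stands in for the original 7 features
M = rand(n, 3);         % stands in for the 3 DBN meta-features

augmented = [X, M];     % append: dimensionality 7 -> 10
replaced  = M;          % discard originals: dimensionality 7 -> 3

fprintf('augmented: %d columns, replaced: %d columns\n', ...
        size(augmented, 2), size(replaced, 2));

% --- Effect of a hidden bias of -4 ---
% ASSUMPTION: logistic (sigmoid) hidden units.
sigmoid   = @(x) 1 ./ (1 + exp(-x));
numHid    = 20;
hidBiases = -4 * ones(1, numHid);
pActive   = sigmoid(hidBiases);   % activation probability with zero input

fprintf('initial activation probability: %.4f\n', pActive(1));
```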

For more ideas, have a look at the Discussion and Future work sections of my thesis. Please take my conclusions with a grain of salt though; I haven't even defended my thesis yet.

Sebelino commented 8 years ago

Yes, increasing the number of features would probably help as well. If you are interested in doing so, you'd probably want to edit lines 14-22 in Segment.m.
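As a rough idea of what an additional feature might look like, the sketch below computes two simple time-domain quantities for one signal segment. Everything here is hypothetical: the variable names, the placeholder signal, and the choice of features are for illustration only and are not taken from Segment.m, whose actual structure should be checked first.

```matlab
% Hypothetical example: two extra time-domain features for one segment.
% x is a placeholder signal, NOT real EEG data, and the feature names
% below are invented for this sketch.
t = (0:99) + 0.5;            % half-sample offset avoids samples exactly at zero
x = sin(2*pi*t/20)';         % five cycles of a sine wave

% Feature 1: variance of the segment
featVariance = var(x);

% Feature 2: number of sign changes between consecutive samples
featZeroCrossings = sum(x(1:end-1) .* x(2:end) < 0);

fprintf('variance: %.4f, zero crossings: %d\n', featVariance, featZeroCrossings);
```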

Sebelino / hypnoscorer

how to use dbn #3