Open GoogleCodeExporter opened 9 years ago
Hi
I actually have an issue with the data distribution.
The distribution of examples in training and test is assumed to be similar,
which doesn't seem to be true in your case.
classwt is usually useful if you are trying to get the misclassification rate down
for one class, as in separating a cancer class from a non-cancer class, where a
false negative is far worse than a false positive, whereas cutoff is useful if
you want to tweak the probability of the classes.
Tweaking via cutoff or classwt may be useful if you have a skewed training
distribution, but in your case you have a skewed test distribution.
I think you should look into cutoff/classwt only if you are getting a bad
test error rate (not the absolute accuracy, but something like precision/recall or
some way to normalize the test error by how many examples there are in each
class). Maybe 300/300 splits are good enough for a very good test accuracy.
Original comment by abhirana
on 17 Jan 2013 at 5:48
Yes. For test accuracy, it may be good. An extreme case is a classifier that
predicts all my test samples as '-1'; then the accuracy will be very high. So do
I need to use some sampling method to get a skewed training dataset?
Original comment by zhangleu...@gmail.com
on 17 Jan 2013 at 5:54
Sure, a classifier can predict all test samples as -1, but in cases like
this, instead of using accuracy you can use precision/recall or a normalized
accuracy based on the number of examples in each class, and that will reflect
the skewness in the test data.
Are you saying that you want to sample your training examples so that RF sees
training data distributed like your test examples? You can tweak cutoff to change
the probabilities, but I would strongly advise against that before checking whether
your existing data is already giving you nice precision/recall values.
Original comment by abhirana
on 17 Jan 2013 at 6:00
Yes, I mean that. But if the training data is balanced, then there are many
FPs (false positives) on the testing data, which means I will get a poor
specificity score.
Original comment by zhangleu...@gmail.com
on 17 Jan 2013 at 6:04
Yeah, both classwt and cutoff can be used.
classwt can be tweaked so that the smaller class is classified correctly as
much as possible: mispredicting the smaller class then carries a higher
penalty.
You can also try changing cutoff. I am guessing you will be evaluating the
normalized accuracy, i.e. (accuracy on class 1 + accuracy on class 2) / 2?
Original comment by abhirana
on 17 Jan 2013 at 6:27
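To illustrate the normalized accuracy mentioned in the comment above, here is a minimal MATLAB sketch. The label vectors are made-up values (not data from this thread), chosen so that a classifier predicting '-1' for everything looks good on plain accuracy: it scores 0.9 on accuracy but only 0.5 on normalized accuracy.

```matlab
% Made-up labels: 9 negatives and 1 positive (hypothetical, for illustration only).
Y_true = [-1 -1 -1 -1 -1 -1 -1 -1 -1 1]';
% A degenerate classifier that predicts -1 for every sample.
Y_pred = -ones(size(Y_true));

% Plain accuracy is dominated by the majority class.
acc = mean(Y_pred == Y_true);

% Per-class accuracy: fraction of each class that is predicted correctly.
acc_neg = mean(Y_pred(Y_true == -1) == -1);
acc_pos = mean(Y_pred(Y_true ==  1) ==  1);

% Normalized accuracy: (accuracy on class 1 + accuracy on class 2) / 2.
norm_acc = (acc_neg + acc_pos) / 2;

fprintf('accuracy = %.2f, normalized accuracy = %.2f\n', acc, norm_acc);
```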
No, I am evaluating the MCC (Matthews correlation coefficient):
http://en.wikipedia.org/wiki/Matthews_correlation_coefficient
It is very difficult to improve. The MCC shows that the normal RF is only
better than a random guess.
Original comment by zhangleu...@gmail.com
on 17 Jan 2013 at 6:33
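For readers unfamiliar with the MCC from the comment above, a minimal MATLAB sketch of the standard formula follows. The confusion-matrix counts are hypothetical placeholders, not results from this thread, and returning 0 when the denominator vanishes is a common convention rather than something the thread specifies.

```matlab
% Hypothetical confusion-matrix counts (not results from this thread).
TP = 40; FN = 10; FP = 100; TN = 850;

% Matthews correlation coefficient:
% MCC = (TP*TN - FP*FN) / sqrt((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN))
den = sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN));
if den == 0
    mcc = 0;   % convention when a row or column of the confusion matrix is empty
else
    mcc = (TP * TN - FP * FN) / den;
end

fprintf('MCC = %.3f\n', mcc);
```

A classifier that predicts '-1' for every sample has TP = FP = 0, so it falls into the zero-denominator branch and scores 0 here, however high its plain accuracy is.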
hmm, cool. good to know something new :)
Do tell me if you make headway using cutoff/classwt; your problem space is
hard but very relevant.
Original comment by abhirana
on 17 Jan 2013 at 6:40
Hmm. If I modify classwt/cutoff, the MCC improves a little. But what puzzles
me is that there are fewer positive labels after the prediction.
Original comment by zhangleu...@gmail.com
on 17 Jan 2013 at 6:44
that is true, there is no free lunch.
Original comment by abhirana
on 17 Jan 2013 at 6:52
Hi, abhirana.
I remember there being an RF package which can calculate the proximity
between the training and testing data, with something like:
extra_options.proximity=1;
model=classRF_train(X_trn,Y_trn,ntree,mtry,extra_options,X_tst,Y_tst);
But I cannot find this package on your webpage. Is it still available?
Original comment by zhangleu...@gmail.com
on 18 Jan 2013 at 2:18
It's still there. I think you will have to sync with the source (I might have
generated a package in one of the issues; if you want a precompiled package,
just tell me).
tutorial file
http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/tutorial_Proximity_training_test.m
i think you asked me about it before here
http://code.google.com/p/randomforest-matlab/issues/detail?id=44#c15
Original comment by abhirana
on 21 Jan 2013 at 5:10
Yes. I want a precompiled package for 64-bit Windows. I need the version
that can calculate the proximity between the training and testing datasets.
I also found that the latest package cannot handle this. Maybe it was removed
because it needs lots of memory?
Thanks.
Original comment by zhangleu...@gmail.com
on 21 Jan 2013 at 8:02
If you sync to the SVN source, you will also get the latest compiled mex files
for both 32 bits and 64 bits.
I just synced and tutorial_Proximity_training_test.m works with that code.
Yup, the proximity matrices will require space of about Ntrn^2*sizeof(double) +
Ntst^2*sizeof(double). I guess you are perhaps passing too many examples?
Original comment by abhirana
on 22 Jan 2013 at 9:58
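The memory estimate in the comment above can be checked before training with a few lines of MATLAB. This is only a sketch of that arithmetic; the sample counts below are hypothetical and should be replaced with size(X_trn,1) and size(X_tst,1).

```matlab
% Hypothetical sample counts; replace with size(X_trn,1) and size(X_tst,1).
Ntrn = 20000;
Ntst = 5000;
bytes_per_double = 8;   % sizeof(double)

% Estimate from the comment above: Ntrn^2 + Ntst^2 doubles.
bytes_needed = (Ntrn^2 + Ntst^2) * bytes_per_double;

fprintf('approx. %.2f GB for the proximity matrices\n', bytes_needed / 2^30);
```

With these example counts the estimate is 3.4e9 bytes, which shows how quickly the quadratic term grows with the number of training examples.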
Hi, abhirana,
I am still not sure about the effect of 'classwt' and 'cutoff'. I guess
'cutoff' only takes effect at the final stage of Random Forest, i.e., if the '-1'
class has 200 votes and the '+1' class has 300, the final result is '+1'
when 'cutoff' is left at its default. If we set 'cutoff' to [3/4, 1/4], that means
the first class needs far fewer votes to win. Is that true?
And how about classwt? What effect does it have on the Random Forest?
Original comment by zhangleu...@gmail.com
on 31 Jan 2013 at 6:35
Yeah, cutoff will behave as you mentioned.
classwt does things differently: internally, during *training*, instead of
assigning the same misclassification penalty to all classes, the forest
will try to reduce misclassification of the class whose penalty is higher. I
used it in cases where getting a true positive for a class was far more
important than avoiding a false positive.
Original comment by abhirana
on 3 Feb 2013 at 10:31
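The voting example from the question (200 votes for '-1', 300 for '+1') can be worked through in MATLAB. This sketch ASSUMES the rule documented for R's randomForest package, where the winning class is the one with the largest ratio of vote fraction to cutoff; whether this MATLAB port applies cutoff in the same direction is not confirmed in this thread and is worth checking in the source.

```matlab
% Votes from the example in the question: 200 for '-1', 300 for '+1'.
votes  = [200 300];
labels = [-1 1];
vote_frac = votes / sum(votes);

% ASSUMPTION: winner = class with the largest (vote fraction / cutoff),
% as documented for R's randomForest. Verify against this package's source.
cutoff_default = [1/2 1/2];
cutoff_skewed  = [3/4 1/4];

[~, k_default] = max(vote_frac ./ cutoff_default);
[~, k_skewed]  = max(vote_frac ./ cutoff_skewed);

fprintf('default cutoff -> %d, cutoff [3/4 1/4] -> %d\n', ...
        labels(k_default), labels(k_skewed));
```

Under this assumed rule both settings pick '+1', because a larger cutoff value makes it harder, not easier, for that class to win. If the port behaves the other way round, the division would need to be replaced accordingly.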
You mean that during training, RF uses classwt to assign the misclassification
penalty?
But as far as I know, RF trains a lot of CARTs. Each CART is fully grown without any pruning, and at each node it tries to find the best split variable using some criterion (such as Gini, information gain, etc.). At which step does this 'penalty' take effect?
Original comment by zhangleu...@gmail.com
on 3 Feb 2013 at 2:06
I might have been incorrect in saying that.
classwt is used to influence where the split is made:
http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/src/classRF.cpp#371
It changes the number of examples present in each of the classes (by changing
tclasspop), and that will influence the split later on.
And even though the CARTs are fully grown without pruning, that may not mean that
all the training examples are classified correctly. The tree size is
dictated by nodesize/nrnodes, and though nodesize defaults to 1 for
classification, nrnodes influences the depth of the trees in the forest; I have
seen it restricting trees from growing to a large depth and preventing them from
classifying all training data (training_data ~=0). If you want to check whether that
is happening with your data, you can look at the examples which were in-bag and see
the labels assigned to them.
Original comment by abhirana
on 3 Feb 2013 at 6:29
Can I understand it as follows?
For example, at node t we have 100 samples (50 '+1' and 50 '-1') to split.
Say we use Gini purity as the split rule, which means
I(t) = Sigma[p(i|t)*p(j|t)], where t is the node, i ~= j, and the sum
runs over all distinct pairs of classes i and j. If we do not use classwt in this
case, then p(-1|t) = p(1|t) = 1/2.
But if we set classwt=[1,3], then it is as if we had 50 '-1' and
150 '+1', so p(-1|t) = 1/4 and p(1|t) = 3/4?
Is that true?
Original comment by zhangleu...@gmail.com
on 4 Feb 2013 at 2:07
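The arithmetic in the question above can be reproduced with a short MATLAB sketch. It ASSUMES that classwt multiplies the class counts directly, which is the reading proposed in the question and not something confirmed from the package source; the variable names are illustrative only.

```matlab
% Node from the question: 50 samples of '-1' and 50 samples of '+1'.
counts  = [50 50];
classwt = [1 3];   % ASSUMPTION: weights multiply the class counts directly

% Gini impurity 1 - sum(p.^2); for two classes this equals 2*p1*p2,
% i.e. the sum over i ~= j of p(i|t)*p(j|t).
p_plain    = counts / sum(counts);
gini_plain = 1 - sum(p_plain.^2);

w             = counts .* classwt;
p_weighted    = w / sum(w);
gini_weighted = 1 - sum(p_weighted.^2);

fprintf('unweighted: p = [%.2f %.2f], Gini = %.3f\n', p_plain, gini_plain);
fprintf('weighted:   p = [%.2f %.2f], Gini = %.3f\n', p_weighted, gini_weighted);
```

Under that assumption the class probabilities move from [1/2 1/2] to [1/4 3/4], and the node impurity drops from 0.5 to 0.375, so the same candidate split is scored differently once the weights are applied.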
I guess that is how it will work. I don't remember if classwt was direct, like
you said, or inverse, but yeah, the probabilities will be skewed.
Original comment by abhirana
on 4 Feb 2013 at 9:45
OK. Thanks.
By the way, do you know why Random Forest does not need to prune each CART?
I think we do need to prune the tree in the CART algorithm. But when it comes to
RF, why does it become unnecessary?
Original comment by zhangleu...@gmail.com
on 4 Feb 2013 at 1:34
RF trees are more unstable than bagging trees (due to mtry<<D), and each RF
tree is different from the other RF trees due to bootstrapping (which in turn is
what distinguishes bagging trees from CART trees).
I think there is an argument that pruning is required to reduce the overfit (by
reducing the bias of the trees), but RF trees primarily have a low bias (due
to the inclusion of mtry<<D) and slightly high variance, and the RF forest is a
low-bias and low-variance classifier (due to the properties of the ensemble of
trees), so pruning the trees wouldn't reduce it further and may even increase the
variance (the probability of two small trees giving very similar answers, or
having the same splits, is higher than for larger versions of those trees).
Empirically, pruning RF trees doesn't seem to help either.
Original comment by abhirana
on 4 Feb 2013 at 7:59
That should be the reason. Thanks a lot.
Original comment by zhangleu...@gmail.com
on 5 Feb 2013 at 1:26
Hi Zhang
Do you know if the relationship between classwt and population is direct or inverse?
I mean, if 10% of the samples are class (-1) and 90% of the samples are class
(+1), and the misclassification of class (-1) is more expensive, how do you
create the classwt vector to feed the prior knowledge of the population into
random forest? Do you create classwt as [0.1 0.9], corresponding to labels
[-1 1], or do you create it as [0.9 0.1]?
Thanks
Original comment by m.saleh....@gmail.com
on 23 Jul 2013 at 6:44
Hi, it has been a long time since I last worked on RF issues. I think in your
case it should be [0.9 0.1]. You can also look at the normClassWt() function
in src/rfutils.cpp.
Original comment by zhangleu...@gmail.com
on 24 Jul 2013 at 1:22
Hi,
I have a balanced training set and the cross validation error is very low. But
my test data is heavily skewed as mentioned by Zhang previously. There's only 1
+ve among 9216 values. I know this is very very heavily skewed. But the nature
of data is such in my case. So, what should be the values of class wt and
cutoff? (Note that my training set is balanced!)
As the labels are [-1 1] should the class wt be [.9 .1]? and cut off be [.1
.9]? (From the discussion between Abhirana and Zhang, this is what I
understood. Please correct me.)
Original comment by sharathc...@gmail.com
on 31 Oct 2013 at 6:26
@sharathchandra92
I am a bit confused. Training and test sets should ideally have similar class
probabilities, which is not so in your case. What is your end goal for the test:
making sure that the 1 +ve is always classified correctly (i.e. +ve has a higher
misclassification cost), or is it high accuracy?
Anyway,
you should try one of them at a time before trying both of them simultaneously.
You can look at the individual class oob error for the effects. classwt will make
RF train harder on the class (I'd start with this), whereas cutoff just tunes
the proportion of votes required to win.
For both the classwt and cutoff vectors, higher values mean that it's easier for
the corresponding class to win compared to the other classes.
Original comment by abhirana
on 31 Oct 2013 at 7:47
Yeah actually they should, but in this case, the problem is slightly tricky in
its formulation.
My goal is to minimize any false positives and make sure that +ve is correctly
classified. Essentially, I have only 1 +ve out of 9216 samples in my test, so I
cannot miss this +ve and at the same time, some false positives are ok (but not
more than 5-6)! I am aiming for higher accuracy on the test data, which means
+ve has higher misclassification cost in training. Am I correct on this?
I have seen the oob error rates, the figures are attached:
Figure 1
extra_options.classwt=[.05 .95];
extra_options.cutoff=[.01 .99];
model = classRF_train(X_trn,Y_trn, 1000, 10, extra_options);
Figure 2
extra_options.classwt=[.01 .99];
extra_options.cutoff=[.01 .99];
model = classRF_train(X_trn,Y_trn,2000,7,extra_options);
Figure 4
extra_options.classwt=[.1 .9];
extra_options.cutoff=[.01 .99];
model = classRF_train(X_trn,Y_trn,2500,100,extra_options);
Original comment by sharathc...@gmail.com
on 1 Nov 2013 at 12:00
Attachments:
You should probably try to tune classwt (first and foremost). cutoff tends to
cause too much variation (let's say you set cutoff to 10-90; that means that
-1 will require 9 times more votes to win compared to +1, whereas 50-50 means
that -1 will require the same number of votes to win as +1). I just want
to make sure you can decouple the effects of the two factors (classwt and cutoff).
Also try to look at the per-class ooberr rather than the overall ooberr (the other
columns of the ooberr give the per-class ooberr), and compare that per-class
ooberr with and without classwt. You could plot the change in per-class ooberr
against various values of the classwt proportions.
Original comment by abhirana
on 1 Nov 2013 at 12:13
Yes. I have conducted experiments with multiple settings of classwt and cutoff,
both individually and together.
There are seemingly contradictory observations!
For Class 2:
Classwts:
.2 .8 - Error is going down till .0125
.4 .6 - Down Till 0.0125
.05 .95 - Shooting up till 0.9
.7 .3 - Going down till .01
So, it is kind of confusing which value to take!
CutOff
.7 .3 - Down till 0.005
.3 .7 - Down till 0.025
.1 .9 - Down till 0.08
Coupling both
Classwt      CutOff       Class 2 error
.001 .999    .999 .001    Down to 0
.1   .9      .1   .9      Down to .09
.3   .7      .3   .7      Down to .01
.05  .95     .01  .99     Shooting up to .55
.999 .001    .001 .999    Shooting up to .75
It is a little ambiguous which setting to go for! My concern is like this: I am
trying to predict which point in an image is the keypoint and the image has
9216 pixels and 1 of them is the +ve point in test data! So by any chance, I
should get this right. At the same time, there is a possibility that certain
neighbouring points will have similar features and might be tagged as keypoint,
which is Ok. Under these circumstances, Should I just see which setting gives
me lowest class2 error?
I have attached the graphs in rar file. Sorry wherever it says class3 - that's
class 2 actually (typo).
Another question I had was, can this implementation do multi class regression?
Original comment by sharathc...@gmail.com
on 1 Nov 2013 at 5:49
Attachments:
OK. I would use only classwt. I see too much variation in the examples using cutoff
(either as a single factor or when coupled). For those examples, it's unnatural
to see a single tree (or <10 trees) having a low ooberr, then the ooberr increasing
between 10-100 trees and then decreasing (ooberr should ideally decrease
monotonically).
Can I ask what the current results are on the test dataset without using either
cutoff or classwt (can it correctly classify the single +ve example)? And can I
ask what the final goal is: to get a good ooberr or a good tsterr?
Yes, it can do multi-class classification.
Original comment by abhirana
on 2 Nov 2013 at 12:12
Yes, that is true, ooberr fluctuates heavily when cutoff is used.
Currently, without using either cutoff or classwt, I am getting a lot of false
positives. The final goal is to get a good testerr. The basic issue is the
skewed nature of the test set. I am trying to see if I can do something about
decreasing it by changing the features.
And I was curious whether RF can do multi-output regression. I guess you misread it
as classification. Can you please clarify?
Original comment by sharathc...@gmail.com
on 2 Nov 2013 at 12:34
this RF package cannot perform multi-output regression.
Original comment by abhirana
on 2 Nov 2013 at 6:56
Original issue reported on code.google.com by
zhangleu...@gmail.com
on 16 Jan 2013 at 8:47