Open GoogleCodeExporter opened 8 years ago
Hi Patrick,
What is your dataset size and what is your parameter for -rank_ratio?
For now, the Cholesky Factorization (CF) is serial because it usually works on
a small matrix. In our experiment on the RCV 800k dataset, we set rank_ratio
to 0.001, so the matrix CF works on is 800*800. I suspect you set rank_ratio to
a large value, which may cause poor speedup. You could try decreasing rank_ratio.
In fact, we did consider a Parallel Cholesky Factorization, but it would be
even slower on distributed computers because it requires a lot of communication.
For most problems, rank_ratio is set so that the matrix CF works on stays small.
Original comment by baihong...@gmail.com
on 19 Aug 2008 at 8:22
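To make the size argument above concrete, here is a minimal sketch of how rank_ratio drives the cost of the serial step. It assumes the matrix side is number_of_data * rank_ratio (inferred from the 800k / 0.001 / 800*800 example in this thread) and uses the textbook estimate of about r^3 / 3 floating-point operations for a dense Cholesky factorization. The function names are hypothetical and not part of the tool.

```python
def cf_matrix_side(num_examples: int, rank_ratio: float) -> int:
    """Side length of the square matrix CF works on.

    Assumption: side = number_of_data * rank_ratio, as implied by the
    800k examples / rank_ratio 0.001 / 800*800 figures in the thread.
    """
    return max(1, round(num_examples * rank_ratio))


def dense_cholesky_flops(side: int) -> float:
    """Rough flop count of a dense Cholesky factorization (about side^3 / 3)."""
    return side ** 3 / 3.0


if __name__ == "__main__":
    # Compare a few rank_ratio settings on an 800k-example dataset.
    for ratio in (0.001, 0.01, 0.1):
        side = cf_matrix_side(800_000, ratio)
        flops = dense_cholesky_flops(side)
        print(f"rank_ratio={ratio}: {side} x {side} matrix, ~{flops:.2e} flops")
```

Because the cost grows with the cube of the side length, raising rank_ratio by a factor of 10 makes this serial step roughly 1000 times more expensive, which would explain a poor speedup at large rank_ratio values.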
Thanks,
That seems to help the speedup drastically. One quick question: I noticed that
the resulting threshold/bias for my training data set seems to change with
different rank_ratio parameters. My naive impulse is to assume that this is bad.
Is this true?
Pat
Original comment by PatJNichols@gmail.com
on 23 Aug 2008 at 10:34
That is because, for the Interior Point Method, we have to use an approximation
to make the problem solvable. -rank_ratio controls this approximation. Generally,
the larger rank_ratio is, the better the result, but we have to trade off time
against accuracy.
Setting rank_ratio so that number_of_data * rank_ratio = 1000 is generally enough.
Original comment by baihong...@gmail.com
on 24 Aug 2008 at 7:31
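The rule of thumb in the comment above (number_of_data * rank_ratio = 1000) can be turned around to pick a starting rank_ratio for a given dataset. This is only a sketch: the helper name and the cap at 1.0 are assumptions for illustration, not something the tool provides.

```python
def suggest_rank_ratio(num_examples: int, target_rank: int = 1000) -> float:
    """Return a rank_ratio such that num_examples * rank_ratio is about target_rank.

    target_rank=1000 follows the rule of thumb quoted in the thread.
    The result is capped at 1.0 (an assumption) for datasets that have
    fewer examples than target_rank.
    """
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    return min(1.0, target_rank / num_examples)


if __name__ == "__main__":
    # Print a suggested starting value for a few dataset sizes.
    for n in (50_000, 800_000, 5_000_000):
        print(f"{n} examples -> rank_ratio of about {suggest_rank_ratio(n):.6f}")
```

The suggested value is a starting point; since the threshold/bias changes with the approximation, it may be worth checking accuracy at one or two nearby settings as well.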
Original issue reported on code.google.com by
PatJNichols@gmail.com
on 5 Aug 2008 at 5:44