ezmissu / sofia-ml

Automatically exported from code.google.com/p/sofia-ml

Issues with dimensionality off-by-one #10

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create this training file:

======= train.txt  =======
1 1:1 2:.1 3:.1 200:1
1 1:1.2 2:.01 3:.01 200:1
1 1:3 2:.2 3:.41 200:1
-1 3:4 200:1
-1 2:3 200:1
-1 1:.1 2:3 3:2 200:1
====================
2. ./sofia-ml-read-only/sofia-ml --learner_type pegasos --loop_type stochastic 
--lambda 0.1 --iterations 100000 --dimensionality 200 --training_file train.txt 
--model_out debug-model.txt                                                     

3. debug-model.txt has:
-5.01486 -0.169397 -10.0628 -10.0518 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0\
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

The model should spit out 201 terms, the first being the bias term. Instead 
it spits out 200 and clips off the last weight. When I set dimensionality to 
201, I get what I would expect:

0.263645 0.561799 -0.509116 -0.382012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 \
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 \
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0.263645  
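
As a rough illustration of the arithmetic above, here is a minimal Python sketch that scans a training file in the sparse "label index:value" format and reports the smallest dimensionality that leaves room for every feature index plus the bias slot at index 0. The function name `required_dimensionality` is made up for this example and is not part of sofia-ml; the "max index + 1" rule is inferred from the behaviour described in this report, not from the sofia-ml source.

```python
def required_dimensionality(lines):
    """Return (largest feature index + 1) over sparse 'label idx:val ...' lines.

    Hypothetical helper, not part of sofia-ml. Assumes index 0 is reserved
    for the bias term, as the model output in this report suggests.
    """
    max_index = 0
    for line in lines:
        tokens = line.split()
        # tokens[0] is the label; the remaining tokens are index:value pairs.
        for token in tokens[1:]:
            index = int(token.partition(":")[0])
            max_index = max(max_index, index)
    return max_index + 1


train = """\
1 1:1 2:.1 3:.1 200:1
1 1:1.2 2:.01 3:.01 200:1
1 1:3 2:.2 3:.41 200:1
-1 3:4 200:1
-1 2:3 200:1
-1 1:.1 2:3 3:2 200:1
"""

print(required_dimensionality(train.splitlines()))  # prints 201
```

For the training file in this report the largest index is 200, so the sketch suggests passing --dimensionality 201.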

This was compiled from source a couple of weeks ago. The program should probably 
crash if you say dimensionality is 200 and there is a "200:x" term in the 
sparse vector representation, unless the no-bias flag is set.
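
The fail-fast behaviour suggested here could look roughly like the following Python sketch, run as a pre-check on the training file before calling sofia-ml. It is only an illustration: `check_feature_indices` is a hypothetical name, the rule "index must be smaller than dimensionality" is an assumption based on the output shown above, and the no-bias case is deliberately not handled.

```python
def check_feature_indices(lines, dimensionality):
    """Raise ValueError if any feature index does not fit in the weight vector.

    Hypothetical pre-check, not sofia-ml code. Assumes a weight vector of
    length `dimensionality` with valid indices 0 .. dimensionality - 1,
    where index 0 holds the bias term.
    """
    for line_no, line in enumerate(lines, start=1):
        # Skip the label (first token); the rest are index:value pairs.
        for token in line.split()[1:]:
            index = int(token.partition(":")[0])
            if index >= dimensionality:
                raise ValueError(
                    f"line {line_no}: feature index {index} needs "
                    f"dimensionality >= {index + 1}, got {dimensionality}"
                )


sample = ["1 1:1 2:.1 3:.1 200:1", "-1 3:4 200:1"]

try:
    check_feature_indices(sample, 200)
except ValueError as err:
    print("rejected:", err)

# With room for index 200 the same data passes silently.
check_feature_indices(sample, 201)
```

Raising on the first offending line keeps the error message specific, which is the main point of the suggestion: a clipped weight is much harder to notice than a crash.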

Original issue reported on code.google.com by justi...@gmail.com on 26 Feb 2013 at 3:24

GoogleCodeExporter commented 9 years ago
When you set dimensionality to 200, that count also includes the label, so sofia 
expects 1 label and 199 features. So in your case dimensionality should indeed 
be 201. I agree it's not very convenient and must be confusing at first sight.

Original comment by zhani...@myglam.com on 7 May 2013 at 6:47