ai-se / HDP

Small data set #9

Open WeiFoo opened 9 years ago

WeiFoo commented 9 years ago

During the KS test phase, consider the following experimental condition:

WeiFoo commented 9 years ago

Exp1

Note:

Target       | HDP-JC | HDP-Scipy | N-50  | N-100 | N-150 | N-200
ar3          | 0.823  | 0.842     | 0.812 | 0.83  | 0.827 | 0.836
skarbonka    | 0.694  | 0.669     | 0.5   | 0.698 | 0.682 | 0.669
poi-1.5      | 0.701  | 0.712     | 0.577 | 0.657 | 0.645 | 0.686
arc          | 0.701  | 0.67      | 0.609 | 0.671 | 0.669 | 0.671
velocity-1.4 | 0.391  | 0.432     | 0.45  | 0.451 | 0.476 | 0.451
JDT          | 0.767  | 0.702     | 0.559 | 0.639 | 0.671 | 0.688
LC           | 0.655  | 0.647     | 0.56  | 0.65  | 0.54  | 0.653
zxing        | 0.65   | 0.624     | 0.532 | 0.635 | 0.615 | 0.621
xalan-2.4    | 0.751  | 0.59      | 0.561 | 0.708 | 0.66  | 0.666
redaktor     | 0.537  | 0.508     | 0.502 | 0.5   | 0.524 | 0.524
cm1          | 0.717  | 0.633     | 0.603 | 0.633 | 0.631 | 0.633
ar1          | 0.734  | 0.704     | 0.512 | 0.621 | 0.584 | 0.702
camel-1.0    | 0.639  | 0.655     | 0.627 | 0.653 | 0.566 | 0.655
ar4          | 0.816  | 0.798     | 0.619 | 0.53  | 0.797 | 0.766
ar5          | 0.911  | 0.897     | 0.451 | 0.656 | 0.683 | 0.84 
ar6          | 0.64   | 0.626     | 0.523 | 0.528 | 0.558 | 0.581
safe         | 0.818  | 0.772     | 0.538 | 0.514 | 0.719 | 0.604
apache       | 0.717  | 0.71      | 0.593 | 0.698 | 0.709 | 0.691
PDE          | 0.717  | 0.665     | 0.619 | 0.615 | 0.634 | 0.636
EQ           | 0.783  | 0.716     | 0.62  | 0.767 | 0.669 | 0.694
mw1          | 0.727  | 0.71      | 0.5   | 0.553 | 0.587 | 0.704
tomcat       | 0.818  | 0.716     | 0.519 | 0.758 | 0.658 | 0.763
ML           | 0.692  | 0.635     | 0.515 | 0.636 | 0.657 | 0.615
pc1          | 0.752  | 0.671     | 0.669 | 0.683 | 0.671 | 0.671
pc4          | 0.682  | 0.694     | 0.636 | 0.617 | 0.696 | 0.696
ant-1.3      | 0.835  | 0.726     | 0.702 | 0.736 | 0.714 | 0.729
pc3          | 0.738  | 0.496     | 0.643 | 0.524 | 0.572 | 0.635
xerces-1.2   | 0.489  | 0.489     | 0.51  | 0.485 | 0.485 | 0.49 

WeiFoo commented 9 years ago

Figure: difference between HDP-JC and HDP-Scipy (image not preserved)

Figure: difference between N-50 and HDP-Scipy (image not preserved)

WeiFoo commented 9 years ago

Figure: difference between N-100 and HDP-Scipy (image not preserved)

WeiFoo commented 9 years ago

Figure: difference between N-150 and HDP-Scipy (image not preserved)

WeiFoo commented 9 years ago

Figure: difference between N-200 and HDP-Scipy (image not preserved)
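
The per-target differences behind these figures can be recomputed from the Exp1 table. The snippet below is only a sketch: it copies the first three rows of that table, and it assumes the difference is taken as the N-k column minus HDP-Scipy (the thread does not state the sign convention). The helper name `diff_from_baseline` is made up for illustration.

```python
# Values copied from the first three rows of the Exp1 table.
exp1 = {
    "ar3":       {"HDP-Scipy": 0.842, "N-50": 0.812, "N-100": 0.83,  "N-150": 0.827, "N-200": 0.836},
    "skarbonka": {"HDP-Scipy": 0.669, "N-50": 0.5,   "N-100": 0.698, "N-150": 0.682, "N-200": 0.669},
    "poi-1.5":   {"HDP-Scipy": 0.712, "N-50": 0.577, "N-100": 0.657, "N-150": 0.645, "N-200": 0.686},
}

def diff_from_baseline(table, column, baseline="HDP-Scipy"):
    """Return {target: column - baseline}, rounded to 3 decimals.

    Assumption: a negative value means the sampled run scored below HDP-Scipy.
    """
    return {target: round(row[column] - row[baseline], 3)
            for target, row in table.items()}

for column in ("N-50", "N-100", "N-150", "N-200"):
    print(column, diff_from_baseline(exp1, column))
```

Adding the remaining rows of the table to `exp1` gives the full set of differences for each sample size.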

timm commented 9 years ago

So, 200 is definitely enough and 150 is arguably enough.

Regarding the second point, I previously did this using LOC measures. You could do the same.

100 modules may take as little as two to four person months to construct. This estimate was generated as follows:

timm commented 9 years ago

I think we might get away with smaller targets if the source is larger.

So can we see the source = large, target = small results next?

WeiFoo commented 9 years ago

Exp2: source = large, target = small in KS

Note:

Target       | HDP-JC | HDP-Scipy | N-50  | N-100 | N-150 | N-200
ar3          | 0.823  | 0.842     | 0.839 | 0.842 | 0.842 | 0.842
skarbonka    | 0.694  | 0.669     | 0.669 | 0.669 | 0.669 | 0.669
poi-1.5      | 0.701  | 0.712     | 0.677 | 0.651 | 0.704 | 0.712
arc          | 0.701  | 0.67      | 0.677 | 0.673 | 0.671 | 0.67 
velocity-1.4 | 0.391  | 0.432     | 0.451 | 0.451 | 0.451 | 0.432
JDT          | 0.767  | 0.702     | 0.691 | 0.669 | 0.671 | 0.653
LC           | 0.655  | 0.647     | 0.652 | 0.663 | 0.658 | 0.657
zxing        | 0.65   | 0.624     | 0.625 | 0.625 | 0.624 | 0.624
xalan-2.4    | 0.751  | 0.59      | 0.704 | 0.69  | 0.686 | 0.682
redaktor     | 0.537  | 0.508     | 0.519 | 0.514 | 0.518 | 0.508
cm1          | 0.717  | 0.633     | 0.633 | 0.633 | 0.633 | 0.633
ar1          | 0.734  | 0.704     | 0.696 | 0.704 | 0.704 | 0.704
camel-1.0    | 0.639  | 0.655     | 0.655 | 0.663 | 0.655 | 0.663
ar4          | 0.816  | 0.798     | 0.794 | 0.798 | 0.798 | 0.798
ar5          | 0.911  | 0.897     | 0.897 | 0.897 | 0.897 | 0.897
ar6          | 0.64   | 0.626     | 0.613 | 0.613 | 0.626 | 0.626
safe         | 0.818  | 0.772     | 0.754 | 0.772 | 0.772 | 0.772
apache       | 0.717  | 0.71      | 0.692 | 0.715 | 0.705 | 0.71 
PDE          | 0.717  | 0.665     | 0.669 | 0.663 | 0.673 | 0.671
EQ           | 0.783  | 0.716     | 0.752 | 0.752 | 0.737 | 0.737
mw1          | 0.727  | 0.71      | 0.642 | 0.673 | 0.704 | 0.708
tomcat       | 0.818  | 0.716     | 0.737 | 0.748 | 0.763 | 0.754
ML           | 0.692  | 0.635     | 0.641 | 0.643 | 0.629 | 0.646
pc1          | 0.752  | 0.671     | 0.678 | 0.671 | 0.671 | 0.671
pc4          | 0.682  | 0.694     | 0.683 | 0.696 | 0.698 | 0.694
ant-1.3      | 0.835  | 0.726     | 0.732 | 0.723 | 0.726 | 0.726
pc3          | 0.738  | 0.496     | 0.658 | 0.642 | 0.639 | 0.635
xerces-1.2   | 0.489  | 0.489     | 0.482 | 0.485 | 0.488 | 0.489

Figure: difference between N-50 and HDP-Scipy (image not preserved)

Figure: difference between N-100 and HDP-Scipy (image not preserved)

Figure: difference between N-150 and HDP-Scipy (image not preserved)

Figure: difference between N-200 and HDP-Scipy (image not preserved)
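
For readers who want to try the Exp2 setup (source = large, target = small in KS), here is a minimal sketch of a two-sample KS comparison between a full source metric column and N sampled target values. It is not the code used in this repo. It assumes "large" means all source instances are used, the function names are hypothetical, and the metric columns are synthetic.

```python
import random
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:  # the largest gap always occurs at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def ks_with_small_target(source_col, target_col, n, rng):
    """Compare the full source column against n sampled target values."""
    n = min(n, len(target_col))  # small data sets may have fewer than n rows
    return ks_statistic(source_col, rng.sample(target_col, n))

# Synthetic metric columns, for illustration only (not data from this thread).
rng = random.Random(0)
source_col = [rng.gauss(0, 1) for _ in range(1000)]
target_col = [rng.gauss(0, 1) for _ in range(400)]
for n in (50, 100, 150, 200):
    print(n, round(ks_with_small_target(source_col, target_col, n, rng), 3))
```

`scipy.stats.ks_2samp` returns the same statistic together with a p-value, so it can replace `ks_statistic` when SciPy is available.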

timm commented 9 years ago

Summary: 50 examples from the target are enough to learn synonyms.

WeiFoo commented 9 years ago

JULY 17 Updates

Exp2B: source = large, target = small in KS

Why is IQR = 0 for all data sets with HDP-Scipy?

Target       | WPDP  | HDP-JC | HDP-Scipy | HDP-Scipy-IQR | N-50  | N-50-IQR | N-100 | N-100-IQR | N-150 | N-150-IQR | N-200 | N-200-IQR
ar3          | 0.574 | 0.823  | 0.842     | 0.0           | 0.837 | 0.007    | 0.842 | 0.0       | 0.842 | 0.0       | 0.842 | 0.0      
skarbonka    | 0.569 | 0.694  | 0.669     | 0.0           | 0.669 | 0.0      | 0.669 | 0.0       | 0.669 | 0.0       | 0.669 | 0.0      
poi-1.5      | 0.707 | 0.701  | 0.712     | 0.0           | 0.658 | 0.059    | 0.66  | 0.035     | 0.686 | 0.03      | 0.686 | 0.051    
arc          | 0.67  | 0.701  | 0.67      | 0.0           | 0.677 | 0.01     | 0.671 | 0.019     | 0.671 | 0.008     | 0.671 | 0.0      
velocity-1.4 | 0.725 | 0.391  | 0.432     | 0.0           | 0.451 | 0.0      | 0.451 | 0.027     | 0.451 | 0.009     | 0.432 | 0.0      
JDT          | 0.795 | 0.767  | 0.702     | 0.0           | 0.699 | 0.039    | 0.684 | 0.037     | 0.674 | 0.025     | 0.671 | 0.013    
LC           | 0.575 | 0.655  | 0.647     | 0.0           | 0.652 | 0.007    | 0.657 | 0.01      | 0.657 | 0.004     | 0.654 | 0.008    
zxing        | 0.605 | 0.65   | 0.624     | 0.0           | 0.624 | 0.005    | 0.625 | 0.007     | 0.627 | 0.011     | 0.628 | 0.01     
xalan-2.4    | 0.755 | 0.751  | 0.59      | 0.0           | 0.693 | 0.015    | 0.679 | 0.017     | 0.682 | 0.021     | 0.67  | 0.041    
redaktor     | 0.744 | 0.537  | 0.508     | 0.0           | 0.519 | 0.012    | 0.514 | 0.011     | 0.516 | 0.005     | 0.508 | 0.0      
cm1          | 0.653 | 0.717  | 0.633     | 0.0           | 0.633 | 0.001    | 0.633 | 0.002     | 0.633 | 0.0       | 0.633 | 0.0      
ar1          | 0.582 | 0.734  | 0.704     | 0.0           | 0.706 | 0.032    | 0.708 | 0.02      | 0.704 | 0.0       | 0.704 | 0.0      
camel-1.0    | 0.55  | 0.639  | 0.655     | 0.0           | 0.655 | 0.012    | 0.657 | 0.016     | 0.655 | 0.016     | 0.655 | 0.0      
ar4          | 0.657 | 0.816  | 0.798     | 0.0           | 0.792 | 0.012    | 0.798 | 0.001     | 0.798 | 0.0       | 0.798 | 0.0      
ar5          | 0.804 | 0.911  | 0.897     | 0.0           | 0.897 | 0.0      | 0.897 | 0.0       | 0.897 | 0.0       | 0.897 | 0.0      
ar6          | 0.654 | 0.64   | 0.626     | 0.0           | 0.598 | 0.034    | 0.613 | 0.006     | 0.626 | 0.0       | 0.626 | 0.0      
safe         | 0.706 | 0.818  | 0.772     | 0.0           | 0.692 | 0.074    | 0.772 | 0.0       | 0.772 | 0.0       | 0.772 | 0.0      
apache       | 0.714 | 0.717  | 0.71      | 0.0           | 0.694 | 0.053    | 0.71  | 0.021     | 0.71  | 0.031     | 0.71  | 0.0      
PDE          | 0.684 | 0.717  | 0.665     | 0.0           | 0.667 | 0.01     | 0.666 | 0.01      | 0.671 | 0.004     | 0.671 | 0.008    
EQ           | 0.583 | 0.783  | 0.716     | 0.0           | 0.762 | 0.011    | 0.758 | 0.014     | 0.75  | 0.016     | 0.737 | 0.01     
mw1          | 0.612 | 0.727  | 0.71      | 0.0           | 0.681 | 0.053    | 0.687 | 0.039     | 0.704 | 0.03      | 0.708 | 0.002    
tomcat       | 0.778 | 0.818  | 0.716     | 0.0           | 0.737 | 0.037    | 0.746 | 0.016     | 0.763 | 0.018     | 0.756 | 0.017    
ML           | 0.734 | 0.692  | 0.635     | 0.0           | 0.64  | 0.026    | 0.639 | 0.025     | 0.637 | 0.026     | 0.641 | 0.022    
pc1          | 0.787 | 0.752  | 0.671     | 0.0           | 0.678 | 0.014    | 0.671 | 0.006     | 0.671 | 0.006     | 0.671 | 0.0      
pc4          | 0.9   | 0.682  | 0.694     | 0.0           | 0.69  | 0.013    | 0.696 | 0.004     | 0.696 | 0.003     | 0.695 | 0.004    
ant-1.3      | 0.609 | 0.835  | 0.726     | 0.0           | 0.732 | 0.018    | 0.729 | 0.015     | 0.726 | 0.0       | 0.726 | 0.0      
pc3          | 0.794 | 0.738  | 0.496     | 0.0           | 0.655 | 0.011    | 0.641 | 0.002     | 0.641 | 0.005     | 0.641 | 0.009    
xerces-1.2   | 0.624 | 0.489  | 0.489     | 0.0           | 0.482 | 0.004    | 0.485 | 0.007     | 0.489 | 0.002     | 0.489 | 0.002
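
The median and IQR columns above can be computed from repeated runs roughly as sketched below. This is an illustration, not the repo's code: the helper `median_iqr` and both lists of run scores are made up. One possible explanation for the zero IQR, which is a guess and not confirmed in this thread, is that HDP-Scipy uses every instance with no random sampling, so each repeat returns the same score.

```python
import statistics

def median_iqr(scores):
    """Return (median, IQR) of repeated-run scores, rounded to 3 decimals."""
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return round(q2, 3), round(q3 - q1, 3)

# Hypothetical repeats: a run with no randomness gives the same score each time.
no_sampling_runs = [0.842] * 20
# Hypothetical repeats: sampling N target instances changes the score per run.
sampled_runs = [0.83, 0.84, 0.842, 0.842, 0.835]

print("no sampling:", median_iqr(no_sampling_runs))
print("sampled:    ", median_iqr(sampled_runs))
```

If the HDP-Scipy runs do involve randomness, a zero IQR on every data set would instead suggest the random seed is fixed or the repeats are not being re-run.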