Open ekg opened 8 years ago
So with a fairly simple quadratic model we have error rates under 2% on the open dataset with a window size of 16. Initial ngram experiments failed to learn, but the parameter space is reaching its limit, so it is worth rerunning those experiments with the full data.
freebayes@ip-10-0-0-6:~$ time vw -d ~/NA12878_V2.5_Robot_1.open_w16.hhga.gz --binary --passes 20 -q ha -c -b 26 -f ~/nongram_robot1.model --compressed
on testing: freebayes@ip-10-0-0-6:~$ vw -t -d ~/XPrize_Illumina_WG.open_w16.hhga.gz --binary -i ~/nongram_robot1.model
loss of 0.013290
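For reference, a small sketch of the arithmetic behind "parameter space" and "error rate" here. In VW, `-b` sets the number of bits in the feature hash table, so the model has 2^b weight slots that all features (including the `-q ha` pairs) hash into, and with `--binary` the reported average loss is a 0/1 error. The helper names `vw_weight_slots` and `loss_to_percent` are made up for this illustration.

```shell
# Number of weight slots VW allocates for a given -b (hash bits) value.
vw_weight_slots() {
    echo $((1 << $1))
}

# Convert a 0/1 average loss (as reported with --binary) to a percentage.
loss_to_percent() {
    awk -v loss="$1" 'BEGIN { printf "%.2f\n", loss * 100 }'
}

vw_weight_slots 18          # 262144
vw_weight_slots 26          # 67108864
loss_to_percent 0.013290    # 1.33
```

So the `-b 26` run above has about 67 million slots available, and its test loss corresponds to roughly 1.33% of examples misclassified.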
freebayes@ip-10-0-0-6:~$ time vw -d ~/NA12878_V2.5_Robot_1.open_w16.hhga.gz --binary --passes 20 -q ha --ngram a5 -c -b 26 -f ~/ngram5_robot1.model --compressed
does well,
Once we understood the theory, twopass quadratic generalizes [[edit this was wrong, info was leaking from future]]
freebayes@ip-10-0-0-6:~$ time vw -d ~/NA12878_V2.5_Robot_1.open_w16.hhga.gz --binary --passes 2 -c -q ah -b 18 -f ~/twopass_robot1.model --compressed
[[edit this was wrong, info was leaking from future]]
finished run
number of examples per pass = 3691989
passes used = 2
weighted example sum = 7383978.000000
weighted label sum = 3907430.000000
average loss = 0.006821 h
best constant = 0.529177
best constant's loss = 0.719972
total feature number = 235890300812

real 32m14.351s
user 49m8.946s
sys 0m18.625s
freebayes@ip-10-0-0-6:~$ time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -c -b 20 -f ~/twostep.model

finished run
number of examples per pass = 482196
passes used = 1
weighted example sum = 482196.000000
weighted label sum = -230530.000000
average loss = 0.015220
best constant = -0.478084
best constant's loss = 0.771436
total feature number = 14165571258
freebayes@ip-10-0-0-6:~$ vw -d ~/XPrize_Illumina_WG.open_w16.hhga.gz --binary -t -i ~/twostep.model
vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --ngram h5 --nn 100 -c -b 25 -f ~/ngram5hnn100_robot1.model --compressed
vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --nn 1000 -c -b 25 -f ~/nn1000_robot1.model --compressed
vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha --ngram h2 -c -b 18 -f ~/ngramh2b18passes1_robot1.model --compressed
vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 -f ~/b18passes1_robot1.model --compressed
Trained models on full robot1 from Garvan.
Tested against the second Garvan set (robot2).
0.002364 twostep25.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 25 --ngram h3 -f ~/twostep25.model
0.002695 twostep25redo.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --cache_file ~/NA12878_V2.5_Robot_1.1.hhga.gz.train.cache -b 25 --ngram h3 -f ~/twostep25redo.model
0.003785 b25qhap20.model name=b25qhap20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 20 --cache_file $name.cache --binary -b 25 -q ha -f ~/$name 2>&1 | tee ~/$name.log
0.004464 ngramh3a3b18interactass.model name=ngramh3a3b18interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 18 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.004464 ngramh3b18interactass.model name=ngramh3b18interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.005045 ngramh3b18qha.model name=ngramh3b18qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.005074 ngramh3b18passes1_robot1.model vw --ngram 3h -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3b18passes1_robot1.model -c
0.005121 twostep.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -c -b 20 -f ~/twostep.model
0.005248 ngramh3a3b16interactass.model name=ngramh3a3b16interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.005248 ngramh3a3b16qhaqas.model name=ngramh3a3b16qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.005402 ngramh3a3b18passes1_robot1.model vw --ngram h3 --ngram a3 -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3a3b18passes1_robot1.model -c
0.005629 ngramh3b25qhaqas.model name=ngramh3b25qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 25 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.005818 ngramh3a3b20qhaqas.model name=ngramh3a3b20qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 20 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.005818 ngramh3b20qhaqas.model name=ngramh3b20qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 20 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.005840 ngramh3a3b16qhainteracths.model name=ngramh3a3b16qhainteracths.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions hs -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.005958 b25passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 25 -f ~/b25passes1_robot1.model --compressed
0.006112 ngramh3a3b14interactass.model name=ngramh3a3b14interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 14 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.006225 ngramh3a3b16qha.model name=ngramh3a3b16qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.006483 b22passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 22 -f ~/b22passes1_robot1.model --compressed
0.006887 b20passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 20 -f ~/b20passes1_robot1.model --compressed
0.007165 ngramh3a3b16interactahs.model name=ngramh3a3b16interactahs.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 --interactions ahs -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.007219 ngramh3b15qha.model name=ngramh3b15qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 15 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.007825 b25qhap2.model name=b25qhap20model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 20 --cache_file $name.cache --binary -b 25 -q ha -f ~/$name 2>&1 | tee ~/$name.log
0.008095 b18passes1_robot1.model vw --ngram 3h -q ha -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3qb18passes1_robot1.model -c
0.008095 b18qhap1.model name=ngramh2s1b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 18 --ngram h2 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.008095 h3b18qhap1.model name=h3b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 -f ~/$name 2>&1 | tee ~/$name.log
0.008140 ngramh3b25qhap2randstart.model name=ngramh3b25qhap2randstart.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 2 --cache_file $name.cache --binary -b 25 -q ha --ngram h3 --random_weights 0 --loss_function hinge -f ~/$name 2>&1 | tee ~/$name.log
0.008140 q_ha__passes_2__b_25_randstart.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -c -b 25 --ngram h3 -f ~/q_ha__passes_2__b_25_randstart.model --random_weights 0 --loss_function hinge
0.008176 ngramh3a3b28q.model name=ngramh3a3b28q.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.008176 ngramh3b28pass20.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h3 -f ~/ngramh3b28pass20.model 2>&1 | tee ~/ngramh3b28pass20.model.log
0.008330 ngramh3b25qha1pass.model name=ngramh3b25qha1pass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha --ngram h3 -c -b 25 --compressed -f ~/$name 2>&1 | tee ~/$name.log
0.008561 ngramh3b14qha.model name=ngramh3b14qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 14 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.008637 ngrama3b18qhap1.model name=ngrama3b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 --ngram a3 -f ~/$name 2>&1 | tee ~/$name.log
0.008816 ngramh3s1b28pass20.model name=ngramh3s1b28pass20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h3 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.009089 ngramh3b25l0.05pass20.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 25 --ngram h3 -l 0.05 -f ~/ngramh3b25l0.05pass20.model 2>&1 | tee ~/ngramh3b25l0.05pass20.model.log
0.009274 ngramh2b18passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha --ngram h2 -c -b 18 -f ~/ngramh2b18passes1_robot1.model --compressed
0.009459 ngramh5b28pass20.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h5 -f ~/ngramh5b28pass20.model 2>&1 | tee ~/ngramh5b28pass20.model.log
0.009462 ngramh3s1b25pass20.model name=ngramh3s1b25pass20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 25 --ngram h3 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.009608 ngramh3b120qhap1.model name=ngramh3b120qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -q ha -b 20 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.009760 ngramh5b20qha.model name=ngramh5b20qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 20 --ngram h5 -f ~/$name 2>&1 | tee ~/$name.log
0.009771 ngramh3qhab18passes1_robot1.model vw --ngram h3 -q ah -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3qhab18passes1_robot1.model -c
0.009802 ngramh3b25qhap20.model name=ngramh3b25qhap20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -c -q ha --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.010099 ngramh2s1b18qhap1.model name=ngramh2s1b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 18 --ngram h2 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.010178 ngramh3b25p1logistic.model name=ngramh3b25p1logistic.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 25 --ngram h3 --loss_function logistic -f ~/$name 2>&1 | tee ~/$name.log
0.010240 ngramh3b18qhap1.1.model name=ngramh3b18qhap1.1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.010240 ngramh3b18qhap1.model name=ngramh3b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.010240 ngramh3qb18passes1_robot1.model vw --ngram 3h -q ha -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3qb18passes1_robot1.model -c
0.010509 ngramh5b18qha.model name=ngramh5b18qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 18 --ngram h5 -f ~/$name 2>&1 | tee ~/$name.log
0.010961 ngramh3b16qhap1.model name=ngramh3b16qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 16 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.011531 ngramh3b15qhap1.model name=ngramh3b15qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 15 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.012500 ngramh3b18p1qhalogisticnolink.model name=ngramh3b18p1qhalogisticnolink.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -q ha -b 18 --ngram h3 --loss_function logistic -f ~/$name 2>&1 | tee ~/$name.log
0.013266 ngramh3a3b14interacthass.model name=ngramh3a3b14interacthass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions has -c -b 14 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.054843 ngrams2h3a3b18passes1_robot1.model vw --ngram h3 --ngram a3 --skips 2 -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngrams2h3a3b18passes1_robot1.model -c --random_weights 1
0.058047 ngrams1h3a3b18passes1_robot1.model vw --ngram h3 --ngram a3 --skips 1 -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngrams1h3a3b18passes1_robot1.model -c --random_weights 1
0.255162 b25qhap20logistic.model name=b25qhap20logistic.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 20 --cache_file $name.cache --binary -b 25 -q ha --loss_function logistic --link logistic -f ~/$name 2>&1 | tee ~/$name.log
0.255162 ngramh3b18p1qhalogistic.model name=ngramh3b18p1qhalogistic.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -q ha -b 18 --ngram h3 --loss_function logistic --link logistic -f ~/$name 2>&1 | tee ~/$name.log
0.744838 ngrama3b18qhap2.model name=ngrama3b18qhap2.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -b 18 --ngram a3 -f ~/$name 2>&1 | tee ~/$name.log
0.744838 ngramh3b18lrqdrophap1.model name=ngramh3b18lrqdrophap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --lrqdropout -q ha -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.744838 ngramh3b18qhap1stagepolye.model name=ngramh3b18qhap1stagepolye.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --lrqdropout -q ha -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.744838 ngramh3b24p20qhann8inpassstagepoly.model name=ngramh3b24p20qhann8inpassstagepoly.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --stage_poly -b 24 --ngram h3 --nn 8 --inpass -c --compressed --passes 20 -f ~/$name 2>&1 | tee ~/$name.log
0.744838 ngramh3b24p20qhann8.model name=ngramh3b24p20qhann8.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -b 24 --ngram h3 -q ha --nn 8 -c --passes 20 --cache_file ~/$name.cache -f ~/$name 2>&1 | tee ~/$name.log
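A ranking like the one above can be rebuilt from saved test output. This is only a sketch: it assumes the stderr of each `vw -t ... -i <model>` run was saved to a file named `<model>.test.log` (a naming convention invented here, not something used above), and the function names are hypothetical. It reads the `average loss = ...` line of VW's end-of-run summary and drops the trailing `h` that appears in some summaries (as far as I know, VW's marker for a holdout-set loss).

```shell
# Print the "average loss" value from a VW end-of-run summary read on stdin.
# A trailing " h" after the number is dropped.
vw_average_loss() {
    awk -F' = ' '/^average loss/ { split($2, parts, " "); print parts[1] }'
}

# Print "loss model" for each <model>.test.log given, lowest loss first.
rank_models() {
    for log in "$@"; do
        printf '%s %s\n' "$(vw_average_loss < "$log")" "$(basename "$log" .test.log)"
    done | LC_ALL=C sort -n
}

# Example on a fragment of the summary format shown earlier in this log:
printf 'passes used = 2\naverage loss = 0.006821 h\n' | vw_average_loss
```

With the logs in one directory, `rank_models *.test.log` would print one `loss model` line per log, best model first.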
And for the XPrize test set:
0.014448 ngramh3b18passes1_robot1.model vw --ngram 3h -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3b18passes1_robot1.model -c
0.014575 b25qhap2.model name=b25qhap20model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 20 --cache_file $name.cache --binary -b 25 -q ha -f ~/$name 2>&1 | tee ~/$name.log
0.014712 b18passes1_robot1.model vw --ngram 3h -q ha -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3qb18passes1_robot1.model -c
0.014712 b18qhap1.model name=ngramh2s1b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 18 --ngram h2 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.014712 h3b18qhap1.model name=h3b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 -f ~/$name 2>&1 | tee ~/$name.log
0.014731 b20passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 20 -f ~/b20passes1_robot1.model --compressed
0.014932 ngrama3b18qhap1.model name=ngrama3b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 --ngram a3 -f ~/$name 2>&1 | tee ~/$name.log
0.015077 ngramh3b15qha.model name=ngramh3b15qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 15 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.015220 twostep.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -c -b 20 -f ~/twostep.model
0.015533 ngramh2b18passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha --ngram h2 -c -b 18 -f ~/ngramh2b18passes1_robot1.model --compressed
0.015541 ngramh3qb18passes1_robot1.model vw --ngram 3h -q ha -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3qb18passes1_robot1.model -c
0.015554 ngramh3a3b16qha.model name=ngramh3a3b16qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.015639 ngramh3b23qha1pass.model name=ngramh3b23qha1pass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha --ngram h3 -c -b 23 --compressed -f ~/$name 2>&1 | tee ~/$name.log
0.015687 ngramh3b25qhap20.model name=ngramh3b25qhap20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -c -q ha --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.015761 ngramh3b18qhap1.1.model name=ngramh3b18qhap1.1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.015761 ngramh3b18qhap1.model name=ngramh3b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.015946 b22passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 22 -f ~/b22passes1_robot1.model --compressed
0.015950 ngramh3b18qha.model name=ngramh3b18qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.016132 ngramh2s1b18qhap1.model name=ngramh2s1b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 18 --ngram h2 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.016222 ngramh3qhab18passes1_robot1.model vw --ngram h3 -q ah -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3qhab18passes1_robot1.model -c
0.016410 ngramh3b120qhap1.model name=ngramh3b120qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -q ha -b 20 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.016819 ngramh3b14qha.model name=ngramh3b14qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 14 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.017327 ngramh3b16qhap1.model name=ngramh3b16qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 16 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.017341 ngramh5b18qha.model name=ngramh5b18qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 18 --ngram h5 -f ~/$name 2>&1 | tee ~/$name.log
0.017590 ngramh5b20qha.model name=ngramh5b20qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 20 --ngram h5 -f ~/$name 2>&1 | tee ~/$name.log
0.017837 twostep25redo.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --cache_file ~/NA12878_V2.5_Robot_1.1.hhga.gz.train.cache -b 25 --ngram h3 -f ~/twostep25redo.model
0.018426 b25passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 25 -f ~/b25passes1_robot1.model --compressed
0.018602 twostep25.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 25 --ngram h3 -f ~/twostep25.model
0.019102 ngramh3b25p1logistic.model name=ngramh3b25p1logistic.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 25 --ngram h3 --loss_function logistic -f ~/$name 2>&1 | tee ~/$name.log
0.019314 ngramh3b15qhap1.model name=ngramh3b15qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 15 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.019830 ngramh3b18p1qhalogisticnolink.model name=ngramh3b18p1qhalogisticnolink.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -q ha -b 18 --ngram h3 --loss_function logistic -f ~/$name 2>&1 | tee ~/$name.log
0.020048 ngramh3b25qha1pass.model name=ngramh3b25qha1pass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha --ngram h3 -c -b 25 --compressed -f ~/$name 2>&1 | tee ~/$name.log
0.020212 b25qhap20.model name=b25qhap20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 20 --cache_file $name.cache --binary -b 25 -q ha -f ~/$name 2>&1 | tee ~/$name.log
0.020336 ngramh3b25l0.05pass20.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 25 --ngram h3 -l 0.05 -f ~/ngramh3b25l0.05pass20.model 2>&1 | tee ~/ngramh3b25l0.05pass20.model.log
0.020813 ngramh3s1b25pass20.model name=ngramh3s1b25pass20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 25 --ngram h3 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.021296 ngramh3b25qhap2randstart.model name=ngramh3b25qhap2randstart.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 2 --cache_file $name.cache --binary -b 25 -q ha --ngram h3 --random_weights 0 --loss_function hinge -f ~/$name 2>&1 | tee ~/$name.log
0.021296 q_ha__passes_2__b_25_randstart.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -c -b 25 --ngram h3 -f ~/q_ha__passes_2__b_25_randstart.model --random_weights 0 --loss_function hinge
0.023254 ngramh3a3b28q.model name=ngramh3a3b28q.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.023254 ngramh3b28pass20.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h3 -f ~/ngramh3b28pass20.model 2>&1 | tee ~/ngramh3b28pass20.model.log
0.024283 ngramh3a3b18passes1_robot1.model vw --ngram h3 --ngram a3 -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3a3b18passes1_robot1.model -c
0.025396 ngramh3a3b14interacthass.model name=ngramh3a3b14interacthass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions has -c -b 14 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.026408 ngramh3s1b28pass20.model name=ngramh3s1b28pass20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h3 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.027062 ngramh5b28pass20.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h5 -f ~/ngramh5b28pass20.model 2>&1 | tee ~/ngramh5b28pass20.model.log
0.030736 ngramh3a3b16qhainteracths.model name=ngramh3a3b16qhainteracths.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions hs -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.093522 ngramh3b25qhaqas.model name=ngramh3b25qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 25 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.186874 ngramh3a3b14interactass.model name=ngramh3a3b14interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 14 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.260958 ngrama3b18qhap2.model name=ngrama3b18qhap2.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -b 18 --ngram a3 -f ~/$name 2>&1 | tee ~/$name.log
0.260958 ngramh3b18lrqdrophap1.model name=ngramh3b18lrqdrophap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --lrqdropout -q ha -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.260958 ngramh3b18qhap1stagepolye.model name=ngramh3b18qhap1stagepolye.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --lrqdropout -q ha -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.260958 ngramh3b24p20qhann8inpassstagepoly.model name=ngramh3b24p20qhann8inpassstagepoly.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --stage_poly -b 24 --ngram h3 --nn 8 --inpass -c --compressed --passes 20 -f ~/$name 2>&1 | tee ~/$name.log
0.260958 ngramh3b24p20qhann8.model name=ngramh3b24p20qhann8.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -b 24 --ngram h3 -q ha --nn 8 -c --passes 20 --cache_file ~/$name.cache -f ~/$name 2>&1 | tee ~/$name.log
0.260958 ngramsh3a3b20p1stagepoly.model name=ngramsh3a3b20p1stagepoly.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --stage_poly -b 20 --ngrams h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.287441 ngramh3a3b18interactass.model name=ngramh3a3b18interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 18 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.287441 ngramh3b18interactass.model name=ngramh3b18interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.290500 ngrams2h3a3b18passes1_robot1.model vw --ngram h3 --ngram a3 --skips 2 -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngrams2h3a3b18passes1_robot1.model -c --random_weights 1
0.290587 ngrams1h3a3b18passes1_robot1.model vw --ngram h3 --ngram a3 --skips 1 -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngrams1h3a3b18passes1_robot1.model -c --random_weights 1
0.304333 ngramh3a3b16interactass.model name=ngramh3a3b16interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.304333 ngramh3a3b16qhaqas.model name=ngramh3a3b16qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.328933 ngramh3a3b20qhaqas.model name=ngramh3a3b20qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 20 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.328933 ngramh3b20qhaqas.model name=ngramh3b20qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 20 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.739042 b25qhap20logistic.model name=b25qhap20logistic.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 20 --cache_file $name.cache --binary -b 25 -q ha --loss_function logistic --link logistic -f ~/$name 2>&1 | tee ~/$name.log
0.739042 ngramh3b18p1qhalogistic.model name=ngramh3b18p1qhalogistic.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -q ha -b 18 --ngram h3 --loss_function logistic --link logistic -f ~/$name 2>&1 | tee ~/$name.log
The main blocking point right now is not knowing how close our current robot1-3 training data is to the FDA challenge test set. Being able to produce a hhga file for the FDA test challenge is crucial at this point, as it guides which direction to push the models in. The quadratic interactions are crucial for inter-robot precision but make no difference (and cost us hugely in training time) for the xprize data, so getting a sense of their importance on the challenge is next.
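To see per model how the robot2 and xprize losses line up, the two tables can be joined on the model name. A minimal sketch, assuming each table has been saved as a text file whose lines start with `loss model` (the file names in the usage line and the function name `compare_losses` are hypothetical):

```shell
# Join two "loss model ..." tables on the model name and print
# "model loss_a loss_b", ordered by the loss in the first table.
compare_losses() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    # Put the model name first and sort on it, as join requires.
    awk '{ print $2, $1 }' "$1" | LC_ALL=C sort -k1,1 > "$a_sorted"
    awk '{ print $2, $1 }' "$2" | LC_ALL=C sort -k1,1 > "$b_sorted"
    LC_ALL=C join "$a_sorted" "$b_sorted" | LC_ALL=C sort -k2,2n
    rm -f "$a_sorted" "$b_sorted"
}

# Usage (hypothetical file names):
#   compare_losses robot2_losses.txt xprize_losses.txt
```

For example, twostep25.model is at 0.002364 on robot2 but 0.018602 on xprize, while twostep.model is at 0.005121 and 0.015220, so the ordering on one test set does not carry over to the other.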
Does your truth set include allele frequency found by orthogonal validation? It would be interesting to see which models perform best at estimating true allele frequencies, especially in regions that are not diploid or with many repeats.
@ivazquez no clue, maybe @ekg knows?
We are not estimating allele frequencies. We are estimating genotype/site/allele truthiness. But we should be, going forward. That should be pretty easy for vw because a linear model will usually suffice. We have the data in the 1000G but haven't dug into it.
On Wed, Apr 20, 2016, 01:50 Nicolás Della Penna notifications@github.com wrote:
@ivazquez https://github.com/ivazquez no clue, maybe @ekg https://github.com/ekg knows?
We found a problem with the previous examples. I was failing to swap out the alignments for each sample, which resulted in overfitting.
These are the first results after having cleaned up the issue. We're now training only using alignments made using bwa and a simple duplicate marking post-processing step on precisionFDA.
We train on Garvan robots 1, 3, and 4. (2 is huge, and still in the alignment process after 48 hours on 16 cores.) Then we test on Garvan vial 1. These are the results:
0.006133 ziik_b25ngram7.model. name=ziik_b25ngram7.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram 7 -f ~/$name.model --save_resume --cache_file $name.cache -b 25 2>&1 | tee ~/$name.log
0.006319 ziik_b24ngram5.model. name=ziik_b24ngram5.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram 5 -f ~/$name.model --save_resume --cache_file $name.cache -b 24 2>&1 | tee ~/$name.log
0.006345 ziik_b25ngram5.model. name=ziik_b25ngram5.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram 5 -f ~/$name.model --save_resume --cache_file $name.cache -b 25 2>&1 | tee ~/$name.log
0.006482 ziik_b18ngram3.model. name=ziik_b18ngram3.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram 3 -f ~/$name.model --save_resume --cache_file $name.cache -b 18 2>&1 | tee ~/$name.log
0.006544 ziik_b25ngram3.model. name=ziik_b25ngram3.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram 3 -f ~/$name.model --save_resume --cache_file $name.cache -b 25 2>&1 | tee ~/$name.log
0.006992 ziik_b24ngram3h.model. name=ziik_b24ngram3h.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram h3 -f ~/$name.model --save_resume --cache_file $name.cache -b 24 2>&1 | tee ~/$name.log
0.007548 ziikziik_b25.model. name=ziikziik_b25.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 -b 25 -f ~/$name.model --save_resume --cache_file $name.cache 2>&1 | tee ~/$name.log
0.007597 ziikziik_default.model. name=ziikziik_default.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 -f ~/$name.model --save_resume --cache_file $name.cache 2>&1 | tee ~/$name.log
0.008104 basic. time zcat robot3.hhga.gz robot4.hhga.gz robot1.hhga.gz | sort --random-sort --buffer-size 140G --parallel 36 | pigz > Robots_134.shuf.hhga.gz ; vw -c -d Robots_134.shuf.hhga.gz --power_t 1 -q ha --passes 20 --save_per_pass -f basic.model
0.008459 simplest.shuf34. time vw -d Robots_34.shuf.hhga.gz -c --binary -f simplest.shuf34.softonly.model --keep s
0.009375 ziik_b18ngram3hqha.model. name=ziik_b18ngram3hqha.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram h3 -q ha -f ~/$name.model --save_resume --cache_file $name.cache -b 18 2>&1 | tee ~/$name.log
0.009826 simplest.shuf34.softonly. time vw -d Robots_34.shuf.hhga.gz -c --binary -f simplest.shuf34.softonly.model --keep s
0.016678 softcore. name=softcore_logit; time vw -c -d Robots_134.shuf.hhga.gz --binary --passes 20 --loss_function logistic --link logistic -f ~/$name.model --save_resume -q ss --keep s --save_per_pass --power_t 1 | tee ~/$name.log
0.807193 Jah. name=Jah ; time vw --binary --passes 20 -q ha -q ss -f $name.model --save_resume 2>&1 -d Robots_34.shuf.hhga.gz -c -b 25 --power_t 1 --keep s -l 0.05 --boosting 10 --loss_function logistic | tee ~/$name.log
I'll update when the models with quadratic features between the haplotypes and alleles are up. They don't appear to outperform the ngram models here.
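The shuffled training file used in these runs was built with `sort --random-sort`, which orders lines by a random hash of their content, so identical lines end up next to each other. A possible alternative is GNU `shuf`, which produces a uniform permutation but holds the whole input in memory. A sketch under those assumptions (the function name `shuffle_examples` is made up, and plain `gzip` stands in for `pigz`):

```shell
# Concatenate gzipped example files, shuffle the lines, and recompress.
# Usage: shuffle_examples out.gz in1.gz in2.gz ...
shuffle_examples() {
    out=$1
    shift
    gzip -dc "$@" | shuf | gzip > "$out"
}

# e.g. shuffle_examples Robots_134.shuf.hhga.gz robot1.hhga.gz robot3.hhga.gz robot4.hhga.gz
```

Whether duplicate lines matter for these hhga files is not something measured here; the sketch only shows the mechanics.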
A log of things we try.
Robot_1 vs Robot_2 comparisons using different models.
Trying: