ekg / hhga

haplotypes genotypes and alleles example decision synthesizer
MIT License

vw experiments #13

Open ekg opened 8 years ago

ekg commented 8 years ago

A log of things we try.

Robot_1 vs Robot_2 comparisons using different models.

Trying:

time vw -d ~/NA12878_V2.5_Robot_1.open_w16.hhga.gz --binary --passes 20 -q ha --ngram a5 -c -b 26 -f ~/ngram5_robot1.model --compressed
time vw -d ~/NA12878_V2.5_Robot_1.open_w16.hhga.gz --binary --passes 20 -q ha -c -b 26 -f ~/nongram_robot1.model --compressed

nikete commented 8 years ago

So with a fairly simple quadratic model we have error rates under 2% on the open dataset with a window size of 16. Initial ngram experiments failed to learn, but the parameter space is reaching its limit, so it is worth rerunning those experiments with the full data.

freebayes@ip-10-0-0-6:~$ time vw -d ~/NA12878_V2.5_Robot_1.open_w16.hhga.gz --binary --passes 20 -q ha -c -b 26 -f ~/nongram_robot1.model --compressed

on testing: freebayes@ip-10-0-0-6:~$ vw -t -d ~/XPrize_Illumina_WG.open_w16.hhga.gz --binary -i ~/nongram_robot1.model

loss of 0.013290
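
The "parameter space" remark above presumably refers to vw's hashed weight table, whose size is set by `-b`: with `-b N`, all features (including the `-q ha` crosses) are hashed into 2^N slots. A minimal sketch for comparing the `-b` values used in this thread; the helper name `weights_for_bits` is hypothetical, not part of vw:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a vw command): number of weight slots
# that a given -b value corresponds to, i.e. 2^bits.
weights_for_bits() {
  echo $(( 1 << $1 ))
}

# Compare two of the settings that appear in this thread.
for bits in 18 26; do
  echo "-b $bits -> $(weights_for_bits "$bits") weight slots"
done
```

Fewer slots means more hash collisions once quadratic or ngram features multiply the feature count, which is one possible reason to sweep `-b` as the later comments do.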

nikete commented 8 years ago

freebayes@ip-10-0-0-6:~$ time vw -d ~/NA12878_V2.5_Robot_1.open_w16.hhga.gz --binary --passes 20 -q ha --ngram a5 -c -b 26 -f ~/ngram5_robot1.model --compressed

does well,

nikete commented 8 years ago

Once we understood the theory, twopass quadratic generalizes [[edit this was wrong, info was leaking from future]]

freebayes@ip-10-0-0-6:~$ time vw -d ~/NA12878_V2.5_Robot_1.open_w16.hhga.gz --binary --passes 2 -c -q ah -b 18 -f ~/twopass_robot1.model --compressed

nikete commented 8 years ago

[[edit this was wrong, info was leaking from future]]

finished run
number of examples per pass = 3691989
passes used = 2
weighted example sum = 7383978.000000
weighted label sum = 3907430.000000
average loss = 0.006821 h
best constant = 0.529177
best constant's loss = 0.719972
total feature number = 235890300812

real 32m14.351s
user 49m8.946s
sys 0m18.625s

freebayes@ip-10-0-0-6:~$ time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -c -b 20 -f ~/twostep.model

nikete commented 8 years ago

finished run
number of examples per pass = 482196
passes used = 1
weighted example sum = 482196.000000
weighted label sum = -230530.000000
average loss = 0.015220
best constant = -0.478084
best constant's loss = 0.771436
total feature number = 14165571258

freebayes@ip-10-0-0-6:~$ vw -d ~/XPrize_Illumina_WG.open_w16.hhga.gz --binary -t -i ~/twostep.model

ekg commented 8 years ago

vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --ngram h5 --nn 100 -c -b 25 -f ~/ngram5hnn100_robot1.model --compressed
vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --nn 1000 -c -b 25 -f ~/nn1000_robot1.model --compressed

ekg commented 8 years ago

vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha --ngram h2 -c -b 18 -f ~/ngramh2b18passes1_robot1.model --compressed

ekg commented 8 years ago

vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 -f ~/b18passes1_robot1.model --compressed

ekg commented 8 years ago

Trained models on the full robot1 set from Garvan.

Tested against the second Garvan set (robot2). Each row gives the test loss, the model name, and the training command.

0.002364    twostep25.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 25 --ngram h3 -f ~/twostep25.model
0.002695    twostep25redo.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --cache_file ~/NA12878_V2.5_Robot_1.1.hhga.gz.train.cache -b 25 --ngram h3 -f ~/twostep25redo.model
0.003785    b25qhap20.model name=b25qhap20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 20 --cache_file $name.cache --binary -b 25 -q ha -f ~/$name 2>&1 | tee ~/$name.log
0.004464    ngramh3a3b18interactass.model   name=ngramh3a3b18interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 18 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.004464    ngramh3b18interactass.model name=ngramh3b18interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.005045    ngramh3b18qha.model name=ngramh3b18qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.005074    ngramh3b18passes1_robot1.model  vw --ngram 3h -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3b18passes1_robot1.model -c
0.005121    twostep.model   time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -c -b 20 -f ~/twostep.model
0.005248    ngramh3a3b16interactass.model   name=ngramh3a3b16interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.005248    ngramh3a3b16qhaqas.model    name=ngramh3a3b16qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.005402    ngramh3a3b18passes1_robot1.model    vw --ngram h3 --ngram a3 -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3a3b18passes1_robot1.model -c
0.005629    ngramh3b25qhaqas.model  name=ngramh3b25qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 25 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.005818    ngramh3a3b20qhaqas.model    name=ngramh3a3b20qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 20 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.005818    ngramh3b20qhaqas.model  name=ngramh3b20qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 20 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.005840    ngramh3a3b16qhainteracths.model name=ngramh3a3b16qhainteracths.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions hs -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.005958    b25passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 25 -f ~/b25passes1_robot1.model --compressed
0.006112    ngramh3a3b14interactass.model   name=ngramh3a3b14interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 14 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.006225    ngramh3a3b16qha.model   name=ngramh3a3b16qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.006483    b22passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 22 -f ~/b22passes1_robot1.model --compressed
0.006887    b20passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 20 -f ~/b20passes1_robot1.model --compressed
0.007165    ngramh3a3b16interactahs.model   name=ngramh3a3b16interactahs.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 --interactions ahs -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.007219    ngramh3b15qha.model name=ngramh3b15qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 15 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.007825    b25qhap2.model  name=b25qhap20model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 20 --cache_file $name.cache --binary -b 25 -q ha -f ~/$name 2>&1 | tee ~/$name.log
0.008095    b18passes1_robot1.model vw --ngram 3h -q ha -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3qb18passes1_robot1.model -c
0.008095    b18qhap1.model  name=ngramh2s1b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 18 --ngram h2 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.008095    h3b18qhap1.model    name=h3b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 -f ~/$name 2>&1 | tee ~/$name.log
0.008140    ngramh3b25qhap2randstart.model  name=ngramh3b25qhap2randstart.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 2 --cache_file $name.cache --binary -b 25 -q ha --ngram h3 --random_weights 0 --loss_function hinge -f ~/$name 2>&1 | tee ~/$name.log
0.008140    q_ha__passes_2__b_25_randstart.model    time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -c -b 25 --ngram h3 -f ~/q_ha__passes_2__b_25_randstart.model --random_weights 0 --loss_function hinge
0.008176    ngramh3a3b28q.model name=ngramh3a3b28q.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.008176    ngramh3b28pass20.model  time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h3 -f ~/ngramh3b28pass20.model 2>&1 | tee ~/ngramh3b28pass20.model.log
0.008330    ngramh3b25qha1pass.model    name=ngramh3b25qha1pass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha --ngram h3 -c -b 25 --compressed -f ~/$name 2>&1 | tee ~/$name.log
0.008561    ngramh3b14qha.model name=ngramh3b14qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 14 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.008637    ngrama3b18qhap1.model   name=ngrama3b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 --ngram a3 -f ~/$name 2>&1 | tee ~/$name.log
0.008816    ngramh3s1b28pass20.model    name=ngramh3s1b28pass20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h3 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.009089    ngramh3b25l0.05pass20.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 25 --ngram h3 -l 0.05 -f ~/ngramh3b25l0.05pass20.model 2>&1 | tee ~/ngramh3b25l0.05pass20.model.log
0.009274    ngramh2b18passes1_robot1.model  vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha --ngram h2 -c -b 18 -f ~/ngramh2b18passes1_robot1.model --compressed
0.009459    ngramh5b28pass20.model  time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h5 -f ~/ngramh5b28pass20.model 2>&1 | tee ~/ngramh5b28pass20.model.log
0.009462    ngramh3s1b25pass20.model    name=ngramh3s1b25pass20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 25 --ngram h3 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.009608    ngramh3b120qhap1.model  name=ngramh3b120qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -q ha -b 20 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.009760    ngramh5b20qha.model name=ngramh5b20qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 20 --ngram h5 -f ~/$name 2>&1 | tee ~/$name.log
0.009771    ngramh3qhab18passes1_robot1.model   vw --ngram h3 -q ah -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3qhab18passes1_robot1.model -c
0.009802    ngramh3b25qhap20.model  name=ngramh3b25qhap20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -c -q ha --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.010099    ngramh2s1b18qhap1.model name=ngramh2s1b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 18 --ngram h2 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.010178    ngramh3b25p1logistic.model  name=ngramh3b25p1logistic.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 25 --ngram h3 --loss_function logistic -f ~/$name 2>&1 | tee ~/$name.log
0.010240    ngramh3b18qhap1.1.model name=ngramh3b18qhap1.1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.010240    ngramh3b18qhap1.model   name=ngramh3b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.010240    ngramh3qb18passes1_robot1.model vw --ngram 3h -q ha -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3qb18passes1_robot1.model -c
0.010509    ngramh5b18qha.model name=ngramh5b18qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 18 --ngram h5 -f ~/$name 2>&1 | tee ~/$name.log
0.010961    ngramh3b16qhap1.model   name=ngramh3b16qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 16 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.011531    ngramh3b15qhap1.model   name=ngramh3b15qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 15 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.012500    ngramh3b18p1qhalogisticnolink.model name=ngramh3b18p1qhalogisticnolink.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -q ha -b 18 --ngram h3 --loss_function logistic -f ~/$name 2>&1 | tee ~/$name.log
0.013266    ngramh3a3b14interacthass.model  name=ngramh3a3b14interacthass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions has -c -b 14 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.054843    ngrams2h3a3b18passes1_robot1.model  vw --ngram h3 --ngram a3 --skips 2 -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngrams2h3a3b18passes1_robot1.model -c --random_weights 1
0.058047    ngrams1h3a3b18passes1_robot1.model  vw --ngram h3 --ngram a3 --skips 1 -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngrams1h3a3b18passes1_robot1.model -c --random_weights 1
0.255162    b25qhap20logistic.model name=b25qhap20logistic.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 20 --cache_file $name.cache --binary -b 25 -q ha --loss_function logistic --link logistic -f ~/$name 2>&1 | tee ~/$name.log
0.255162    ngramh3b18p1qhalogistic.model   name=ngramh3b18p1qhalogistic.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -q ha -b 18 --ngram h3 --loss_function logistic --link logistic -f ~/$name 2>&1 | tee ~/$name.log
0.744838    ngrama3b18qhap2.model   name=ngrama3b18qhap2.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -b 18 --ngram a3 -f ~/$name 2>&1 | tee ~/$name.log
0.744838    ngramh3b18lrqdrophap1.model name=ngramh3b18lrqdrophap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --lrqdropout -q ha -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.744838    ngramh3b18qhap1stagepolye.model name=ngramh3b18qhap1stagepolye.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --lrqdropout -q ha -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.744838    ngramh3b24p20qhann8inpassstagepoly.model    name=ngramh3b24p20qhann8inpassstagepoly.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --stage_poly -b 24 --ngram h3 --nn 8 --inpass -c --compressed --passes 20 -f ~/$name 2>&1 | tee ~/$name.log
0.744838    ngramh3b24p20qhann8.model   name=ngramh3b24p20qhann8.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -b 24 --ngram h3 -q ha --nn 8 -c --passes 20 --cache_file ~/$name.cache -f ~/$name 2>&1 | tee ~/$name.log
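
Several rows in the table above share a loss to six decimals (0.744838 and 0.008095, for example) despite different settings. That may mean the models make identical predictions, or that a cache or model file was reused; the thread does not say which. A small sketch for flagging such rows, assuming the results are saved in a whitespace-separated file with the loss in column 1 and the model name in column 2 (the function name and the temporary file are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every loss value that occurs on at least MIN rows
# (default 2) of a "loss model command..." results file, with the models that share it.
shared_losses() {
  awk -v min="${2:-2}" '
    NF >= 2 { key = $1 ""; count[key]++; models[key] = models[key] " " $2 }
    END { for (key in count) if (count[key] >= min) print key ":" models[key] }
  ' "$1" | sort -n
}

# A few rows copied from the robot2 table (commands omitted for brevity).
results=$(mktemp)
cat > "$results" <<'EOF'
0.002364 twostep25.model
0.008095 b18passes1_robot1.model
0.008095 b18qhap1.model
0.008095 h3b18qhap1.model
0.744838 ngrama3b18qhap2.model
0.744838 ngramh3b18lrqdrophap1.model
EOF

shared_losses "$results"
```

Rows flagged this way are worth re-running before drawing conclusions from them.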

And for xprize (test loss, model name, training command):

0.014448    ngramh3b18passes1_robot1.model  vw --ngram 3h -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3b18passes1_robot1.model -c
0.014575    b25qhap2.model  name=b25qhap20model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 20 --cache_file $name.cache --binary -b 25 -q ha -f ~/$name 2>&1 | tee ~/$name.log
0.014712    b18passes1_robot1.model vw --ngram 3h -q ha -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3qb18passes1_robot1.model -c
0.014712    b18qhap1.model  name=ngramh2s1b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 18 --ngram h2 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.014712    h3b18qhap1.model    name=h3b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 -f ~/$name 2>&1 | tee ~/$name.log
0.014731    b20passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 20 -f ~/b20passes1_robot1.model --compressed
0.014932    ngrama3b18qhap1.model   name=ngrama3b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 --ngram a3 -f ~/$name 2>&1 | tee ~/$name.log
0.015077    ngramh3b15qha.model name=ngramh3b15qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 15 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.015220    twostep.model   time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -c -b 20 -f ~/twostep.model
0.015533    ngramh2b18passes1_robot1.model  vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha --ngram h2 -c -b 18 -f ~/ngramh2b18passes1_robot1.model --compressed
0.015541    ngramh3qb18passes1_robot1.model vw --ngram 3h -q ha -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3qb18passes1_robot1.model -c
0.015554    ngramh3a3b16qha.model   name=ngramh3a3b16qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.015639    ngramh3b23qha1pass.model    name=ngramh3b23qha1pass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha --ngram h3 -c -b 23 --compressed -f ~/$name 2>&1 | tee ~/$name.log
0.015687    ngramh3b25qhap20.model  name=ngramh3b25qhap20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -c -q ha --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.015761    ngramh3b18qhap1.1.model name=ngramh3b18qhap1.1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.015761    ngramh3b18qhap1.model   name=ngramh3b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.015946    b22passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 22 -f ~/b22passes1_robot1.model --compressed
0.015950    ngramh3b18qha.model name=ngramh3b18qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.016132    ngramh2s1b18qhap1.model name=ngramh2s1b18qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 18 --ngram h2 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.016222    ngramh3qhab18passes1_robot1.model   vw --ngram h3 -q ah -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3qhab18passes1_robot1.model -c
0.016410    ngramh3b120qhap1.model  name=ngramh3b120qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -q ha -b 20 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.016819    ngramh3b14qha.model name=ngramh3b14qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 14 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.017327    ngramh3b16qhap1.model   name=ngramh3b16qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 16 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.017341    ngramh5b18qha.model name=ngramh5b18qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 18 --ngram h5 -f ~/$name 2>&1 | tee ~/$name.log
0.017590    ngramh5b20qha.model name=ngramh5b20qha.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 20 --ngram h5 -f ~/$name 2>&1 | tee ~/$name.log
0.017837    twostep25redo.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --cache_file ~/NA12878_V2.5_Robot_1.1.hhga.gz.train.cache -b 25 --ngram h3 -f ~/twostep25redo.model
0.018426    b25passes1_robot1.model vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 25 -f ~/b25passes1_robot1.model --compressed
0.018602    twostep25.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 25 --ngram h3 -f ~/twostep25.model
0.019102    ngramh3b25p1logistic.model  name=ngramh3b25p1logistic.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -b 25 --ngram h3 --loss_function logistic -f ~/$name 2>&1 | tee ~/$name.log
0.019314    ngramh3b15qhap1.model   name=ngramh3b15qhap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha -c -b 15 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.019830    ngramh3b18p1qhalogisticnolink.model name=ngramh3b18p1qhalogisticnolink.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -q ha -b 18 --ngram h3 --loss_function logistic -f ~/$name 2>&1 | tee ~/$name.log
0.020048    ngramh3b25qha1pass.model    name=ngramh3b25qha1pass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 1 -q ha --ngram h3 -c -b 25 --compressed -f ~/$name 2>&1 | tee ~/$name.log
0.020212    b25qhap20.model name=b25qhap20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 20 --cache_file $name.cache --binary -b 25 -q ha -f ~/$name 2>&1 | tee ~/$name.log
0.020336    ngramh3b25l0.05pass20.model time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 25 --ngram h3 -l 0.05 -f ~/ngramh3b25l0.05pass20.model 2>&1 | tee ~/ngramh3b25l0.05pass20.model.log
0.020813    ngramh3s1b25pass20.model    name=ngramh3s1b25pass20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 25 --ngram h3 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.021296    ngramh3b25qhap2randstart.model  name=ngramh3b25qhap2randstart.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 2 --cache_file $name.cache --binary -b 25 -q ha --ngram h3 --random_weights 0 --loss_function hinge -f ~/$name 2>&1 | tee ~/$name.log
0.021296    q_ha__passes_2__b_25_randstart.model    time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -c -b 25 --ngram h3 -f ~/q_ha__passes_2__b_25_randstart.model --random_weights 0 --loss_function hinge
0.023254    ngramh3a3b28q.model name=ngramh3a3b28q.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.023254    ngramh3b28pass20.model  time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h3 -f ~/ngramh3b28pass20.model 2>&1 | tee ~/ngramh3b28pass20.model.log
0.024283    ngramh3a3b18passes1_robot1.model    vw --ngram h3 --ngram a3 -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngramh3a3b18passes1_robot1.model -c
0.025396    ngramh3a3b14interacthass.model  name=ngramh3a3b14interacthass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions has -c -b 14 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.026408    ngramh3s1b28pass20.model    name=ngramh3s1b28pass20.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h3 --skips 1 -f ~/$name 2>&1 | tee ~/$name.log
0.027062    ngramh5b28pass20.model  time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -c -b 28 --ngram h5 -f ~/ngramh5b28pass20.model 2>&1 | tee ~/ngramh5b28pass20.model.log
0.030736    ngramh3a3b16qhainteracths.model name=ngramh3a3b16qhainteracths.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions hs -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.093522    ngramh3b25qhaqas.model  name=ngramh3b25qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 25 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.186874    ngramh3a3b14interactass.model   name=ngramh3a3b14interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 14 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.260958    ngrama3b18qhap2.model   name=ngrama3b18qhap2.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 2 -q ha -b 18 --ngram a3 -f ~/$name 2>&1 | tee ~/$name.log
0.260958    ngramh3b18lrqdrophap1.model name=ngramh3b18lrqdrophap1.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --lrqdropout -q ha -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.260958    ngramh3b18qhap1stagepolye.model name=ngramh3b18qhap1stagepolye.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --lrqdropout -q ha -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.260958    ngramh3b24p20qhann8inpassstagepoly.model    name=ngramh3b24p20qhann8inpassstagepoly.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --stage_poly -b 24 --ngram h3 --nn 8 --inpass -c --compressed --passes 20 -f ~/$name 2>&1 | tee ~/$name.log
0.260958    ngramh3b24p20qhann8.model   name=ngramh3b24p20qhann8.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -b 24 --ngram h3 -q ha --nn 8 -c --passes 20 --cache_file ~/$name.cache -f ~/$name 2>&1 | tee ~/$name.log
0.260958    ngramsh3a3b20p1stagepoly.model  name=ngramsh3a3b20p1stagepoly.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --stage_poly -b 20 --ngrams h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.287441    ngramh3a3b18interactass.model   name=ngramh3a3b18interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 18 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.287441    ngramh3b18interactass.model name=ngramh3b18interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 18 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.290500    ngrams2h3a3b18passes1_robot1.model  vw --ngram h3 --ngram a3 --skips 2 -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngrams2h3a3b18passes1_robot1.model -c --random_weights 1
0.290587    ngrams1h3a3b18passes1_robot1.model  vw --ngram h3 --ngram a3 --skips 1 -d NA12878_V2.5_Robot_1.1.hhga.gz --binary -f ngrams1h3a3b18passes1_robot1.model -c --random_weights 1
0.304333    ngramh3a3b16interactass.model   name=ngramh3a3b16interactass.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha --interactions as -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.304333    ngramh3a3b16qhaqas.model    name=ngramh3a3b16qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 16 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.328933    ngramh3a3b20qhaqas.model    name=ngramh3a3b20qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 20 --ngram h3a3 -f ~/$name 2>&1 | tee ~/$name.log
0.328933    ngramh3b20qhaqas.model  name=ngramh3b20qhaqas.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary --passes 20 -q ha -q as -c -b 20 --ngram h3 -f ~/$name 2>&1 | tee ~/$name.log
0.739042    b25qhap20logistic.model name=b25qhap20logistic.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --passes 20 --cache_file $name.cache --binary -b 25 -q ha --loss_function logistic --link logistic -f ~/$name 2>&1 | tee ~/$name.log
0.739042    ngramh3b18p1qhalogistic.model   name=ngramh3b18p1qhalogistic.model ; time vw -d ~/NA12878_V2.5_Robot_1.1.hhga.gz --binary -q ha -b 18 --ngram h3 --loss_function logistic --link logistic -f ~/$name 2>&1 | tee ~/$name.log
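
The two tables rank the same models differently, so it can help to put both losses side by side per model. A minimal sketch, assuming each table is saved as a whitespace-separated file with the loss in column 1 and the model name in column 2; the function name and the temporary files are hypothetical, and only three rows from each table are reproduced here:

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every model present in both files, print
# "model loss_in_first_file loss_in_second_file".
compare_losses() {
  awk '
    NR == FNR { first[$2] = $1 ""; next }   # first file: remember loss per model
    ($2 in first) { print $2, first[$2], $1 }  # second file: emit matches
  ' "$1" "$2"
}

robot2=$(mktemp)
xprize=$(mktemp)
cat > "$robot2" <<'EOF'
0.002364 twostep25.model
0.005074 ngramh3b18passes1_robot1.model
0.005121 twostep.model
EOF
cat > "$xprize" <<'EOF'
0.014448 ngramh3b18passes1_robot1.model
0.015220 twostep.model
0.018602 twostep25.model
EOF

compare_losses "$robot2" "$xprize"
```

On these rows, twostep25.model has the lowest loss on robot2 (0.002364) but the highest of the three on xprize (0.018602).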

nikete commented 8 years ago

The main blocking point right now is not knowing how close our current robot1-3 training data is to the FDA challenge test set. Being able to produce an hhga file for the FDA challenge test set is crucial at this point, as it guides which direction to push the models in. The quadratic interactions are crucial for inter-robot precision but make no difference (and cost us hugely in training time) for the xprize data, so getting a sense of their importance on the challenge is next.

ivazquez commented 8 years ago

Does your truth set include allele frequency found by orthogonal validation? It would be interesting to see which models perform best at estimating true allele frequencies, especially in regions that are not diploid or with many repeats.

nikete commented 8 years ago

@ivazquez no clue, maybe @ekg knows?

ekg commented 8 years ago

We are not estimating allele frequencies. We are estimating genotype/site/allele truthiness. But we should be going forward. That should be pretty easy for vw because a linear model will usually suffice. We have the data in the 1000G but haven't dug into it.

On Wed, Apr 20, 2016, 01:50 Nicolás Della Penna notifications@github.com wrote:

@ivazquez https://github.com/ivazquez no clue, maybe @ekg https://github.com/ekg knows?

ekg commented 8 years ago

We found a problem with the previous examples. I was failing to swap out the alignments for each sample, which resulted in overfitting.

These are the first results after having cleaned up the issue. We're now training only using alignments made using bwa and a simple duplicate marking post-processing step on precisionFDA.

We train on Garvan robots 1, 3, and 4. (2 is huge, and still in the alignment process after 48 hours on 16 cores.) Then we test on Garvan vial 1. These are the results:

0.006133    ziik_b25ngram7.model.   name=ziik_b25ngram7.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram 7 -f ~/$name.model --save_resume --cache_file $name.cache -b 25 2>&1 | tee ~/$name.log
0.006319    ziik_b24ngram5.model.   name=ziik_b24ngram5.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram 5 -f ~/$name.model --save_resume --cache_file $name.cache -b 24 2>&1 | tee ~/$name.log
0.006345    ziik_b25ngram5.model.   name=ziik_b25ngram5.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram 5 -f ~/$name.model --save_resume --cache_file $name.cache -b 25 2>&1 | tee ~/$name.log
0.006482    ziik_b18ngram3.model.   name=ziik_b18ngram3.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram 3 -f ~/$name.model --save_resume --cache_file $name.cache -b 18 2>&1 | tee ~/$name.log
0.006544    ziik_b25ngram3.model.   name=ziik_b25ngram3.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram 3 -f ~/$name.model --save_resume --cache_file $name.cache -b 25 2>&1 | tee ~/$name.log
0.006992    ziik_b24ngram3h.model.  name=ziik_b24ngram3h.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram h3 -f ~/$name.model --save_resume --cache_file $name.cache -b 24 2>&1 | tee ~/$name.log
0.007548    ziikziik_b25.model. name=ziikziik_b25.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 -b 25 -f ~/$name.model --save_resume --cache_file $name.cache 2>&1 | tee ~/$name.log
0.007597    ziikziik_default.model. name=ziikziik_default.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 -f ~/$name.model --save_resume --cache_file $name.cache 2>&1 | tee ~/$name.log
0.008104    basic.  time zcat robot3.hhga.gz robot4.hhga.gz robot1.hhga.gz | sort --random-sort --buffer-size 140G --parallel 36 | pigz > Robots_134.shuf.hhga.gz ; vw -c -d Robots_134.shuf.hhga.gz --power_t 1 -q ha --passes 20 --save_per_pass -f basic.model
0.008459    simplest.shuf34.    time vw -d Robots_34.shuf.hhga.gz -c --binary -f simplest.shuf34.softonly.model --keep s
0.009375    ziik_b18ngram3hqha.model.   name=ziik_b18ngram3hqha.model; time vw -d Robots_134.shuf.hhga.gz --binary --passes 20 --ngram h3 -q ha -f ~/$name.model --save_resume --cache_file $name.cache -b 18 2>&1 | tee ~/$name.log
0.009826    simplest.shuf34.softonly.   time vw -d Robots_34.shuf.hhga.gz -c --binary -f simplest.shuf34.softonly.model --keep s
0.016678    softcore.   name=softcore_logit; time vw -c -d Robots_134.shuf.hhga.gz --binary --passes 20 --loss_function logistic --link logistic -f ~/$name.model --save_resume -q ss --keep s --save_per_pass --power_t 1 | tee ~/$name.log
0.807193    Jah.    name=Jah ; time vw --binary --passes 20 -q ha -q ss -f $name.model --save_resume 2>&1 -d Robots_34.shuf.hhga.gz -c -b 25 --power_t 1 --keep s -l 0.05 --boosting 10 --loss_function logistic | tee ~/$name.log

I'll update when the models with quadratic features between the haplotypes and alleles are up. They don't appear to outperform the ngram models here.
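
The `basic` row above builds the shuffled training file with `sort --random-sort`. According to the GNU coreutils documentation, that option orders lines by a random hash of the key, so identical lines stay adjacent. A sketch of an alternative using `shuf`, which produces a random permutation of lines; the function name and the tiny placeholder files are hypothetical, and `shuf` holds its whole input in memory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: decompress the given gzipped example files and
# write all of their lines to stdout in random order.
shuffle_examples() {
  gzip -dc -- "$@" | shuf
}

# Tiny placeholder inputs, just to exercise the function.
tmp=$(mktemp -d)
printf 'ex1\nex2\n' | gzip > "$tmp/a.gz"
printf 'ex3\nex4\n' | gzip > "$tmp/b.gz"

shuffle_examples "$tmp/a.gz" "$tmp/b.gz"

# With the files named in this thread, the call would look like:
#   shuffle_examples robot3.hhga.gz robot4.hhga.gz robot1.hhga.gz | pigz > Robots_134.shuf.hhga.gz
```

Whether the difference matters depends on how many duplicate example lines the hhga files contain, which the thread does not report.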