GeneAssembly / biosal

biosal is a distributed BIOlogical Sequence Actor Library. THIS IS A MIRROR.
BSD 2-Clause "Simplified" License

fix regression for sha1sum #477

Closed sebhtml closed 10 years ago

sebhtml commented 10 years ago

expected sha1sum:

run-argonnite-1 sha1sum 71deaa88265222cd8c27c88980eb5c4f29c966af

failed:

FAILED 19m19.148s 78cfd99 restore old value for single-node jobs
[boisvert@bigmem biosal]$ sha1sum output/coverage_distribution.txt-canonical
4c7fde8f825106688654f0773af733f829f7e00b output/coverage_distribution.txt-canonical

sha1sum is incorrect too...
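The pass/fail check used throughout this thread compares the SHA-1 of the output file against a known-good digest. A minimal sketch of that comparison follows; the function names are hypothetical and not part of biosal, and the expected digest is the one quoted in this issue.

```python
import hashlib

# Expected digest for run-argonnite-1, as quoted in this issue.
EXPECTED_SHA1 = "71deaa88265222cd8c27c88980eb5c4f29c966af"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large output files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA1):
    """Hypothetical helper: True when the file hashes to the expected digest."""
    return sha1_of_file(path) == expected
```

For example, `matches_expected("output/coverage_distribution.txt-canonical")` would report whether a run reproduces the known-good output.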

sebhtml commented 10 years ago

According to http://biosal.s3.amazonaws.com/quality-assurance-department/2014-07-29-03:01:01.log

ef5f127c2dd was in the PASSED state regarding the sha1sum.

sebhtml commented 10 years ago

Log

FAILED 78cfd99 restore old value for single-node jobs
1569799 update README for mirror
d15c337 change the mirror branch name
FAILED 9900dcc add code to make a worker wait and wake up
FAILED a16900b fix test problem <--------------------
PASSED ef5f127 change symbols for timer

sebhtml commented 10 years ago

real 20m13.068s
user 563m32.105s
sys 1m50.080s
[boisvert@bigmem biosal]$ time ./scripts/development/run-argonnite-1-28.sh |tee test-78cfd99/log^C
[boisvert@bigmem biosal]$ mv output/ test-78cfd99
[boisvert@bigmem biosal]$ shasum test-78cfd99/output/coverage_distribution.txt-canonical
4c7fde8f825106688654f0773af733f829f7e00b test-78cfd99/output/coverage_distribution.txt-canonical

sebhtml commented 10 years ago

real 17m58.960s
user 501m12.035s
sys 1m37.855s
[boisvert@bigmem biosal]$ time ./scripts/development/run-argonnite-1-28.sh |tee test-9900dcc/log^C
[boisvert@bigmem biosal]$ shasum test-78cfd99/ou^C
[boisvert@bigmem biosal]$ mv output/ test-9900dcc/
[boisvert@bigmem biosal]$ sha1sum test-9900dcc/output/coverage_distribution.txt-canonical
4c7fde8f825106688654f0773af733f829f7e00b test-9900dcc/output/coverage_distribution.txt-canonical

sebhtml commented 10 years ago

real 20m36.273s
user 574m30.648s
sys 1m38.820s
[boisvert@bigmem biosal]$ time ./scripts/development/run-argonnite-1-28.sh |tee test-a16900b/log^C
[boisvert@bigmem biosal]$ mv output/ test-a16900b
[boisvert@bigmem biosal]$ sha1sum test-a16900b/output/coverage_distribution.txt-canonical
4c7fde8f825106688654f0773af733f829f7e00b test-a16900b/output/coverage_distribution.txt-canonical

sebhtml commented 10 years ago

-> use memory fence for:
   actor being born
   actor dying

sebhtml commented 10 years ago

FAILED:

[boisvert@bigmem biosal]$ sha1sum test-a16900b/output/coverage_distribution.txt-canonical
4c7fde8f825106688654f0773af733f829f7e00b test-a16900b/output/coverage_distribution.txt-canonical

[boisvert@bigmem biosal]$ cat test-a16900b/output/coverage_distribution.txt-canonical | awk '{ sum += $1*$2} END {print sum}'
5436812640

[boisvert@bigmem biosal]$ grep ready log-78cfd99.data/log-78cfd99|tail -n1
argonnite 43565530: stores are ready, 5436812640/5436812640 kmers

PASSED:

[boisvert@bigmem biosal]$ pwd
/home/boisvert/storage/automated-tests/2014-07-29-03:01:01/biosal
[boisvert@bigmem biosal]$ sha1sum output/coverage_distribution.txt-canonical
71deaa88265222cd8c27c88980eb5c4f29c966af output/coverage_distribution.txt-canonical

[boisvert@bigmem biosal]$ cat /home/boisvert/storage/automated-tests/2014-07-29-03:01:01/biosal/output/coverage_distribution.txt-canonical | awk '{ sum += $1*$2} END {print sum}'
5255585552

[boisvert@bigmem biosal]$ grep ready /home/boisvert/storage/automated-tests/2014-07-29-03:01:01/biosal/log|tail -n1
argonnite 622503548: stores are ready, 5255585552/5255585552 kmers
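The awk one-liner above multiplies the two columns of each line and sums the products, which gives the total number of kmer observations. A rough Python equivalent is sketched here, assuming each line of the coverage distribution file holds a coverage value followed by the number of kmers with that coverage (that layout is inferred from the awk command, and the function name is hypothetical).

```python
def total_kmer_observations(lines):
    """Sum coverage * count over the lines of a coverage distribution.

    Assumes two whitespace-separated integer columns per line:
    coverage, then the number of kmers observed at that coverage.
    """
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        coverage, count = int(fields[0]), int(fields[1])
        total += coverage * count
    return total
```

Passing an open file handle works, since iterating over a file yields its lines: `total_kmer_observations(open("output/coverage_distribution.txt-canonical"))`.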

sebhtml commented 10 years ago

coverage  good        bad         bad - good
1         4091038438  4490244629   399206191
2          282617804   263798193   -18819611
3           66854473    56890191    -9964282
4           27818331    20664051    -7154280
5           14110532     9136168    -4974364
6            7977454     4489041    -3488413
7            4698874     2430377    -2268497
8            2863391     1436786    -1426605
9            1807823      904020     -903803
10           1196266      587303     -608963
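A comparison like the one above can be produced from two coverage distributions. The sketch below assumes the same two-column layout (coverage, kmer count) for both files; the function names are hypothetical.

```python
def parse_distribution(lines):
    """Parse 'coverage count' lines into a {coverage: count} dictionary."""
    distribution = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        distribution[int(fields[0])] = int(fields[1])
    return distribution


def coverage_differences(good, bad):
    """Return (coverage, good_count, bad_count, bad - good) rows, sorted by coverage.

    A coverage value missing from one distribution is treated as a count of 0.
    """
    rows = []
    for coverage in sorted(set(good) | set(bad)):
        good_count = good.get(coverage, 0)
        bad_count = bad.get(coverage, 0)
        rows.append((coverage, good_count, bad_count, bad_count - good_count))
    return rows
```

In the numbers from this issue, the bad run has many more kmers at coverage 1 and fewer at every higher coverage shown.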

sebhtml commented 10 years ago

This seems to be related to the codec.

sebhtml commented 10 years ago

78cfd99 with 4x7:

yes !

real 28m36.355s
user 748m59.630s
sys 51m22.533s
[boisvert@bigmem biosal]$ time ./scripts/development/run-argonnite-1.sh |tee log^C
[boisvert@bigmem biosal]$ sha1sum output/coverage_distribution.txt-canonical
71deaa88265222cd8c27c88980eb5c4f29c966af output/coverage_distribution.txt-canonical

sebhtml commented 10 years ago

78cfd992 1x28 with 2 bit encoding for transport and storage with 1 node

FAIL

[boisvert@bigmem biosal]$ sha1sum output/coverage_distribution.txt-canonical
4c7fde8f825106688654f0773af733f829f7e00b output/coverage_distribution.txt-canonical

sebhtml commented 10 years ago

78cfd99 2x14

TO FILL

sebhtml commented 10 years ago

The number of ready kmers with 1 node is incorrect; see above.

sebhtml commented 10 years ago

Number of sequences in file:
actor:1924299240, 5/42 datasets/Iowa_Continuous_Corn/GPIC.1424-1.1371.fastq 90613544

Actual kmer observations - expected kmer observations:
irb(main):002:0> 5436812640 - 5255585552
=> 181227088

This is two times the number of sequences:
irb(main):004:0> 181227088 / 2
=> 90613544

So two spurious kmers are generated for each sequence. The bug seems to be in the DNA kernel or somewhere before it.
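The arithmetic in this comment can be rechecked in a few lines. The three constants are the figures quoted in this thread; the function name is hypothetical.

```python
FAILED_TOTAL = 5436812640   # kmer observations in the failing run
PASSED_TOTAL = 5255585552   # kmer observations in the passing run
SEQUENCES = 90613544        # number of sequences quoted above


def excess_per_sequence(actual, expected, sequences):
    """Return (excess, excess per sequence, remainder) using integer division."""
    excess = actual - expected
    per_sequence, remainder = divmod(excess, sequences)
    return excess, per_sequence, remainder


excess, per_sequence, remainder = excess_per_sequence(
    FAILED_TOTAL, PASSED_TOTAL, SEQUENCES
)
print(excess, per_sequence, remainder)  # prints: 181227088 2 0
```

A remainder of zero means the excess is an exact multiple of the sequence count, which is why a fixed per-sequence effect looks more likely than random corruption.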

sebhtml commented 10 years ago

1x28 is not using the same k.

Problem solved...
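One way to see why a different k changes the kmer count by a fixed amount per sequence: a read of length L contains L - k + 1 kmers (for L >= k), so lowering k by 2 adds two kmers per read. The sketch below illustrates this with hypothetical values; the thread does not state the read length or the k values used by either script.

```python
def kmers_in_read(read_length, k):
    """Number of kmers of length k in a read: read_length - k + 1, or 0 if too short."""
    return max(0, read_length - k + 1)


def extra_kmers(read_length, k_expected, k_actual):
    """Additional kmers per read when k_actual is used instead of k_expected."""
    return kmers_in_read(read_length, k_actual) - kmers_in_read(read_length, k_expected)


# Hypothetical values for illustration only, not taken from the biosal scripts.
READ_LENGTH = 100
K_EXPECTED = 43
K_ACTUAL = 41

print(extra_kmers(READ_LENGTH, K_EXPECTED, K_ACTUAL))  # prints: 2
```

The difference does not depend on the read length as long as every read is at least as long as the larger k.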

sebhtml commented 10 years ago

waiting for test result

sebhtml commented 10 years ago

[boisvert@bigmem biosal]$ time ./scripts/development/run-argonnite-1-28.sh | tee log
real 19m15.438s
user 537m2.255s
sys 1m28.958s
[boisvert@bigmem biosal]$ sha1sum output/coverage_distribution.txt-canonical
71deaa88265222cd8c27c88980eb5c4f29c966af output/coverage_distribution.txt-canonical

sebhtml commented 10 years ago

@hubot says: implemented in commit https://github.com/sebhtml/biosal/commit/8329532fbb119e8afa9397eea519fd27c7ba60da by @sebhtml !