GeneAssembly / biosal

biosal is a distributed BIOlogical Sequence Actor Library. THIS IS A MIRROR.
BSD 2-Clause "Simplified" License

fix race condition in the coverage distribution actor #481

Closed sebhtml closed 10 years ago

sebhtml commented 10 years ago

Actual:

  36:04 mpiexec -n 4 applications/argonnite -k 43 -threads-per-node 7 /home/boisvert/dropbox/GPIC.1424-1.1371.fastq
  36:04 applications/argonnite -k 43 -threads-per-node 7 /home/boisvert/dropbox/GPIC.1424-1.1371.fastq
  36:04 applications/argonnite -k 43 -threads-per-node 7 /home/boisvert/dropbox/GPIC.1424-1.1371.fastq
  36:04 applications/argonnite -k 43 -threads-per-node 7 /home/boisvert/dropbox/GPIC.1424-1.1371.fastq
  36:04 applications/argonnite -k 43 -threads-per-node 7 /home/boisvert/dropbox/GPIC.1424-1.1371.fastq

Expected:

It should complete in under 30 minutes.

Log:

/home/boisvert/storage/automated-tests/2014-07-30-19:11:43/biosal/log-12222

sebhtml commented 10 years ago

kmer stores are connected to the coverage distribution:

[boisvert@bigmem biosal]$ grep "will use" log-12222
kmer store 534329432 will use coverage distribution 1942418098
kmer store 501916780 will use coverage distribution 1942418098
kmer store 593359960 will use coverage distribution 1942418098
kmer store 1780428796 will use coverage distribution 1942418098
kmer store 302107128 will use coverage distribution 1942418098
kmer store 1696391324 will use coverage distribution 1942418098
kmer store 1542868725 will use coverage distribution 1942418098
kmer store 467589773 will use coverage distribution 1942418098
kmer store 1917031689 will use coverage distribution 1942418098
kmer store 1161034269 will use coverage distribution 1942418098
kmer store 1760517185 will use coverage distribution 1942418098
kmer store 1390202837 will use coverage distribution 1942418098
kmer store 754858702 will use coverage distribution 1942418098
kmer store 1743786890 will use coverage distribution 1942418098
kmer store 1956797490 will use coverage distribution 1942418098
kmer store 1921721994 will use coverage distribution 1942418098
kmer store 353490750 will use coverage distribution 1942418098
kmer store 468629758 will use coverage distribution 1942418098
kmer store 1201150063 will use coverage distribution 1942418098
kmer store 805785867 will use coverage distribution 1942418098
kmer store 1340050547 will use coverage distribution 1942418098
kmer store 1436954459 will use coverage distribution 1942418098
kmer store 992019775 will use coverage distribution 1942418098
kmer store 1708400887 will use coverage distribution 1942418098

[boisvert@bigmem biosal]$ grep "will use" log-12222 | wc -l
24

sebhtml commented 10 years ago

One of the 24 kmer stores (353490750) does not report its local table:

[boisvert@bigmem biosal]$ grep "local table" log-12222
kmer store 534329432: local table has 187696513 canonical kmers (375393026 kmers)
kmer store 1696391324: local table has 187677740 canonical kmers (375355480 kmers)
kmer store 302107128: local table has 187706968 canonical kmers (375413936 kmers)
kmer store 593359960: local table has 187678748 canonical kmers (375357496 kmers)
kmer store 501916780: local table has 187696630 canonical kmers (375393260 kmers)
kmer store 1780428796: local table has 187718550 canonical kmers (375437100 kmers)
kmer store 1161034269: local table has 187718274 canonical kmers (375436548 kmers)
kmer store 1542868725: local table has 187687030 canonical kmers (375374060 kmers)
kmer store 1917031689: local table has 187702102 canonical kmers (375404204 kmers)
kmer store 467589773: local table has 187676660 canonical kmers (375353320 kmers)
kmer store 1390202837: local table has 187708739 canonical kmers (375417478 kmers)
kmer store 468629758: local table has 187692696 canonical kmers (375385392 kmers)
kmer store 1921721994: local table has 187702617 canonical kmers (375405234 kmers)
kmer store 754858702: local table has 187681186 canonical kmers (375362372 kmers)
kmer store 1743786890: local table has 187692823 canonical kmers (375385646 kmers)
kmer store 1956797490: local table has 187695671 canonical kmers (375391342 kmers)
kmer store 1201150063: local table has 187682078 canonical kmers (375364156 kmers)
kmer store 992019775: local table has 187671662 canonical kmers (375343324 kmers)
kmer store 1436954459: local table has 187711744 canonical kmers (375423488 kmers)
kmer store 1708400887: local table has 187702062 canonical kmers (375404124 kmers)
kmer store 805785867: local table has 187680502 canonical kmers (375361004 kmers)
kmer store 1760517185: local table has 187702624 canonical kmers (375405248 kmers)
kmer store 1340050547: local table has 187699750 canonical kmers (375399500 kmers)

[boisvert@bigmem biosal]$ grep "local table" log-12222 |wc -l
23

sebhtml commented 10 years ago

Only 22 of the 24 stores sent their coverage map:

[boisvert@bigmem biosal]$ grep ^SENDING log-12222
SENDING kmer store 1942418098 sends map to 12588, 467589773 bytes / 441 entries
SENDING kmer store 1942418098 sends map to 12588, 1542868725 bytes / 428 entries
SENDING kmer store 1942418098 sends map to 12588, 1956797490 bytes / 428 entries
SENDING kmer store 1942418098 sends map to 12588, 1436954459 bytes / 431 entries
SENDING kmer store 1942418098 sends map to 12588, 534329432 bytes / 427 entries
SENDING kmer store 1942418098 sends map to 12588, 1743786890 bytes / 446 entries
SENDING kmer store 1942418098 sends map to 12588, 1780428796 bytes / 453 entries
SENDING kmer store 1942418098 sends map to 12588, 1917031689 bytes / 452 entries
SENDING kmer store 1942418098 sends map to 12588, 1696391324 bytes / 433 entries
SENDING kmer store 1942418098 sends map to 12588, 1161034269 bytes / 461 entries
SENDING kmer store 1942418098 sends map to 12588, 593359960 bytes / 443 entries
SENDING kmer store 1942418098 sends map to 12588, 501916780 bytes / 433 entries
SENDING kmer store 1942418098 sends map to 12588, 1708400887 bytes / 443 entries
SENDING kmer store 1942418098 sends map to 12588, 1390202837 bytes / 444 entries
SENDING kmer store 1942418098 sends map to 12588, 1921721994 bytes / 442 entries
SENDING kmer store 1942418098 sends map to 12588, 302107128 bytes / 423 entries
SENDING kmer store 1942418098 sends map to 12588, 1340050547 bytes / 449 entries
SENDING kmer store 1942418098 sends map to 12588, 468629758 bytes / 464 entries
SENDING kmer store 1942418098 sends map to 12588, 805785867 bytes / 435 entries
SENDING kmer store 1942418098 sends map to 12588, 1201150063 bytes / 418 entries
SENDING kmer store 1942418098 sends map to 12588, 754858702 bytes / 427 entries
SENDING kmer store 1942418098 sends map to 12588, 992019775 bytes / 412 entries

[boisvert@bigmem biosal]$ grep ^SENDING log-12222 | wc -l
22

sebhtml commented 10 years ago

The distribution actor received coverage data from only 22 of the 24 stores (353490750 and 1760517185 are missing):

[boisvert@bigmem biosal]$ grep "receives coverage" log-12222
distribution/1942418098 receives coverage data from producer/467589773, 441 entries / 12588 bytes 1/24
distribution/1942418098 receives coverage data from producer/1542868725, 428 entries / 12588 bytes 2/24
distribution/1942418098 receives coverage data from producer/1956797490, 428 entries / 12588 bytes 3/24
distribution/1942418098 receives coverage data from producer/1436954459, 431 entries / 12588 bytes 4/24
distribution/1942418098 receives coverage data from producer/534329432, 427 entries / 12588 bytes 5/24
distribution/1942418098 receives coverage data from producer/1743786890, 446 entries / 12588 bytes 6/24
distribution/1942418098 receives coverage data from producer/1780428796, 453 entries / 12588 bytes 7/24
distribution/1942418098 receives coverage data from producer/1917031689, 452 entries / 12588 bytes 8/24
distribution/1942418098 receives coverage data from producer/1696391324, 433 entries / 12588 bytes 9/24
distribution/1942418098 receives coverage data from producer/1161034269, 461 entries / 12588 bytes 10/24
distribution/1942418098 receives coverage data from producer/593359960, 443 entries / 12588 bytes 11/24
distribution/1942418098 receives coverage data from producer/501916780, 433 entries / 12588 bytes 12/24
distribution/1942418098 receives coverage data from producer/1708400887, 443 entries / 12588 bytes 13/24
distribution/1942418098 receives coverage data from producer/1390202837, 444 entries / 12588 bytes 14/24
distribution/1942418098 receives coverage data from producer/1921721994, 442 entries / 12588 bytes 15/24
distribution/1942418098 receives coverage data from producer/302107128, 423 entries / 12588 bytes 16/24
distribution/1942418098 receives coverage data from producer/1340050547, 449 entries / 12588 bytes 17/24
distribution/1942418098 receives coverage data from producer/468629758, 464 entries / 12588 bytes 18/24
distribution/1942418098 receives coverage data from producer/805785867, 435 entries / 12588 bytes 19/24
distribution/1942418098 receives coverage data from producer/1201150063, 418 entries / 12588 bytes 20/24
distribution/1942418098 receives coverage data from producer/754858702, 427 entries / 12588 bytes 21/24
distribution/1942418098 receives coverage data from producer/992019775, 412 entries / 12588 bytes 22/24

[boisvert@bigmem biosal]$ grep "receives coverage" log-12222 |wc -l
22
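
The grep counts above (24, then 23, then 22) show how many stores reach each stage, but not which ones drop out. A minimal sketch of that comparison follows, assuming the log line formats are exactly the ones quoted in this issue. The stage labels, the function names and the store names in SAMPLE_LOG are made up for illustration and are not part of biosal.

```python
import re

# Regular expressions modeled on the log lines quoted in this issue.
# The stage labels ("connected", "reported", "received") are illustrative only.
PATTERNS = {
    "connected": re.compile(r"kmer store (\d+) will use coverage distribution \d+"),
    "reported": re.compile(r"kmer store (\d+): local table has \d+ canonical kmers"),
    "received": re.compile(r"receives coverage data from producer/(\d+),"),
}


def stores_by_stage(log_text):
    """Return {stage: set of kmer store names seen at that stage}."""
    return {stage: set(p.findall(log_text)) for stage, p in PATTERNS.items()}


def missing_stores(log_text):
    """For each later stage, list the connected stores that never show up."""
    stages = stores_by_stage(log_text)
    expected = stages["connected"]
    return {
        stage: sorted(expected - seen, key=int)
        for stage, seen in stages.items()
        if stage != "connected"
    }


# Hypothetical miniature log: three stores connect, two report, one delivers.
SAMPLE_LOG = """\
kmer store 101 will use coverage distribution 900
kmer store 102 will use coverage distribution 900
kmer store 103 will use coverage distribution 900
kmer store 101: local table has 10 canonical kmers (20 kmers)
kmer store 102: local table has 12 canonical kmers (24 kmers)
distribution/900 receives coverage data from producer/101, 5 entries / 64 bytes 1/3
"""

if __name__ == "__main__":
    for stage, names in missing_stores(SAMPLE_LOG).items():
        print(stage, "missing:", names)
```

Against the real log, the same function could be called as missing_stores(open("log-12222").read()), provided the file is readable from the current directory.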

sebhtml commented 10 years ago

stores are ready too:

[boisvert@bigmem biosal]$ grep "ready" log-12222 |grep stores
argonnite 880827860: stores are ready, 5255585552/5255585552 kmers

sebhtml commented 10 years ago

OK.

This is caused by a bug in bsal_worker_pool_wake_up_workers.

sebhtml commented 10 years ago

@hubot says: implemented in commit https://github.com/sebhtml/biosal/commit/3fa7da36664c869aa6fe451101b2f8a8a6d8e54a by @sebhtml !