GeneAssembly / biosal

biosal is a distributed BIOlogical Sequence Actor Library. THIS IS A MIRROR.
BSD 2-Clause "Simplified" License

run argonnite on Iowa Continuous Corn on Mira #428

Closed by sebhtml 10 years ago

sebhtml commented 10 years ago

/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests

biosal-Iowa-23 is queued (300250)

sebhtml commented 10 years ago

ran out of memory; need to reduce the number of concurrent active messages.

AUTO-SCALING kernel 228292686 receives auto-scale message (BSAL_ACTOR_DO_AUTO_SCALING) via actor 228292686
kernel 364932174 is online !!!
DEBUG Error bsal_memory_allocate returned (nil), 132691252 bytes
bsal_tracer_print_stack_backtrace
Stack backtrace has 11 frames
0 [0x102562c]
1 [0x1024dd8]
2 [0x1025e10]
3 [0x1025bfc]
4 [0x1022620]
5 [0x100fe84]
6 [0x1012438]
7 [0x10135b8]
8 [0x1013324]
9 [0x145c8b0]
10 [0x1546dec]
DEBUG Error bsal_memory_allocate returned (nil), 132691252 bytes
bsal_tracer_print_stack_backtrace

sebhtml commented 10 years ago

dependency: #426

sebhtml commented 10 years ago

[boisvert@miralac1 biosal-tests]$ ./biosal-Iowa-24.sh
Project 'compbio'; job rerouted to queue 'prod-short'
300743

sebhtml commented 10 years ago

auto-scaling is still enabled...

[boisvert@miralac1 biosal-tests]$ grep AUTO biosal-Iowa-24.output|grep enables | grep node|wc -l
1885

[boisvert@miralac1 biosal-tests]$ grep 111740947 biosal-Iowa-24.output | head
kernel 111740947 is online !!!
kernel 111740947 processed 25998 entries (1 blocks) so far
AUTO-SCALING kernel 111740947 enables auto-scaling (BSAL_ACTOR_ENABLE_AUTO_SCALING) via actor 77834752
AUTO-SCALING node/19 enables auto-scaling for actor 111740947 (BSAL_ACTOR_ENABLE_AUTO_SCALING)

sebhtml commented 10 years ago

nevermind, the limit is 0

sebhtml commented 10 years ago

the timer code in thorium is broken on Blue Gene

sebhtml commented 10 years ago

log file:

biosal-Iowa-24.output

sebhtml commented 10 years ago

[boisvert@miralac1 biosal-tests]$ addr2line -e argonnite < biosal-Iowa-24.output.stack
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/core/system/tracer.c:36
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/core/system/memory.c:97
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/core/system/memory_block.c:30
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/core/system/memory_pool.c:167
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/core/system/memory_pool.c:80
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/genomics/kernels/aggregator.c:172
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/engine/thorium/dispatcher.c:75
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/engine/thorium/actor.c:1243
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/engine/thorium/actor.c:1827
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/engine/thorium/worker.c:1148
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/engine/thorium/worker.c:246
/bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/glibc-2.12.2/nptl/pthread_create.c:322
:0
??:0
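
The `.stack` file fed to addr2line is not shown in this thread. A minimal sketch of how such a file could be produced from the log, assuming frame lines look like the `0 [0x102562c]` lines of the first backtrace above (the function name `extract_frames` and the sample text are illustrative, not part of biosal):

```python
import re

# Frame lines in the log look like "0 [0x102562c]": frame index, then address in brackets.
FRAME = re.compile(r"^\s*(\d+)\s+\[(0x[0-9a-fA-F]+)\]\s*$")

def extract_frames(log_text):
    """Return the frame addresses of the first backtrace found, in frame order."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME.match(line)
        if not match:
            continue  # skip log lines that are not backtrace frames
        index, address = int(match.group(1)), match.group(2)
        if index == 0 and frames:
            break  # a second backtrace starts here; keep only the first one
        frames.append(address)
    return frames

# Sample taken from the biosal-Iowa-23 backtrace earlier in this thread (truncated).
sample = """DEBUG Error bsal_memory_allocate returned (nil), 132691252 bytes
Stack backtrace has 11 frames
0 [0x102562c]
1 [0x1024dd8]
2 [0x1025e10]
"""

# One address per line is the form addr2line accepts on standard input.
print("\n".join(extract_frames(sample)))
```

Writing that output to a file and redirecting it into `addr2line -e argonnite` should reproduce the kind of listing shown above, since addr2line reads addresses from standard input when none are given on the command line.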

sebhtml commented 10 years ago

DEBUG Error bsal_memory_allocate returned (nil), 8388608 bytes

sebhtml commented 10 years ago

[boisvert@miralac1 biosal-tests]$ grep "kmer store" biosal-Iowa-24.output |grep coverage|wc -l
15872
[boisvert@miralac1 biosal-tests]$ echo $((512 * 31))
15872

sebhtml commented 10 years ago

memory usage is not uniform...

MEMORY 1730 s node/52 4353716224 bytes
MEMORY 1730 s node/18 13866401792 bytes
MEMORY 1730 s node/318 5360336896 bytes
MEMORY 1730 s node/123 4924129280 bytes
MEMORY 1730 s node/8 14084567040 bytes
MEMORY 1730 s node/114 6165651456 bytes
MEMORY 1730 s node/14 14067728384 bytes
MEMORY 1730 s node/4 14025768960 bytes

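
To put a number on "not uniform", here is a small sketch that parses the `MEMORY` lines quoted in this comment and reports the spread. The regular expression is inferred from these eight lines only, and `memory_by_node` is an illustrative name, not a biosal tool:

```python
import re

# Inferred line format: "MEMORY <seconds> s node/<id> <bytes> bytes"
MEMORY = re.compile(r"MEMORY (\d+) s node/(\d+) (\d+) bytes")

def memory_by_node(log_text):
    """Map node id -> bytes in use (the last report for each node wins)."""
    usage = {}
    for _seconds, node, nbytes in MEMORY.findall(log_text):
        usage[int(node)] = int(nbytes)
    return usage

# The eight lines quoted in this comment.
sample = """MEMORY 1730 s node/52 4353716224 bytes
MEMORY 1730 s node/18 13866401792 bytes
MEMORY 1730 s node/318 5360336896 bytes
MEMORY 1730 s node/123 4924129280 bytes
MEMORY 1730 s node/8 14084567040 bytes
MEMORY 1730 s node/114 6165651456 bytes
MEMORY 1730 s node/14 14067728384 bytes
MEMORY 1730 s node/4 14025768960 bytes
"""

usage = memory_by_node(sample)
low, high = min(usage.values()), max(usage.values())
ratio = high / low

# Print nodes from heaviest to lightest, then the max/min ratio.
for node, nbytes in sorted(usage.items(), key=lambda item: item[1], reverse=True):
    print(f"node/{node}: {nbytes / 2**30:.2f} GiB")
print(f"max/min ratio: {ratio:.2f}")
```

In this sample the heaviest node uses roughly three times the memory of the lightest one.
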
sebhtml commented 10 years ago

completion is not uniform:

[boisvert@miralac1 biosal-tests]$ grep left biosal-Iowa-24.output | tail -n 15872|awk '{print $6}'|sort | uniq -c
474 (0.63)
42 (0.64)
1183 (0.81)
853 (0.82)
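
For reference, a rough Python equivalent of the `awk '{print $6}' | sort | uniq -c` part of that pipeline. It assumes the progress lines have the form `sequence store <id> has <left>/<total> (<ratio>) entries left to produce`, which is the form quoted later in this thread for the Iowa-25 run; the sample lines are taken from there, and `completion_histogram` is an illustrative name:

```python
from collections import Counter

def completion_histogram(lines):
    """Count occurrences of the 6th whitespace-separated field of lines containing 'left'."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # "left" in line mimics `grep left`; fields[5] mimics awk's $6.
        if "left" in line and len(fields) >= 6:
            counts[fields[5]] += 1
    return counts

# Progress lines quoted later in this thread (Iowa-25 run).
sample = [
    "sequence store 372103113 has 30234/286720 (0.11) entries left to produce",
    "sequence store 1192878683 has 32168/290816 (0.11) entries left to produce",
    "sequence store 1377052464 has 32168/290816 (0.11) entries left to produce",
    "sequence store 1875153060 has 11900/290816 (0.04) entries left to produce",
    "sequence store 1235229640 has 30234/286720 (0.11) entries left to produce",
]

histogram = completion_histogram(sample)
for value, count in sorted(histogram.items()):
    print(count, value)
```

The `tail -n 15872` step is not reproduced here; on a real log one would slice the last lines before counting.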

sebhtml commented 10 years ago

[boisvert@miralac1 biosal-tests]$ ./biosal-Iowa-25.sh
Project 'compbio'; job rerouted to queue 'prod-short'
301012

sebhtml commented 10 years ago

Almost there:

sequence store 372103113 has 30234/286720 (0.11) entries left to produce
sequence store 1192878683 has 32168/290816 (0.11) entries left to produce
sequence store 1377052464 has 32168/290816 (0.11) entries left to produce
sequence store 1875153060 has 11900/290816 (0.04) entries left to produce
sequence store 1235229640 has 30234/286720 (0.11) entries left to produce

DEBUG Error bsal_memory_allocate returned (nil), 32104524 bytes

[boisvert@miralac1 biosal-tests]$ addr2line -e argonnite < biosal-Iowa-25.stack
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/core/system/tracer.c:40
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/core/system/memory.c:97
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/core/system/memory_pool.c:121
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/core/system/memory_pool.c:80
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/genomics/kernels/dna_kmer_counter_kernel.c:281
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/engine/thorium/actor.c:899
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/engine/thorium/actor.c:1829
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/engine/thorium/worker.c:1148
/gpfs/mira-fs1/projects/CompBIO/Projects/biosal-tests/biosal/engine/thorium/worker.c:246
/bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/glibc-2.12.2/nptl/pthread_create.c:322
:0
??:0

sebhtml commented 10 years ago

[boisvert@miralac1 biosal-tests]$ grep MEMORY biosal-Iowa-25.output|tail -n 512|awk '{print $6}'|sort -r -n|head
16441671680
9093050368
9090310144
9083207680
9080508416
9079181312
9073364992
9045626880
9034493952
9030164480

sebhtml commented 10 years ago

somehow, node 260 had a problem (?):

[boisvert@miralac1 biosal-tests]$ grep "node/260" biosal-Iowa-25.output|grep MEMORY|tail

MEMORY 1425 s node/260 11240734720 bytes
MEMORY 1430 s node/260 11911823360 bytes
MEMORY 1435 s node/260 12440305664 bytes
MEMORY 1440 s node/260 13027897344 bytes
MEMORY 1445 s node/260 13539213312 bytes
MEMORY 1450 s node/260 14218690560 bytes
MEMORY 1455 s node/260 14693949440 bytes
MEMORY 1460 s node/260 15264374784 bytes
MEMORY 1465 s node/260 16220676096 bytes
MEMORY 1470 s node/260 16441671680 bytes

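
A quick worked calculation of how fast node/260 was growing over these ten samples. The numbers are copied from the lines in this comment; `growth_rate` is an illustrative helper, and the first-to-last average is only a crude estimate:

```python
# (seconds, bytes) pairs for node/260, copied from the MEMORY lines in this comment.
samples = [
    (1425, 11240734720),
    (1430, 11911823360),
    (1435, 12440305664),
    (1440, 13027897344),
    (1445, 13539213312),
    (1450, 14218690560),
    (1455, 14693949440),
    (1460, 15264374784),
    (1465, 16220676096),
    (1470, 16441671680),
]

def growth_rate(samples):
    """Average growth in bytes per second between the first and last sample."""
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    return (b1 - b0) / (t1 - t0)

rate = growth_rate(samples)
print(f"node/260 grew by about {rate / 2**20:.1f} MiB/s over {samples[-1][0] - samples[0][0]} s")
```

The last sample (16441671680 bytes) is also the largest value in the sorted list from the previous comment, where every other node in the top ten sits near 9.0e9 bytes.
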
sebhtml commented 10 years ago

loads:

LOAD EPOCH 1425 s node/68 14.81/15 (0.99) 0.98 1.00 0.99 0.99 0.99 0.98 0.95 1.00 0.99 0.98 0.99 0.99 0.98 0.99 0.98
LOAD EPOCH 1425 s node/306 14.85/15 (0.99) 0.99 0.99 0.99 1.00 0.99 0.97 0.99 0.98 0.99 0.99 0.99 0.99 1.00 0.99 0.99
LOAD EPOCH 1425 s node/169 14.85/15 (0.99) 0.99 0.99 0.98 0.99 0.98 0.99 0.99 0.99 1.00 0.98 0.99 0.98 1.00 0.99 0.99
LOAD EPOCH 1425 s node/132 14.89/15 (0.99) 0.99 0.99 0.99 0.99 1.00 0.99 0.98 1.00 0.99 0.99 1.00 0.99 1.00 0.98 0.99
LOAD EPOCH 1425 s node/260 12.99/15 (0.87) 0.98 0.48 0.47 0.99 0.96 0.95 1.00 0.76 1.00 0.93 0.56 0.94 0.98 0.98 1.00
LOAD EPOCH 1425 s node/367 14.61/15 (0.97) 1.00 0.98 1.00 0.98 0.99 0.99 0.99 0.98 0.98 0.99 0.99 1.00 0.77 0.99 0.96
LOAD EPOCH 1425 s node/488 14.87/15 (0.99) 0.99 1.00 0.99 1.00 1.00 0.99 0.98 0.98 0.99 0.99 1.00 0.99 0.98 0.99 0.99
LOAD EPOCH 1425 s node/104 14.78/15 (0.99) 0.99 0.99 0.99 0.93 0.99 0.98 0.98 1.00 0.98 1.00 0.99 0.99 0.99 0.99 0.98
LOAD EPOCH 1425 s node/98 14.87/15 (0.99) 0.99 1.00 1.00 0.98 0.98 0.99 0.99 0.99 0.99 0.98 0.99 0.98 0.99 1.00 0.99
LOAD EPOCH 1425 s node/393 14.83/15 (0.99) 0.99 0.98 0.99 0.99 0.97 1.00 1.00 0.98 0.99 0.98 0.98 0.98 0.99 0.99 1.00

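
A sketch for picking the odd node out of `LOAD EPOCH` lines automatically. The field meanings (summed load over worker count, mean load in parentheses, then one value per worker) are inferred from the numbers above rather than from the Thorium source, and the 0.90 threshold is an arbitrary choice for illustration:

```python
def parse_load_line(line):
    """Parse one 'LOAD EPOCH' line into (node, mean_load, per_worker_loads)."""
    fields = line.split()
    # Assumed layout: LOAD EPOCH <seconds> s node/<id> <sum>/<workers> (<mean>) <w0> <w1> ...
    node = int(fields[4].split("/")[1])
    mean = float(fields[6].strip("()"))
    workers = [float(value) for value in fields[7:]]
    return node, mean, workers

def underloaded(lines, threshold=0.90):
    """Return the nodes whose mean load is below the (hypothetical) threshold."""
    return [node for node, mean, _workers in map(parse_load_line, lines) if mean < threshold]

# Three of the lines quoted in this comment.
sample = [
    "LOAD EPOCH 1425 s node/132 14.89/15 (0.99) 0.99 0.99 0.99 0.99 1.00 0.99 0.98 1.00 0.99 0.99 1.00 0.99 1.00 0.98 0.99",
    "LOAD EPOCH 1425 s node/260 12.99/15 (0.87) 0.98 0.48 0.47 0.99 0.96 0.95 1.00 0.76 1.00 0.93 0.56 0.94 0.98 0.98 1.00",
    "LOAD EPOCH 1425 s node/367 14.61/15 (0.97) 1.00 0.98 1.00 0.98 0.99 0.99 0.99 0.98 0.98 0.99 0.99 1.00 0.77 0.99 0.96",
]

print("nodes below threshold:", underloaded(sample))
```
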
sebhtml commented 10 years ago

node 260 is receiving too many messages.

probably some sort of strange kmer (NNNNNNNNNNNNN)
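
To illustrate why a run of `N` could overload a single node, here is a toy model. It assumes k-mers are routed to a node by hashing the k-mer; that routing rule, the CRC32 hash, and the synthetic reads are all assumptions made for the example, not biosal's actual implementation:

```python
import random
import zlib
from collections import Counter

K = 43        # k-mer length, as in the "-k 43" argonnite option quoted in this thread
NODES = 512   # node count, as in the "-n 512" qsub option quoted in this thread

def kmers(read, k):
    """Yield every k-mer (substring of length k) of a read."""
    for start in range(len(read) - k + 1):
        yield read[start:start + k]

def destination(kmer, nodes):
    """Hypothetical routing rule: hash the k-mer and take it modulo the node count."""
    return zlib.crc32(kmer.encode("ascii")) % nodes

def messages_per_node(reads, k, nodes):
    """Count how many k-mers each node would receive under the routing rule above."""
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            counts[destination(kmer, nodes)] += 1
    return counts

rng = random.Random(0)  # fixed seed so the synthetic reads are reproducible
normal_reads = ["".join(rng.choice("ACGT") for _ in range(100)) for _ in range(200)]
poly_n_reads = ["N" * 100 for _ in range(50)]  # synthetic reads made only of N

counts = messages_per_node(normal_reads + poly_n_reads, K, NODES)
hot_node, hot_count = counts.most_common(1)[0]
print(f"busiest node: {hot_node} with {hot_count} k-mers")
```

Every k-mer of a poly-N read is the same string, so under any hash-based routing they all land on one node, while the k-mers of ordinary reads spread out across all nodes.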

Thorium counters:

[boisvert@cetuslac1 biosal-tests]$ grep BSAL_COUNTER_BALANCE_MESSAGES biosal-Iowa-cetus-1.output | grep ^"259 " | tail
259 balance BSAL_COUNTER_BALANCE_MESSAGES 16532
259 balance BSAL_COUNTER_BALANCE_MESSAGES 14020
259 balance BSAL_COUNTER_BALANCE_MESSAGES -541
259 balance BSAL_COUNTER_BALANCE_MESSAGES 6844
259 balance BSAL_COUNTER_BALANCE_MESSAGES 5477
259 balance BSAL_COUNTER_BALANCE_MESSAGES -7096
259 balance BSAL_COUNTER_BALANCE_MESSAGES 9529
259 balance BSAL_COUNTER_BALANCE_MESSAGES 4080
259 balance BSAL_COUNTER_BALANCE_MESSAGES 6408
259 balance BSAL_COUNTER_BALANCE_MESSAGES 6859

[boisvert@cetuslac1 biosal-tests]$ grep BSAL_COUNTER_BALANCE_MESSAGES biosal-Iowa-cetus-1.output | grep ^"261 " | tail
261 balance BSAL_COUNTER_BALANCE_MESSAGES 3349
261 balance BSAL_COUNTER_BALANCE_MESSAGES -2909
261 balance BSAL_COUNTER_BALANCE_MESSAGES 863
261 balance BSAL_COUNTER_BALANCE_MESSAGES -6086
261 balance BSAL_COUNTER_BALANCE_MESSAGES 3030
261 balance BSAL_COUNTER_BALANCE_MESSAGES -1218
261 balance BSAL_COUNTER_BALANCE_MESSAGES -806
261 balance BSAL_COUNTER_BALANCE_MESSAGES -7859
261 balance BSAL_COUNTER_BALANCE_MESSAGES -5600
261 balance BSAL_COUNTER_BALANCE_MESSAGES 1713

[boisvert@cetuslac1 biosal-tests]$ grep BSAL_COUNTER_BALANCE_MESSAGES biosal-Iowa-cetus-1.output | grep ^"260 " | tail
260 balance BSAL_COUNTER_BALANCE_MESSAGES 413592
260 balance BSAL_COUNTER_BALANCE_MESSAGES 432511
260 balance BSAL_COUNTER_BALANCE_MESSAGES 453165
260 balance BSAL_COUNTER_BALANCE_MESSAGES 503060
260 balance BSAL_COUNTER_BALANCE_MESSAGES 516621
260 balance BSAL_COUNTER_BALANCE_MESSAGES 541214
260 balance BSAL_COUNTER_BALANCE_MESSAGES 569961
260 balance BSAL_COUNTER_BALANCE_MESSAGES 614067
260 balance BSAL_COUNTER_BALANCE_MESSAGES 658516
260 balance BSAL_COUNTER_BALANCE_MESSAGES 691073
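
Rather than grepping one node at a time, the same comparison can be sketched in a single pass over the counter lines. The line format is inferred from the output above, the exact definition of the balance counter is not given in this thread, and `last_balance` is an illustrative name:

```python
import re

# Inferred line format: "<node> balance BSAL_COUNTER_BALANCE_MESSAGES <signed integer>"
BALANCE = re.compile(r"^(\d+) balance BSAL_COUNTER_BALANCE_MESSAGES (-?\d+)$")

def last_balance(lines):
    """Map node id -> most recent message balance reported for that node."""
    latest = {}
    for line in lines:
        match = BALANCE.match(line.strip())
        if match:
            latest[int(match.group(1))] = int(match.group(2))
    return latest

# The final two values reported for each of the three nodes above.
sample = [
    "259 balance BSAL_COUNTER_BALANCE_MESSAGES 6408",
    "259 balance BSAL_COUNTER_BALANCE_MESSAGES 6859",
    "261 balance BSAL_COUNTER_BALANCE_MESSAGES -5600",
    "261 balance BSAL_COUNTER_BALANCE_MESSAGES 1713",
    "260 balance BSAL_COUNTER_BALANCE_MESSAGES 658516",
    "260 balance BSAL_COUNTER_BALANCE_MESSAGES 691073",
]

latest = last_balance(sample)
worst = max(latest, key=latest.get)
print(f"largest balance: node {worst} with {latest[worst]} messages")
```

The neighbours 259 and 261 fluctuate around a few thousand messages, while node 260's balance keeps climbing and ends about two orders of magnitude higher.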

sebhtml commented 10 years ago

on cetus

[boisvert@cetuslac1 biosal-tests]$ ./biosal-Iowa-cetus-2.sh
301277

sebhtml commented 10 years ago

[boisvert@cetuslac1 biosal-tests]$ sha1sum coverage_distribution.txt-canonical
01a293db48518190038eaddbaed8a47ca0323fc7 coverage_distribution.txt-canonical
[boisvert@cetuslac1 biosal-tests]$ cbank list jobs -p CompBIO|grep 301372.cetus

running time:

0:52:40

[boisvert@cetuslac1 biosal-tests]$ cat biosal-Iowa-cetus-4.sh
#!/bin/bash

qsub \
 -A CompBIO \
 -n 512 \
 -t 01:00:00 \
 -O biosal-Iowa-cetus-4 \
 --mode c1 \
 argonnite -print-counters -print-load -print-memory-usage -threads-per-node 16 -k 43 Iowa_Continuous_Corn/*.fastq

[boisvert@cetuslac1 biosal-tests]$ grep TIMER biosal-Iowa-cetus-4.output
TIMER 12 minutes, 45.130127 seconds
TIMER 8 minutes, 50.828430 seconds
TIMER 21 minutes, 35.958618 seconds
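
The third `TIMER` value looks like the sum of the first two. A small check of that reading, using only the three lines above; which phases the timers measure is not stated in this thread, and `timer_seconds` is an illustrative helper:

```python
import re

# Line format as printed above: "TIMER <minutes> minutes, <seconds> seconds"
TIMER = re.compile(r"TIMER (\d+) minutes, ([\d.]+) seconds")

def timer_seconds(log_text):
    """Return each TIMER line converted to seconds, in order of appearance."""
    return [int(minutes) * 60 + float(seconds) for minutes, seconds in TIMER.findall(log_text)]

sample = """TIMER 12 minutes, 45.130127 seconds
TIMER 8 minutes, 50.828430 seconds
TIMER 21 minutes, 35.958618 seconds
"""

first, second, total = timer_seconds(sample)
print(f"first + second = {first + second:.3f} s, third timer = {total:.3f} s")
```

The two figures agree to well under a millisecond, which is consistent with the third line being a total for the two timed phases.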