HingeAssembler / HINGE

Software accompanying "HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution"
http://genome.cshlp.org/content/27/5/747.full.pdf+html?sid=39918b0d-7a7d-4a12-b720-9238834902fd

`hinge draft` (draft_assembly): --mlas, memory use, and system CPU time #121

Closed · spock closed this issue 6 years ago

spock commented 7 years ago

Following the E. coli demo, up until the `hinge draft` command I could use multiple .las files (the `--mlas` option); even for DASqv one can still pass multiple .las files, so there is no need to merge them. I have now started `hinge draft` and found that it does not support the `--mlas` option; RAM use is already beyond 195 GB and keeps growing. I'm afraid it will run out of memory on the huge input .las file, despite the message `[2017-07-04 12:42:52.891] [log] [info] Total number of active reads: 11498/701638`.

Is there anything that I can do to make it fit into 256..512 GB?

Update 1: maybe I panicked for no reason - RAM use stopped growing at 80% of RAM (195 GB) just as I posted this. Actually, I now see a persistent 5% sys load on this 20-core machine, while the process is in the R state but does not consume any user CPU - strange.

spock commented 7 years ago

RAM use seems to have started decreasing very slowly, so this looks like a false alarm, sorry :)

spock commented 7 years ago

Is draft_assembly making a lot of system calls, e.g. for memory allocation and freeing? In my case it still consumes very little USER CPU time, but keeps consuming SYS CPU time...

Code execution has been in this block for about 4 hours already:

    for (int i = 0; i < n_read; i++){
        reads[i]->active = maximal_read[i];
    }

    std::vector<int> range;

    for (int i = 0; i < n_read; i++) {
        if (reads[i]->active) range.push_back(i);
    }

    std::sort(range.begin(), range.end());

    std::vector<LOverlap *> aln; // Vector of pointers to all overlaps
    std::vector<LAlignment *> full_aln; // Vector of pointers to all full alignments

    if (strlen(name_las) > 0) {
        la.resetAlignment();
        la.getOverlap(aln, range);
        la.resetAlignment();
        la.getAlignment(full_aln, range);
    }
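
If it helps, one way to check would be to wrap those calls with getrusage(2) and log the user/sys CPU split before and after each one - just a sketch against the block above (`la`, `aln`, `full_aln`, `range` and `name_las` are the same variables; I haven't compiled or run this):

    // Sketch: report how much user vs. system CPU time the process has
    // accumulated so far, to see which of the loading calls burns sys time.
    #include <sys/resource.h>
    #include <cstdio>

    static double tv_to_sec(const struct timeval &tv) {
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    static void log_cpu_split(const char *label) {
        struct rusage ru;
        if (getrusage(RUSAGE_SELF, &ru) == 0)
            fprintf(stderr, "[cpu] %s: user=%.1fs sys=%.1fs\n",
                    label, tv_to_sec(ru.ru_utime), tv_to_sec(ru.ru_stime));
    }

    // ... then inside the block shown above:
    if (strlen(name_las) > 0) {
        log_cpu_split("before getOverlap");
        la.resetAlignment();
        la.getOverlap(aln, range);
        log_cpu_split("after getOverlap");
        la.resetAlignment();
        la.getAlignment(full_aln, range);
        log_cpu_split("after getAlignment");
    }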

Another question: if this behavior is not normal, is it safe to kill the process and start it again on a bigger machine, without damaging any of the assembly-related data?

fxia22 commented 7 years ago

Are you tracing this program, i.e. running it under gdb or valgrind? In that case it will make a ton of system calls. Otherwise USER CPU time should be the majority.

draft does not use multiple .las files, but it is quite memory efficient.

fxia22 commented 7 years ago

It is safe to kill and restart at any point.

spock commented 7 years ago

@fxia22 , thanks again for a quick reply to all of my questions :)

More information on what I'm observing:

One possibly important detail: I built Hinge on a machine with a much newer kernel and libraries (Ubuntu xenial) than the rather old kernel and libraries on the machines where I'm running it.

I agree that it is quite memory efficient: on an almost-900GB .las file, peak memory use was 195GB so far.

spock commented 7 years ago

The process is still in that same stage, so I guess I need to try compiling it on the target machine to see whether that fixes the problem. I'll provide more feedback when I have more information...

    COMMAND           PID  PPID USER         TIME     ELAPSED  STARTED %CPU S
    draft_assembly  27296 27294 spock 05:12:56    21:42:03 12:42:27 24.0 R

Update 2017-07-10: the process is still running, but after ~24 hours of runtime it did start using more %user time, so I have not recompiled it and am just letting it run for now (on 2 machines). I will provide an update when it finishes.

Update 2017-07-11: the original process is still running. I started another draft_assembly on a slightly bigger .las file (~1.5 TB), and after it finished reading all the data there was a similar pattern of 0% user time and full system time. So I have now recompiled draft_assembly (but none of the dependencies) on the target machine - let's see what happens.

Update 2, 2017-07-11: recompiling on the target machine does not seem to have any noticeable effect.

govinda-kamath commented 7 years ago

Hi @spock,

Did the process terminate? Or do we have a bug there?

spock commented 7 years ago

The process is still not finished:

    $ ps -o comm,pid,ppid,user,time,etime,start,pcpu,state --sort=comm aH | grep '^COMMAND\|R$'
    COMMAND           PID  PPID USER         TIME     ELAPSED  STARTED %CPU S
    draft_assembly  27296 27294 spock 5-15:24:12 9-00:53:36   Jul 04 62.4 R

Now it is consuming nearly 100% user CPU time, and memory use during these 9 days of runtime has decreased from 195 GB to 172 GB - so I think it might be working just fine, only extremely slowly.

The output lines haven't changed, so I believe it is still in the same segment of code as shown above in https://github.com/HingeAssembler/HINGE/issues/121#issuecomment-312905062 :

    [2017-07-04 12:42:27.365] [log] [info] # Alignments: 7472368456
    [2017-07-04 12:42:27.365] [log] [info] # Reads: 701638
    [2017-07-04 12:42:52.891] [log] [info] Total number of active reads: 11498/701638

The first ~24 hours of runtime (when user CPU time was nearly zero and sys CPU time was very high) look like a lot of memory re-allocation was going on; however, this is just a guess, as I haven't traced the program and also haven't looked into the functions called from this block:

    if (strlen(name_las) > 0) {
        la.resetAlignment();
        la.getOverlap(aln, range);
        la.resetAlignment();
        la.getAlignment(full_aln, range);
    }
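
For reference, here is a tiny standalone illustration of the kind of re-allocation I have in mind (not HINGE code) - std::vector re-allocates and copies everything accumulated so far whenever its capacity is exceeded, and for multi-GB vectors each such copy touches a lot of freshly mapped memory:

    // Standalone illustration (not HINGE code): count how many times a
    // std::vector reallocates while elements are appended with push_back.
    // Each capacity change means a new allocation plus a copy of all
    // elements accumulated so far.
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> v;
        size_t reallocations = 0;
        size_t last_capacity = v.capacity();
        for (int i = 0; i < 10000000; i++) {
            v.push_back(i);
            if (v.capacity() != last_capacity) {
                reallocations++;
                last_capacity = v.capacity();
            }
        }
        printf("%zu reallocations while appending %zu elements\n",
               reallocations, v.size());
        return 0;
    }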

I wonder if replacing the push_back inside the loop with a pre-allocated array would shorten the high-sys-time phase of the program? Something like this:

    int active_reads = 0;  // counter for the allocation below
    for (int i = 0; i < n_read; i++) {
        reads[i]->active = maximal_read[i];
        if (reads[i]->active) active_reads++;
    }

    int *range = (int *) malloc(active_reads * sizeof(int));

    int active_index = 0;
    for (int i = 0; i < n_read; i++) {
        if (reads[i]->active) range[active_index++] = i;
    }

    // ....

    free(range);
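
Or, staying with std::vector (probably more in line with the rest of the code), the same idea can be expressed with reserve() - again only a sketch, using the same `reads`, `maximal_read` and `n_read` as in the original block:

    // Count the active reads first, then reserve capacity once so that the
    // second loop never triggers a reallocation.
    int active_reads = 0;
    for (int i = 0; i < n_read; i++) {
        reads[i]->active = maximal_read[i];
        if (reads[i]->active) active_reads++;
    }

    std::vector<int> range;
    range.reserve(active_reads);  // single allocation up front
    for (int i = 0; i < n_read; i++) {
        if (reads[i]->active) range.push_back(i);
    }
    // range is filled in increasing order of i, so the subsequent
    // std::sort(range.begin(), range.end()) would no longer be needed.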

If you want to add a little more logging, some other means of identifying the bottleneck, or slightly different code, I'll be able to run the updated code on my dataset (but I'm limited to maybe 1 or 2 more processes - I can't consume more resources).

spock commented 6 years ago

The problem could be with my PacBio data, because when I run Hinge only on the nanopore part of the dataset, I do not see any problems. I do not yet know what exactly is "wrong" with the PacBio dataset.

govinda-kamath commented 6 years ago

@fxia22 found a bug here w.r.t. the file I/O, which is slowing things down. He will fix it by the middle of next week.

spock commented 6 years ago

Thanks! I had to stop my hybrid assembly trials for now, but I can try again later :)

fxia22 commented 6 years ago

Hi @spock, we have resolved the issue with the draft assembly; you are welcome to give it a try :D

spock commented 6 years ago

Thank you, @fxia22 !

I had to finish this project in a different way and move on to another project, but I'll give Hinge another try when I get more PB/ON data.