Rathcke / klust

[DR]NA clustering using k-mers

Fubar max mem reported #5

Open maasha opened 9 years ago

maasha commented 9 years ago
./klust ~/scratch/GG_BP.fna --sort_incr -u ~/scratch/clusters
Running with parameters:
  k = 5
  id = 0.85
  max_rejects = 8
  depth = 0

Reading sequences...
Time: 71.9542 sec.
Seqs/sec: 17552.6

Sorting by increasing sequence length...
Clustering 1262986 sequences...
100%
Time: 84.8908 sec.
Throughput: 14877.8 seqs/sec.

Clusters:   5754
Max size:   157402
Avg size:   219.497
Min size:   1
Singletons: 2136
Max mem:    3115372 MB

Particulars:

gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin14.3.0
Thread model: posix
ahovgaard commented 9 years ago

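I am not able to reproduce this on Linux, but the ru_maxrss field in the rusage struct seems to be in bytes on OS X, as opposed to kilobytes on Linux. In that case the division by 1024 in the code below yields kilobytes on OS X, and 3115372 KB is roughly 3 GB, which seems plausible.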

struct rusage usage;
// Assumes ru_maxrss is in kilobytes (as on Linux), so dividing by 1024 gives MB.
if (getrusage(RUSAGE_SELF, &usage) == 0)
    cout << "Max mem:    " << usage.ru_maxrss / 1024 << " MB" << endl;
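
For illustration, here is a minimal sketch of a platform-aware conversion. It assumes that ru_maxrss is in bytes on OS X and in kilobytes everywhere else. The helper names maxrss_is_bytes, maxrss_to_mb and peak_rss_mb are hypothetical and not taken from klust, and this is not necessarily how the actual fix is written.

```cpp
#include <sys/resource.h>
#include <cassert>
#include <iostream>

// Hypothetical helper: true if this platform reports ru_maxrss in bytes.
// Assumption: bytes on OS X (__APPLE__), kilobytes otherwise (e.g. Linux).
bool maxrss_is_bytes() {
#ifdef __APPLE__
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: convert a raw ru_maxrss value to whole megabytes.
long long maxrss_to_mb(long long raw, bool raw_is_bytes) {
    assert(raw >= 0);  // a peak resident set size is never negative
    return raw_is_bytes ? raw / (1024LL * 1024LL)  // bytes -> MB
                        : raw / 1024LL;            // kilobytes -> MB
}

// Peak resident set size of this process in MB, or -1 if getrusage fails.
long long peak_rss_mb() {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return maxrss_to_mb(usage.ru_maxrss, maxrss_is_bytes());
}

// Prints the value in the same format as the klust summary line.
void print_max_mem() {
    std::cout << "Max mem:    " << peak_rss_mb() << " MB" << std::endl;
}
```

Calling print_max_mem() at the end of a run would then print megabytes on both platforms, provided the unit assumption above holds.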
ahovgaard commented 9 years ago

Another thing: Is reading sequences always that slow on your machine? Here I get around 40k sequences (SILVA) per second and that is using an encrypted hard drive (not SSD).

ahovgaard commented 9 years ago

I just pushed something which should hopefully fix this. Will you pull or clone again and test it?

maasha commented 9 years ago

So the memory reporting is better but unstable: I ran the same command twice and got Max mem: 2952 MB the first time and Max mem: 3358 MB the second time.

With respect to reading speed: I have an unencrypted flash drive in a brand-new MacBook Pro :o(.

maasha@edna:~/scratch$ klust GG_BP.fna --sort_incr -u clusters
Running with parameters:
  k = 5
  id = 0.85
  max_rejects = 8
  depth = 0

Reading sequences...
Time: 72.3317 sec.
Seqs/sec: 17461

Sorting by increasing sequence length...
Clustering 1262986 sequences...
100%
Time: 86.4049 sec.
Throughput: 14617.1 seqs/sec.

Clusters:   5754
Max size:   157402
Avg size:   219.497
Min size:   1
Singletons: 2136
Max mem:    2952 MB
maasha@edna:~/scratch$ klust GG_BP.fna --sort_incr -u clusters
Running with parameters:
  k = 5
  id = 0.85
  max_rejects = 8
  depth = 0

Reading sequences...
Time: 65.8981 sec.
Seqs/sec: 19165.7

Sorting by increasing sequence length...
Clustering 1262986 sequences...
100%
Time: 85.8016 sec.
Throughput: 14719.8 seqs/sec.

Clusters:   5754
Max size:   157402
Avg size:   219.497
Min size:   1
Singletons: 2136
Max mem:    3358 MB