data61 / gossamer

Gossamer bioinformatics suite

building index fails #18

Open splaisan opened 6 years ago

splaisan commented 6 years ago

I want to build an index from the latest human and mouse assemblies (the .dna.primary_assembly.fa files). I tried with up to 400GB of RAM and it kept dying with:

"caught unexpected exception: std::bad_alloc"

Has anyone succeeded in building an index from the full references, or should we build from the chromosome-only subset?

Thanks

PS: this resembles old STAR index-building issues ...

splaisan commented 6 years ago

The last crash occurred with the chromosome-only FASTA files, 256GB RAM and 24 threads.

mouseref=../reference/Mus_musculus.GRCm38.chromosomes.fa
humref=../reference/Homo_sapiens.GRCh38.chromosomes.fa
kmers=25
nthr=24
ram=256
xenome index -v \
   -M ${ram} \
   -T ${nthr} \
   --tmp-dir . \
   -l ./index_creation_log.txt \
   -P idx \
   -H ${mouseref} \
   -G ${humref} \
   -K ${kmers}
caught unexpected exception: std::bad_alloc

The crash occurred in the hash sorting stage after creating all hashes, probably the final stage of the indexing...

The last lines recorded are:
Fri Mar  2 17:13:30 2018        info    processed 2885681152 individual k-mers.
Fri Mar  2 17:13:30 2018        info    hash table load is 0.18819761061784254
Fri Mar  2 17:13:30 2018        info    number of spills is 0
Fri Mar  2 17:13:30 2018        info    the average k-mer frequency is 1.2281644105528797
Fri Mar  2 17:13:47 2018        info    writing out graph (no merging necessary).
Fri Mar  2 17:13:47 2018        info    sorting the hashtable...

Is this bad memory management of some kind, triggered because I gave it too much RAM (32bit linked)? I am now running with 8 threads and 48GB, as in the doc. I will post the results if there is any progress.
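
Before a long run it can be worth ruling out the simplest cause, namely that the value passed to -M is larger than what the machine can actually give the process. A minimal sketch, assuming Linux (it reads a /proc/meminfo-style file) and assuming -M is given in GB as in the command above; the function name mem_fits is hypothetical and not part of xenome:

```shell
#!/bin/sh
# Sketch: succeed if a requested amount of RAM (in GB) is no larger than
# what the kernel currently reports as available.
# Usage: mem_fits <requested_gb> [meminfo_file]
mem_fits() {
    requested_gb=$1
    meminfo=${2:-/proc/meminfo}
    # MemAvailable is reported in kB; convert to whole GB (rounded down).
    avail_gb=$(awk '/^MemAvailable:/ { print int($2 / 1048576) }' "$meminfo")
    [ -n "$avail_gb" ] && [ "$requested_gb" -le "$avail_gb" ]
}

# Example (commented out so the sketch has no side effects):
# ram=256
# mem_fits "$ram" || echo "warning: -M $ram exceeds available memory" >&2
# ulimit -v   # a per-process virtual memory cap can also make allocations fail
```

This only checks one possible cause. It says nothing about how xenome allocates memory internally during the sorting step.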

Deguerre commented 6 years ago

(32bit linked)

Assuming that this means what I think this means, that's your problem right there. Compiling Xenome as a 32-bit program is unsupported. I'm a bit surprised that it even compiles. In 32-bit mode you are limited to 4GB of address space in total, and that's assuming you have a specially configured operating system (it's more like 3GB in practice). But even putting that aside, Xenome relies on 64-bit arithmetic and bit manipulation instructions.
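
If there is any doubt about how a binary was built, the ELF header can be inspected directly. The following is a minimal sketch, assuming a Linux ELF executable; the function name elf_bits is hypothetical. It reads the EI_CLASS byte of the header, which is 1 for 32-bit and 2 for 64-bit objects:

```shell
#!/bin/sh
# Sketch: report whether an ELF file is a 32-bit or 64-bit object.
# Usage: elf_bits <path>
elf_bits() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo unknown
        return 0
    fi
    # Byte at offset 4 is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

# Example (commented out; assumes xenome is on the PATH):
# elf_bits "$(command -v xenome)"
```

Where it is installed, the file utility gives the same information in a more readable form.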

splaisan commented 6 years ago

Hi. No, I was not clear. I did not build it in 32 bits. During previous failed runs with -M set to 400GB, it used almost all of that memory and ended with the same error. It is clearly the late single-threaded step doing the sorting that dies. This reproduced with 400GB + 24 threads and with 48GB + 8 threads. It does not seem related to the limits but rather to an issue during sorting. I did this on Ubuntu 16 with apt-installed dependencies. I will now try a build on another RHEL7 server with cmake3 and Boost 1.66, the latter installed from source.

splaisan commented 6 years ago

It finally worked with 96GB and 8 threads on the Ubuntu machine, using the chromosome-only references. Do not ask me why! I will relaunch using the full references to see if the same limits work out.
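
For anyone wanting to reproduce a chromosome-only reference, here is one possible way to derive it from a primary assembly file. This is a sketch, not necessarily how the files above were made. It assumes Ensembl-style headers, where assembled chromosomes are tagged "dna:chromosome" and unplaced sequences "dna:scaffold"; the function name keep_chromosomes is hypothetical:

```shell
#!/bin/sh
# Sketch: read FASTA on stdin and write only the records whose header
# line contains the tag "dna:chromosome" (assumed Ensembl convention).
keep_chromosomes() {
    # On each header line, decide whether to keep the record;
    # then print every line while the keep flag is set.
    awk '/^>/ { keep = ($0 ~ /dna:chromosome/) } keep'
}

# Example (commented out; file names follow the ones used in this thread):
# gunzip -c Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz \
#     | keep_chromosomes > Homo_sapiens.GRCh38.chromosomes.fa
```

It is worth checking the result with grep '^>' to confirm that only the expected sequence names remain.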

splaisan commented 6 years ago

It failed with the full assembly FASTA files for human and mouse, 98GB RAM and 8 threads. Could one of the developers please build the index using the human (GRCh38) and mouse (GRCm38) primary assembly FASTA files, to rule out that this will be the case for all users?

wget ftp://ftp.ensembl.org/pub/release-91/fasta/mus_musculus/dna/\
Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-91/fasta/homo_sapiens/dna/\
Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

When successful, could you then please post your full command? I really need to apply xenome to our data and have already been trying for three days to find a way.

PS: classify also hangs in my case.

My only successful index is as follows:

-rw-r--r--  1 u0002316 domain users   24 Mar  2 23:28 idx-both.header
-rw-r--r--  1 u0002316 domain users 162M Mar  2 23:28 idx-both.kmers-d0
-rw-r--r--  1 u0002316 domain users 277M Mar  2 23:28 idx-both.kmers-d1
-rw-r--r--  1 u0002316 domain users   64 Mar  2 23:28 idx-both.kmers.header
-rw-r--r--  1 u0002316 domain users 1.1G Mar  2 23:28 idx-both.kmers.high-bits
-rw-r--r--  1 u0002316 domain users 8.4G Mar  2 23:28 idx-both.kmers.low-bits.lwr
-rw-r--r--  1 u0002316 domain users 4.2G Mar  2 23:28 idx-both.kmers.low-bits.upr
-rw-r--r--  1 u0002316 domain users 534M Mar  3 07:53 idx-both.lhs-bits
-rw-r--r--  1 u0002316 domain users 534M Mar  3 07:53 idx-both.rhs-bits
-rw-r--r--  1 u0002316 domain users   24 Mar  2 20:37 idx-graft.header
-rw-r--r--  1 u0002316 domain users  89M Mar  2 20:37 idx-graft.kmers-d0
-rw-r--r--  1 u0002316 domain users 135M Mar  2 20:37 idx-graft.kmers-d1
-rw-r--r--  1 u0002316 domain users   64 Mar  2 20:37 idx-graft.kmers.header
-rw-r--r--  1 u0002316 domain users 540M Mar  2 20:37 idx-graft.kmers.high-bits
-rw-r--r--  1 u0002316 domain users 4.5G Mar  2 20:37 idx-graft.kmers.low-bits.lwr
-rw-r--r--  1 u0002316 domain users 2.3G Mar  2 20:37 idx-graft.kmers.low-bits.upr
-rw-r--r--  1 u0002316 domain users   24 Mar  2 22:33 idx-host.header
-rw-r--r--  1 u0002316 domain users  78M Mar  2 22:33 idx-host.kmers-d0
-rw-r--r--  1 u0002316 domain users 138M Mar  2 22:33 idx-host.kmers-d1
-rw-r--r--  1 u0002316 domain users   64 Mar  2 22:33 idx-host.kmers.header
-rw-r--r--  1 u0002316 domain users 508M Mar  2 22:33 idx-host.kmers.high-bits
-rw-r--r--  1 u0002316 domain users 4.0G Mar  2 22:33 idx-host.kmers.low-bits.lwr
-rw-r--r--  1 u0002316 domain users 2.0G Mar  2 22:33 idx-host.kmers.low-bits.upr
-rw-r--r--  1 u0002316 domain users  47K Mar  3 07:53 index_creation_log.txt

# Tail of the log for this one successful run, starting at the step where the other runs died with std::bad_alloc. The sort is not even a long step: about 5 minutes with these memory and CPU settings (96GB, 8 threads).
Fri Mar  2 21:43:36 2018        info    sorting the hashtable...
Fri Mar  2 21:47:45 2018        info    sorting done.
Fri Mar  2 21:47:45 2018        info    writing out naked edges.
Fri Mar  2 21:49:19 2018        info    wrote 225786255 pairs.
Fri Mar  2 21:51:25 2018        info    done.
Fri Mar  2 21:51:25 2018        info    merging temporary graphs
Fri Mar  2 22:33:21 2018        info    finish graph build
Fri Mar  2 22:33:21 2018        info    total build time: 6961.2070691585541
Fri Mar  2 22:33:21 2018        info    merging host and graft reference kmer sets
Fri Mar  2 22:33:21 2018        info    counting kmers.
Fri Mar  2 22:53:51 2018        info    writing out 4478137487 kmers.
Fri Mar  2 22:53:51 2018        info    of which 12257675 are common.
Fri Mar  2 23:28:29 2018        info    total elapsed time: 3308.5130729675293
Fri Mar  2 23:28:29 2018        info    computing marginal kmers
Fri Mar  2 23:28:31 2018        info    initialising bitsets
Fri Mar  2 23:31:42 2018        info    calculating grey set
Sat Mar  3 07:50:56 2018        info    found 35112487 gray bits (out of 4478137487).
Sat Mar  3 07:53:30 2018        info    total elapsed time: 30301.084784030914
Sat Mar  3 07:53:31 2018        info    total elapsed time: 48261.735320091248
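
The elapsed times in the log are in seconds, which is hard to read at this scale. A small sketch for converting them to hours, minutes and seconds (the function name secs_to_hms is hypothetical; fractional seconds are dropped):

```shell
#!/bin/sh
# Sketch: convert a duration in seconds (possibly fractional) to HH:MM:SS.
secs_to_hms() {
    awk -v s="$1" 'BEGIN {
        s = int(s)   # drop fractional seconds
        printf "%02d:%02d:%02d\n", int(s / 3600), int((s % 3600) / 60), s % 60
    }'
}

secs_to_hms 30301.084784030914   # marginal k-mer step: prints 08:25:01
secs_to_hms 48261.735320091248   # whole run: prints 13:24:21
```

So the successful chromosome-only build took about 13.4 hours in total, of which roughly 8.4 hours went into the step that computes the marginal k-mers.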

Thanks in advance for your help.

splaisan commented 6 years ago

I was able to build the primary-assembly index on a separate server running RHEL7, after a full day using 8 threads and 96GB RAM. I am not sure the OS is involved, but it does seem that the more threads and RAM are used, the higher the chance of a crash.