bcgsc / LINKS

⛓ Long Interval Nucleotide K-mer Scaffolder
GNU General Public License v3.0

Job killed even though I have enough RAM #55

Closed ViriatoII closed 3 years ago

ViriatoII commented 3 years ago

Hi,

I've got one of those ugly errors (v1.8.7). My job is killed even though I have 300G of RAM. The max RAM usage was 90G. Anything I can do? Perhaps recompile?

=>Reading long reads, building hash table : Wed Dec  9 09:59:02 CET 2020
Reads processed k=20, dist=4000, offset=0 nt, sliding step=2 nt:

>Reads processed from file 1/1, MY_DIR/d_tenuifolia/reads.fa:
183075

Writing a 1004578712 byte filter to d_tenuifolia_scaf.bloom on disk.
/var/spool/pbs/mom_priv/jobs/7048737.hpc-batch14.SC: line 24: 118682 Killed        /gpfs/project/projects/qggp/src/LINKS/bin/LINKS -f assembly-renamed.fa -s reads.fof -k 20 -b ${species}_scaf -l 5 -t 2 -a 0.3 

Cheers, Ricardo

warrenlr commented 3 years ago

yeah, that's a pickle..

not too sure as I've never seen this error at the Bloom filter building stage.

Rebuilding swig/BTL Bloom is a possibility, ideally on the same server where you intend to run LINKS.

Also, in the tools folder there's a utility called writeBloom.pl that you can run to first build the Bloom filter from your draft assembly "assembly-renamed.fa", which you can then pass to LINKS with -r.
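A rough sketch of that two-step workflow, for illustration (the writeBloom.pl options shown are assumptions, so check the script's usage message, and the .bloom filename simply mirrors the one used later in this thread; point -r at whatever file the script actually writes):

cd /gpfs/project/projects/qggp/src/LINKS
./tools/writeBloom.pl -f assembly-renamed.fa -k 20                 # pre-build the Bloom filter from the draft assembly
./bin/LINKS -f assembly-renamed.fa -s reads.fof -k 20 -b d_tenuifolia_scaf -l 5 -t 2 -a 0.3 -r d_tenuifolia_scaf.bloom   # reuse it via -r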

ViriatoII commented 3 years ago

Hi @warrenlr,

I tried your suggestion of using writeBloom.pl separately. That part works perfectly. But when feeding it to LINKS, the same problem occurred:

A Bloom filter was supplied (d_tenuifolia_scaf.bloom) and will be used instead of building a new one from -f assembly-renamed.fa
Checking Bloom filter file d_tenuifolia_scaf.bloom...ok
Loading bloom filter of size 8036629696 from d_tenuifolia_scaf.bloom

=>Reading long reads, building hash table : Thu Dec 10 12:13:13 CET 2020
Reads processed k=20, dist=4000, offset=0 nt, sliding step=2 nt:

Reads processed from file 1/1, /gpfs/project/projects/qggp/C34_PS/experiments/scaffolding/links/d_tenuifolia/reads.fa:
253146
/var/spool/pbs/mom_priv/jobs/7055749.hpc-batch14.SC: line 30: 110149 Killed   /gpfs/project/projects/qggp/src/LINKS/bin/LINKS -f assembly-renamed.fa -s reads.fof -k 20 -b ${species}_scaf -l 5 -t 2 -a 0.3 -r d_tenuifolia_scaf.bloom

I will try rebuilding the swig/BTL Bloom. Is tinkering with the -p parameter a good idea?

-p Bloom filter false positive rate (default -p 0.001, optional; increase to prevent memory allocation errors)

Cheers, Ricardo

warrenlr commented 3 years ago

It seems to crash pretty early on, which is not encouraging. I really doubt it is a Bloom filter issue, unless the filter was already occupying a large share of the memory, which seems unlikely (how big is the Bloom filter file?).

The -t param controls how many kmer pairs are extracted from the sequences (and thus has a big impact on memory usage). Could you please first run with an insanely large value, say -t 1000? I doubt you would get much scaffolding at -t 1000, but this important troubleshooting step will tell us whether or not the issue is related to a bloated kmer pair hash table (& will confirm LINKS can run to completion on your data). If it works, decrease by a factor of ten (-t 100) and re-run. Keep decreasing 10X until it breaks.
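For example, something along these lines (paths and fixed options copied from the command in your log above; the -b prefixes are only illustrative):

for T in 1000 100 10 2; do
    /gpfs/project/projects/qggp/src/LINKS/bin/LINKS -f assembly-renamed.fa -s reads.fof -k 20 -b d_tenuifolia_scaf_t${T} -l 5 -t ${T} -a 0.3 || break
done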

What data type is in the "reads.fa" file listed in reads.fof? Nanopore reads? What approximate coverage? Answers to these questions will help us understand your data and possible limitations/ways to improve the run.

lcoombe commented 3 years ago

Just to add to Rene's suggestions - if you haven't already, it would be worth running one of the tests supplied in the repo just as a sanity check to be extra sure that your LINKS installation itself is working properly.
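For example, something like this (the demo script name is only illustrative; list the test folder to see what ships with your copy of the repo):

cd /gpfs/project/projects/qggp/src/LINKS/test
ls                              # see which demo/test scripts are available
./runme_EcoliK12single.sh       # run one end-to-end as a sanity check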

ViriatoII commented 3 years ago

Hey. So answering your questions:

lcoombe commented 3 years ago

Ok! Yeah, the difference there is that ARCS doesn't use a Bloom filter at all, nor does it do any of the initial steps of the LINKS pipeline (it only uses the scaffold layout part). But if that -t 100 run works, it answers the question about the installation! Sounds like it was most likely a memory issue with your previous runs, then.

warrenlr commented 3 years ago

Great to hear it works in your hands! And since these are error-corrected ONT reads, it may be safe to increase k slightly (which may lead to increased specificity). I also recommend running LINKS iteratively, from short (e.g. -d 1000) to longer intervals between paired kmers, feeding the output of each run into the next -d run. I suggest -d 1000, 2500, 5000, 7500, 10000, 15000, 20000 (but you could explore any distances). For each, you could decrease -t to squeeze in as many kmer pairs as fit into memory.
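A rough sketch of that iterative scheme (the <prefix>.scaffolds.fa output name and the -t 20 value are assumptions; check your run's actual output files and use whatever -t fits into memory):

DRAFT=assembly-renamed.fa
for D in 1000 2500 5000 7500 10000 15000 20000; do
    ./bin/LINKS -f ${DRAFT} -s reads.fof -k 20 -d ${D} -t 20 -l 5 -a 0.3 -b d_tenuifolia_d${D}
    DRAFT=d_tenuifolia_d${D}.scaffolds.fa      # feed each run's scaffolds into the next, longer -d run
done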

Thank you for using LINKS @ViriatoII

Safe to close this ticket.

ViriatoII commented 3 years ago

Thank you! By the way, I continued experimenting with the -t parameter; the minimum that worked so far was 20 (-t 10 crashed).

Have a great day!

warrenlr commented 3 years ago

Excellent! And that -t 20 was with -d 4000 (the default). You'd be able to decrease -t with higher -d, and would likely need to increase it when exploring distances shorter than 4000 bp.

All the best, Rene