bcgsc / ntEdit

✏️ Genome assembly polishing & SNV detection

ntEdit std::bad_alloc #14

cbirbes closed 4 years ago

cbirbes commented 4 years ago

Hi, I'm trying to polish a bovine genome with ntEdit and I have run into a problem with the software.

I did an ntHits run (300 GB memory, 20 CPUs):

nthits -t 20 data/LR/trio1.mother.run1.fastq.gz data/LR/trio1.mother.run2.fastq.gz data/LR/trio1.mother.run3.fastq.gz

Then I tried to run ntEdit (350 GB memory, 20 CPUs):

ntedit -t 20 -f data/draft/mother_raw.cgt.fa -r repeat_k64.rep -b ntEditTest

The log files are below.

For ntHits:

Reapeat profile estimated using ntCard in (sec): 1641.0705
Errors k-mer coverage: 24
Median k-mer coverage: 24
Repeat k-mer coverage: 42
Approximate# of distinct k-mers: 147250307297
Approximate# of solid k-mers: 11862647
Total time for computing repeat content in (sec): 12162.3971

For ntEdit:

---------- running ntedit : Mon Feb 3 15:35:23 2020
---------- loading Bloom filter from file : Mon Feb 3 15:35:23 2020

ACATCACC size(bits): 4702674184463206215 hash: 1095193667 k: 1129595201
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
/var/spool/slurm/d/job10340313/slurm_script : ligne 9 : 190002 Abandon ntedit -t 20 -f data/draft/mother_raw.cgt.fa -r repeat_k64.rep -b ntEditTest

I don't really understand what std::bad_alloc means. Is it the equivalent of an out-of-memory event? What can I change to get ntEdit to run?

Thanks

warrenlr commented 4 years ago

You are using nthits improperly. You need to ask nthits to output a Bloom filter and get the solid/robust kmer slice, not the repeat kmer slice.
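As a rough illustration of why the wrong file type ends in std::bad_alloc, the sketch below reinterprets the numbers printed in the ntEdit log as raw little-endian bytes. It assumes (this is not confirmed anywhere in the thread) that ntEdit reads the start of the file as fixed-width integer header fields; the field widths of 8, 4 and 4 bytes are guesses made for illustration only.

```python
def decode_le(value: int, width: int) -> str:
    """Reinterpret an unsigned integer as `width` little-endian bytes of text."""
    raw = value.to_bytes(width, "little")
    # latin-1 maps every byte value to a character, so decoding cannot fail
    return raw.decode("latin-1")


# Values copied from the ntEdit log in the first comment.
# The byte widths are assumptions, not taken from the ntEdit source.
fields = {
    "size(bits)": (4702674184463206215, 8),
    "hash": (1095193667, 4),
    "k": (1129595201, 4),
}

for name, (value, width) in fields.items():
    print(f"{name:>10}: {value} -> {decode_le(value, width)!r}")
```

Under these assumptions the "hash" value decodes to 'CTGA' and the "k" value to 'AATC', which suggests that repeat_k64.rep holds k-mer text rather than a Bloom filter. A size read from such bytes would be enormous, and trying to allocate it would plausibly raise std::bad_alloc no matter how much memory the job has.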

Here's how you should invoke the programs:

ntHits:

nthits -b 36 --outbloom --solid -p solid -k 50 -t 20 @reads.in
nthits -b 36 --outbloom --solid -p solid -k 45 -t 20 @reads.in
nthits -b 36 --outbloom --solid -p solid -k 40 -t 20 @reads.in

Notes:
(1) Place the path to your short reads in "reads.in".
(2) For a genome of that size, I would not go higher than k50 and not below k40.
(3) I strongly recommend that you plot the kmer coverage histogram with ntcard first, because the stats you posted are not looking too good (error kmer at threshold of 24 is very high). From a coverage distribution histogram you can determine where the error kmers are and use that as a threshold for nthits (-c XX instead of --solid).
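Note (3) can be sketched in a few lines of Python: find the first dip in the k-mer coverage histogram and treat k-mers below it as likely errors. The sketch assumes the histogram is tab-separated "multiplicity, count" lines preceded by summary lines starting with "F", which should be checked against your own ntcard output. The numbers are synthetic and the function names parse_hist and first_trough are made up for this example.

```python
def parse_hist(lines):
    """Return {multiplicity: count}, skipping summary lines that start with 'F'."""
    hist = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or parts[0].startswith("F"):
            continue
        hist[int(parts[0])] = int(parts[1])
    return hist


def first_trough(hist):
    """Multiplicity of the first local minimum, or None if counts never rise."""
    mults = sorted(hist)
    for prev, cur in zip(mults, mults[1:]):
        if hist[cur] > hist[prev]:
            return prev
    return None


# Synthetic histogram for illustration only (not from any real dataset)
example = [
    "F1\t1000", "F0\t300",
    "1\t120", "2\t40", "3\t12", "4\t9",
    "5\t15", "6\t35", "7\t50", "8\t41",
]

print(first_trough(parse_hist(example)))  # prints 4
```

In this toy case the dip sits at multiplicity 4, so a value around there would be the candidate for -c XX. On real data, plot the histogram and check the dip by eye, since noisy counts can produce spurious local minima.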

ntEdit:

ntedit -t 20 -f data/draft/mother_raw.cgt.fa -r solid_k50.bf -b ntEditTest

Please consult the ntEdit readme for an explanation of what the parameters are and how to run iteratively.

IgnasiLucas commented 1 year ago

Hello, I am experiencing the same std::bad_alloc error message. I have a 453 Mb draft assembly, and a pair of R1 and R2 FASTQ files for polishing. I ran nthits (version 1.0.1) on either both or only one of the FASTQ files, with k = 31, 47 or 61. I tried the --solid option, and also setting the --cmin parameter at about the coverage where the bump begins in the k-mer profile (for example, "--cmin 6" for k = 31). Below is the nthits command I used on a FASTA version of the forward reads that had been filtered to minimize errors:

nthits --frequencies filtered_k31.hist --out-file filtered_k31.bf --min-count 6 --kmer-length 31 --threads 16 -vv bf filtered.fasta

And the ntedit (version 1.3.5) command:

ntedit -t 16 -f $ASSEMBLY -r filtered_k31.bf -b edited_k31

Below I show the outputs from nthits and ntedit. I would appreciate some guidance. Thank you.

Input files:

InitialiDONE (13.1285s)
Processing DONE (3460.85s)

Distinct k-mers BF stats:

Intermediate CBF stats:

Output BF stats:

Saving ouDONE (6.81283s)


---------- running ntedit : Sun Apr 2 13:38:39 2023
---------- loading Bloom filter from file : Sun Apr 2 13:38:39 2023
[BTLKmer size(bits): 6877671131657619053 hash: 173880950 k: 1025534729
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

warrenlr commented 1 year ago

Thank you for the detailed report and interest in nthits+ntedit, @IgnasiLucas. Tagging @lcoombe and @parham-k

This error is likely caused by a version incompatibility between ntEdit and ntHits. Please note that all released versions of ntEdit are NOT cross-compatible with the new ntHits release (v1.0.1).

We recommend you re-install ntEdit (+all its dependencies, including nthits) using conda:

conda install -c bioconda ntedit

parham-k commented 1 year ago

"This error is likely caused by a version incompatibility between ntEdit and ntHits"

Exactly. ntEdit is only compatible with ntHits v0.0.1. We'll work on a fix.
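Until a fix lands, one way to catch the mismatch before ntEdit aborts is to look at the first bytes of the Bloom filter file. The sketch below is based only on the log above, where ntEdit printed "[BTLKmer" at the point where it reports the file header. It assumes that filters written by the newer ntHits begin with the text "[BTL", which is an inference and not a documented guarantee. The function name and the demo file contents are hypothetical.

```python
import os
import tempfile


def looks_like_text_header(path, probe=b"[BTL"):
    """True if the file starts with `probe` (assumed marker of a newer-format filter)."""
    with open(path, "rb") as fh:
        return fh.read(len(probe)) == probe


# Demo on a throwaway file; the contents mimic only the 8 bytes seen in the log
with tempfile.NamedTemporaryFile(suffix=".bf", delete=False) as tmp:
    tmp.write(b"[BTLKmer" + b"\x00" * 16)
    demo_path = tmp.name

print(looks_like_text_header(demo_path))  # prints True
os.remove(demo_path)
```

If the check returns True for your .bf file, it was probably written by a newer ntHits, and per the comments above the released ntEdit versions are not expected to load it.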