Gabaldonlab / redundans

Redundans is a pipeline that assists an assembly of heterozygous/polymorphic genomes.
GNU General Public License v3.0

Scaffolding with long reads fails #73

Closed: hkrzystek closed this issue 4 years ago

hkrzystek commented 5 years ago

Hello,

When I run Redundans, the last files created are the scaffolds.reduced files, and then Redundans stops. No scaffolds.filled files are created. Here is the output file:

Options: Namespace(fasta='/pine/scr/h/a/halina/carrolli_assembly/combined_assembly/miniasm_results/medaka/consensus.fasta', fastq=[], identity=0.51, iters=2, joins=5, limit=0.2, linkratio=0.7, log=<open file '<stderr>', mode 'w' at 0x7fab08c751e0>, longreads=['data/d_carrolli_combined.fastq'], mapq=10, mem=16, minLength=200, nocleaning=True, nogapclosing=True, norearrangements=False, noreduction=True, noscaffolding=True, outdir='carrolli_4', overlap=0.8, reference='', resume=False, threads=4, tmp='/tmp', usebwa=False, verbose=True)

##################################################
[Mon Sep 30 13:35:34 2019] Reduction...
#file name      genome size     contigs heterozygous size       [%]     heterozygous contigs    [%]     identity [%]    possible joins  homozygous size [%]     homozygous contigs      [%]
carrolli_4/contigs.fa   361528537       1431    109262537       30.22   1104    77.15   94.016  0       252266000       69.78   327     22.85

##################################################
[Tue Oct  1 00:04:57 2019] Scaffolding with long reads...
 iteration 1...
last-split: split/cbrc_split_aligner.cc:653: void cbrc::SplitAligner::calcBaseScores(unsigned int): Assertion `q >= 0' failed.

##################################################
[Tue Oct  1 09:01:42 2019] Final reduction...
#file name      genome size     contigs heterozygous size       [%]     heterozygous contigs    [%]     identity [%]    possible joins  homozygous size [%]     homozygous contigs      [%]
carrolli_4/scaffolds.longreads.fa       207058847       215     60730207        29.33   103     47.91   90.416  0       146328640       70.67   112     52.09

##################################################
[Tue Oct  1 10:47:37 2019] Reporting statistics...
#fname  contigs bases   GC [%]  contigs >1kb    bases in contigs >1kb   N50     N90     Ns      longest
/pine/scr/h/a/halina/carrolli_assembly/combined_assembly/miniasm_results/medaka/consensus.fasta 1431    361528537       40.002  1198    361407290       823695  192134  0       18082807
carrolli_4/contigs.fa   1431    361528537       40.002  1198    361407290       823695  192134  0       18082807
carrolli_4/contigs.reduced.fa   327     252266000       40.560  312     252259016       1500813 367713  0       18082807
carrolli_4/scaffolds.longreads.1.fa     215     207058847       40.673  200     207051863       3007667 477918  1926218 16858222
carrolli_4/scaffolds.longreads.fa       215     207058847       40.673  200     207051863       3007667 477918  1926218 16858222
carrolli_4/scaffolds.reduced.fa 112     146328640       40.810  103     146324602       5377666 620950  1266972 16858222

##################################################
[Tue Oct  1 10:48:41 2019] Cleaning-up...
#Time elapsed: 21:14:44.925628
cat: write error: Broken pipe
slurm-35128947.out (END)

To me it looks like the scaffolding with long reads is failing, but I don't understand why or how to fix it. The broken pipe at the end may also be a problem. The same thing has happened on several runs, each with 4 threads and 100 GB of memory. Reaching this point takes about 24 hours.

Thank you for your help!

bifxcore commented 4 years ago

According to your output and my own experience, scaffolding with long reads succeeds, but redundans stops before the gap closing step. I wish the authors would respond... but I guess there are too few of us with PacBio data?

See also https://github.com/lpryszcz/redundans/issues/56
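
One way to check which stages actually ran is to scan the log for its stage headers. Below is a minimal Python sketch of that idea; it is not part of redundans. It assumes stage headers look like the "[date] Stage name..." lines in the output above. The names `summarize_log`, `STAGE_RE`, `PROBLEM_RE` and `SAMPLE_LOG` are made up for illustration, and the "problem" patterns are rough heuristics rather than an exhaustive list.

```python
import re

# Stage headers in the posted log look like:
#   [Tue Oct  1 00:04:57 2019] Scaffolding with long reads...
STAGE_RE = re.compile(r"^\[(?P<when>[^\]]+)\]\s+(?P<stage>.+?)\.\.\.\s*$")

# Heuristic (assumed) patterns for lines that deserve a second look.
PROBLEM_RE = re.compile(r"Assertion .* failed|write error|Traceback", re.IGNORECASE)

# A condensed excerpt of the log from this issue, used as sample input.
SAMPLE_LOG = """\
[Mon Sep 30 13:35:34 2019] Reduction...
[Tue Oct  1 00:04:57 2019] Scaffolding with long reads...
 iteration 1...
last-split: split/cbrc_split_aligner.cc:653: void cbrc::SplitAligner::calcBaseScores(unsigned int): Assertion `q >= 0' failed.
[Tue Oct  1 09:01:42 2019] Final reduction...
[Tue Oct  1 10:47:37 2019] Reporting statistics...
[Tue Oct  1 10:48:41 2019] Cleaning-up...
#Time elapsed: 21:14:44.925628
cat: write error: Broken pipe
"""


def summarize_log(text):
    """Return (stages, problems).

    stages   -- stage names in the order they appear in the log
    problems -- (stage, line) pairs for lines matching PROBLEM_RE,
                where stage is the most recent stage header seen
    """
    stages, problems = [], []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = STAGE_RE.match(line)
        if match:
            current = match.group("stage")
            stages.append(current)
        elif PROBLEM_RE.search(line):
            problems.append((current, line))
    return stages, problems


if __name__ == "__main__":
    stages, problems = summarize_log(SAMPLE_LOG)
    print("Stages seen:", ", ".join(stages))
    # Assumption: a gap closing stage, if it ran, would have "gap" in its header.
    if not any("gap" in stage.lower() for stage in stages):
        print("No gap closing stage header found in this log.")
    for stage, line in problems:
        print("Suspicious line during", repr(stage) + ":", line)
```

On the excerpt above, this lists a "Scaffolding with long reads" stage followed by a "Final reduction" stage, and no stage with "gap" in its name. That matches the reading that the run skipped gap closing rather than aborting during scaffolding.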

lpryszcz commented 4 years ago

Hi @bifxcore and @hkrzystek, redundans currently only supports gap closing with short reads (paired-end libraries are required). Frankly, I don't know of any software that can close gaps with long reads... Any ideas? If so, please open another issue with an enhancement idea :)
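
To illustrate the point about short reads, here is a minimal Python sketch that assembles a redundans command line and reports whether any paired-end short reads were supplied for gap closing. It rests on several assumptions: the flags (`-f`, `-o`, `-t`, `-l`, `-i`, `-v`) are taken from the redundans README as I understand it and should be checked against `redundans.py -h`; the helper `build_redundans_command` and the Illumina file names are hypothetical; and "at least two FASTQ files" is only a rough stand-in for "a paired-end library is present".

```python
import shlex


def build_redundans_command(contigs, outdir, long_reads=(), short_reads=(), threads=4):
    """Build an argv list for redundans.py.

    Returns (argv, can_close_gaps). can_close_gaps is a rough check only:
    it assumes a paired-end library means at least two short-read FASTQ files.
    """
    argv = ["redundans.py", "-v", "-f", contigs, "-o", outdir, "-t", str(threads)]
    if long_reads:
        argv += ["-l", *long_reads]   # long reads: used for scaffolding
    if short_reads:
        argv += ["-i", *short_reads]  # short reads: required for gap closing
    can_close_gaps = len(short_reads) >= 2
    return argv, can_close_gaps


if __name__ == "__main__":
    # Long reads only, as in the run reported in this issue (fastq=[]).
    argv, ok = build_redundans_command(
        "consensus.fasta",
        "carrolli_4",
        long_reads=["data/d_carrolli_combined.fastq"],
    )
    print(shlex.join(argv))
    print("short reads available for gap closing:", ok)

    # Same run with a hypothetical paired-end Illumina library added.
    argv, ok = build_redundans_command(
        "consensus.fasta",
        "carrolli_4",
        long_reads=["data/d_carrolli_combined.fastq"],
        short_reads=["illumina_1.fq.gz", "illumina_2.fq.gz"],
    )
    print(shlex.join(argv))
    print("short reads available for gap closing:", ok)
```

With long reads only, the sketch reports that no short reads are available, which corresponds to the run above producing scaffolds.reduced files but no scaffolds.filled files.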