DecodeGenetics / Ratatosk

Hybrid error correction of long reads using colored de Bruijn graphs
BSD 2-Clause "Simplified" License

questions re index construction & insert size #42

Closed rwhetten closed 2 years ago

rwhetten commented 2 years ago

Thanks for a great tool! I have two questions to try to optimize my use of Ratatosk.

  1. For the insert size, how sensitive is the error-correction process to deviation from the default 500-bp insert size? In other words, if the median insert size is 450 bp, is it useful to change that parameter? If inserts are greatly different from 500 bp (e.g. 250 bp), does setting the correct insert size using -i improve the outcome?
  2. I'm working with a large genome (>15 Gb haploid size), and plan to error-correct several lanes of PromethION data. Is it possible to build an index from the short-read data (39x coverage) once and use the same index to correct each lane of nanopore reads? Or is the process of building the index dependent on the long-read data, so that a new index must be made for each long-read dataset processed?

GuillaumeHolley commented 2 years ago

Hi @rwhetten,

I hope this is clear enough :) Guillaume

rwhetten commented 2 years ago

Yes, that is perfect - thank you for the quick reply! I'll close the issue.