cahuparo closed this issue 6 months ago.
Dear @cahuparo,
Thank you for reaching out. In general, Ratatosk was designed with human genome datasets in mind (so heterozygous, but not highly heterozygous). However, off the top of my head, high heterozygosity should not affect the performance of Ratatosk, and Ratatosk should work well out of the box on highly heterozygous genomes; there are just no special settings or tweaks for your type of input genome. Here are my answers to your questions:
`-Q 90` in the command line (because the default is `-Q 40`). Also, Ratatosk works best with paired-end short reads as input (`-s`): input short reads from the same pair must have EXACTLY the same FASTA/FASTQ name (if the reads are extracted from a BAM file, use `samtools bam2fq -n`). Finally, make sure you use the latest version of Ratatosk on this GitHub (0.9.0) and not the conda version, which seems to be broken.

Let me know if any of this is unclear or if I can be of further assistance,

Guillaume
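The requirement that mates carry EXACTLY the same name can be checked before running Ratatosk. The sketch below is not part of Ratatosk; it assumes unwrapped 4-line FASTQ records with mates stored in two files (R1/R2) in the same order. The names `open_fastq`, `read_names` and `first_mismatch` are hypothetical, and `strict=True` (compare the whole header line) is our conservative reading of "exactly the same name", not documented Ratatosk behavior.

```python
import gzip
import sys
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, TextIO, Tuple

# Assumptions: unwrapped 4-line FASTQ records, mates in two files in the same order.


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_names(lines: Iterable[str], strict: bool = True) -> Iterator[str]:
    """Yield the name of each 4-line FASTQ record.

    strict=True keeps the whole header line (minus '@'), our assumed meaning of
    "exactly the same name"; strict=False keeps only the first whitespace token.
    """
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # skip the sequence, '+' separator and quality lines
        if not line.startswith("@"):
            raise ValueError(f"line {i + 1}: expected a '@' header, got {line!r}")
        header = line[1:].rstrip("\r\n")
        yield header if strict else (header.split() or [""])[0]


def first_mismatch(
    r1_lines: Iterable[str], r2_lines: Iterable[str], strict: bool = True
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record index, R1 name, R2 name) for the first differing pair, or None.

    zip_longest pads the shorter file with None, so unequal record counts
    are reported as a mismatch too.
    """
    pairs = zip_longest(read_names(r1_lines, strict), read_names(r2_lines, strict))
    for idx, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return idx, name1, name2
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_pairs.py reads_R1.fastq.gz reads_R2.fastq.gz
    with open_fastq(sys.argv[1]) as r1, open_fastq(sys.argv[2]) as r2:
        mismatch = first_mismatch(r1, r2)
    if mismatch is None:
        print("OK: every mate pair has exactly the same name")
    else:
        idx, name1, name2 = mismatch
        print(f"record {idx}: {name1!r} != {name2!r}")
        sys.exit(1)
```

A common failure is the `/1` and `/2` suffix on mate names; `samtools bam2fq -n` does not append those suffixes, which is why it is suggested for reads extracted from a BAM file.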
Hi @GuillaumeHolley,
Thank you for the comprehensive response and the valuable insights into Ratatosk's capabilities and design philosophy, especially regarding haplotype preservation in the context of highly heterozygous genomes. Your explanation and the reference to the Ratatosk paper provide a solid foundation for understanding how Ratatosk could benefit our genome assembly project. Before proceeding, I have a few additional questions to ensure optimal application and results:
Dual Assembly Pipeline Integration: Could you provide more details or examples on how Ratatosk has been integrated into diploid dual assembly pipelines, particularly regarding workflow steps before and after Ratatosk's application? Any specific considerations or adjustments needed for such integration would be highly valuable.
Handling of Structural Variations: In genomes with high heterozygosity, structural variations (SVs) can be as important as SNPs. How does Ratatosk handle or affect the correction of reads containing structural variations? Are there any strategies within Ratatosk to preserve SVs?
Future Updates: Are there any planned updates or features in Ratatosk that could further enhance its suitability for highly heterozygous non-human genomes? Insight into ongoing developments would be helpful for long-term planning.
Thank you again for your time and assistance.
Best,
Camilo
Thank you for taking the time to answer my questions. I will give it a try! Best, Camilo
Hi @GuillaumeHolley,
I am currently working on assembling the genome of a highly heterozygous diploid organism, using a combination of Oxford Nanopore Technologies (ONT - R10) and Illumina sequencing data. Given the significant level of heterozygosity present in my organism, I aim to use Ratatosk for correcting ONT reads with the high-accuracy Illumina reads before assembly. My primary concern revolves around the tool's ability to distinguish between sequencing errors and true haplotype variations.
Specific Questions:
My goal is to ensure the highest possible quality and accuracy in our genome assembly, particularly in maintaining the true genetic diversity represented by the distinct haplotypes. Understanding Ratatosk's capabilities and limitations in this context will greatly aid in planning our assembly workflow and optimizing our use of the tool for our specific needs.
Thank you for your assistance and for developing such a valuable resource for the genomics community. I look forward to your insights and recommendations.
Best,
Camilo