Closed by koushik1989 5 years ago
First, I just want to mention that if you leave the reference (or query) as a scaffolded sequence with Ns, you will get Assemblytics variant calls whenever an alignment bridges a gap and the number of Ns doesn't exactly match, so be aware of that when analyzing your results. I would normally advise breaking the scaffolds into pieces at the gaps, but since you likely want to keep the coordinates, it's probably easier to use the scaffolds as-is and just filter out the variant calls that overlap the N-filled sequences.
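The filtering step described above could be sketched roughly as follows. This is a minimal illustration rather than part of Assemblytics: the helper names (`n_runs`, `read_fasta`, `filter_bed_lines`) and the `pad` parameter are made up for this example, and it assumes the variant calls are in a tab-separated, BED-like file whose first three columns are reference name, start, and end. Check that assumption against your own output file before relying on it.

```python
import re

def n_runs(seq, min_len=1):
    """Return half-open (start, end) intervals of N runs in a sequence."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]

def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first word of the header as the name.
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def overlaps_gap(start, end, gaps, pad=0):
    """True if [start, end) comes within `pad` bases of any gap interval."""
    return any(start < g_end + pad and end > g_start - pad
               for g_start, g_end in gaps)

def filter_bed_lines(lines, gaps_by_seq, pad=0):
    """Keep BED-like lines that do not overlap an N run.

    Assumes columns 1-3 are name, start, end (hypothetical layout).
    Lines whose coordinates are not integers (e.g. a header) are dropped.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if not overlaps_gap(start, end, gaps_by_seq.get(chrom, []), pad):
            kept.append(line.rstrip("\n"))
    return kept
```

To use it, one would build `gaps_by_seq = {name: n_runs(seq) for name, seq in read_fasta("reference.fa")}` (the file name is a placeholder) and pass the lines of the variant file to `filter_bed_lines`. A small `pad` also drops calls that sit immediately next to a gap, which may be worth trying if spurious calls remain at the gap edges.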
The unique sequence length can usually be left at the default (10 kbp). The best reason for lowering it would be if your assembly has small contigs, which it might with Illumina sequencing alone, so I would try it with a 1 kbp unique anchor length. Also check out the Assemblytics paper, especially the supplement, which demonstrates what the unique anchor length actually means: http://www.ncbi.nlm.nih.gov/pubmed/27318204. (The preprint is also free on bioRxiv: https://www.biorxiv.org/content/10.1101/044925v1)
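One rough way to judge whether the assembly "has small contigs" is to look at how much of it sits in contigs shorter than a candidate anchor length, since an alignment cannot contain more unique sequence than its contig is long. The sketch below is only an illustration of that check; the function names are hypothetical, the lengths are invented, and the cutoff at which lowering the setting becomes worthwhile remains a judgment call.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input

def fraction_of_bases_below(lengths, threshold):
    """Fraction of assembled bases in contigs shorter than `threshold`."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(l for l in lengths if l < threshold) / total

# Made-up contig lengths, purely for illustration.
contig_lengths = [500, 2000, 5000, 12000, 30000]
for anchor in (10_000, 1_000):
    frac = fraction_of_bases_below(contig_lengths, anchor)
    print(f"anchor {anchor} bp: {frac:.1%} of bases in shorter contigs")
```

If a large share of the assembly falls below 10 kbp but not below 1 kbp, that would be consistent with trying the 1 kbp setting suggested above.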
Hi, I am trying to call variants from my 5 de novo assemblies, sequenced using Illumina, against a reference assembly sequenced using nanopore. I am aligning the contigs from these assemblies against the reference, which is scaffold-level with Ns. These assemblies are of Drosophila spp. I am confused about how to set the unique sequence length required. In one of the threads you pointed out that it is a judgment call, but I am still unable to decide. Your input would help me a lot.