MariaNattestad / Assemblytics

Assemblytics is a bioinformatics tool to detect and analyze structural variants from a genome assembly by comparing it to a reference genome.
http://assemblytics.com
MIT License

choosing unique_anchor_length #4

Closed BenjaminSchwessinger closed 5 years ago

BenjaminSchwessinger commented 7 years ago

Thanks Maria for this very useful piece of software. I am trying to use Assemblytics to compare the variation of primary contigs and haplotigs I got out from a FALCON unzip assembly of a dikaryotic fungus. For this I only map the haplotigs to their corresponding primary contigs.

In Assemblytics the size of the SV is linked to the unique_anchor_length. I have been playing around a bit and compared 10 kbp to 50 kbp. The latter detected fewer variants but covered more sequence space. Of course, small haplotigs (< 50 kbp) will not be included in the latter analysis.

Is it possible to combine different unique_anchor_length outputs, e.g. add calls for smaller haplotigs using a 10 kb cutoff to a 50 kb run? Can you advise on how to set the 'best' unique_anchor_length?

http://qb.cshl.edu/assemblytics/analysis.php?code=SUIBTG2OcqVoxxqb6eKK http://qb.cshl.edu/assemblytics/analysis.php?code=vDHlfEIeWnTDLyhn3eVn

MariaNattestad commented 7 years ago

Thanks for using Assemblytics! I made a small change to Assemblytics to allow you to set the maximum variant size and unique anchor length separately. The 'best' unique anchor length really depends on your genome and is a judgment call: how safe you want to be that you are excluding repeats, balanced against how many of the smaller contigs you are comfortable throwing away. 10 kb is a pretty solid choice for mammalian-size genomes, and I have only set it down to 1 kb for bacteria and sometimes for yeast. I have now allowed setting the anchor length and maximum size separately because it sounds reasonable to me to, for instance, use 10 kb anchored sequences to call 50 kb variants, as you suggested.
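To put a number on the "how many smaller contigs are you comfortable throwing away" side of that trade-off, a rough sketch like the one below can help. It is not part of Assemblytics. The function name `count_short_contigs` is hypothetical, and it assumes a tab-separated table on standard input with the contig name in column 1 and its length in column 2. It only counts contigs that are shorter than the anchor length, so it gives a lower bound on what would be excluded.

```shell
# Sketch only: count_short_contigs is a hypothetical helper, not an Assemblytics command.
# Input (stdin): tab-separated lines, column 1 = contig name, column 2 = length in bp.
# Output: "<number of contigs shorter than the threshold><TAB><their total length in bp>"
count_short_contigs() {
    awk -v min="$1" '
        $2 < min { n++; bp += $2 }            # contig cannot hold an anchor of this length
        END      { printf "%d\t%d\n", n, bp } # unset counters print as 0
    '
}

# Example with three made-up contig lengths and a 10 kb anchor
printf 'hap1\t5000\nhap2\t60000\nhap3\t20000\n' | count_short_contigs 10000
```

The first two columns of a `.fai` index written by `samtools faidx` have this layout, so such an index could be piped in directly and the count compared across candidate anchor lengths such as 10000 and 50000.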

BenjaminSchwessinger commented 7 years ago

Thank you, Maria. Great that you updated the scripts. I saw that the online version has this new feature. Could you also update the Git repository? It would make things easier for automation. Thanks again.

songtaogui commented 5 years ago

@MariaNattestad Hi, Maria. I would like to "set the maximum variant size and unique anchor length separately" using the command-line version of Assemblytics. Is it OK if I just modify line 50 of the Assemblytics shell script as follows:

# raw line 50 :
# MAXIMUM_SIZE=$UNIQUE_LENGTH
MAXIMUM_SIZE=50000 # manually assigned values

Thank you, Songtao Gui

MariaNattestad commented 5 years ago

Yes that would do it!
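For anyone who would rather not hard-code the value, the same idea can be written so that the maximum size falls back to the anchor length unless a value is given. This is a minimal sketch, not the script's actual code: `UNIQUE_LENGTH` and `MAXIMUM_SIZE` are the variable names quoted above, while the helper `resolve_max_size` and the example values are assumptions made for illustration.

```shell
# Sketch only: resolve_max_size is a hypothetical helper, not part of Assemblytics.
# $1 = unique anchor length, $2 = optional maximum variant size.
# Prints $2 if it is given and non-empty, otherwise falls back to $1.
resolve_max_size() {
    echo "${2:-$1}"
}

UNIQUE_LENGTH=10000                                       # example anchor length
MAXIMUM_SIZE=$(resolve_max_size "$UNIQUE_LENGTH" 50000)   # explicit override
echo "anchor=$UNIQUE_LENGTH max=$MAXIMUM_SIZE"

MAXIMUM_SIZE=$(resolve_max_size "$UNIQUE_LENGTH")         # no override: same as anchor
echo "anchor=$UNIQUE_LENGTH max=$MAXIMUM_SIZE"
```

With no second argument this reproduces the original `MAXIMUM_SIZE=$UNIQUE_LENGTH` behaviour, so existing runs would be unaffected.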