MariaNattestad / Assemblytics

Assemblytics is a bioinformatics tool to detect and analyze structural variants from a genome assembly by comparing it to a reference genome.
http://assemblytics.com
MIT License

What are ref_gap_size, query_gap_size, ref_start, and ref_stop for tandem expansions and tandem contractions? #55

Closed mariaelf97 closed 6 months ago

mariaelf97 commented 1 year ago

Hi Maria,

Hope you are doing well! I was wondering if you could help me understand what ref_gap_size, query_gap_size, ref_start, and ref_stop mean for tandem expansions and contractions. For instance, here is an example output:

| ref_start | ref_stop | ID | size | strand | type | ref_gap_size | query_gap_size | query_coordinates | method |
|---|---|---|---|---|---|---|---|---|---|
| 336379 | 338466 | Assemblytics_b_7 | 2760 | + | Tandem_contraction | 2087 | -673 | 1:339067-339740:- | between_alignments |

I tried to make a visualization of my understanding here.

Could you kindly verify whether my understanding is correct? If not, could you provide a visualization showing where the coordinates are?

MariaNattestad commented 6 months ago

I'm sorry for the long delay! In case it's still relevant, I'll answer your question: everything with the "ref_" prefix is in reference coordinates. In your diagram, you would swap the ref_gap_size and query_gap_size, so the ref_gap_size which is negative is at the top, and a negative gap means the alignments overlap.
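To make the "negative gap means overlap" idea concrete, here is a minimal sketch. The helper `gap_between` is hypothetical (not part of Assemblytics), and treating a gap as simply `next_start - prev_end` is an assumption; the tool's exact off-by-one convention may differ. The coordinates are taken from the example row above, under the assumption that the reference gap spans ref_start..ref_stop and that the two query alignments overlap across 339067-339740.

```python
def gap_between(prev_end: int, next_start: int) -> int:
    """Hypothetical helper: distance from the end of one alignment to the
    start of the next alignment on the same sequence.

    Positive -> unaligned stretch between the alignments.
    Negative -> the two alignments overlap by that many bases.
    (Assumed convention; Assemblytics may count endpoints differently.)
    """
    return next_start - prev_end


# Reference side: the alignments leave a 2087 bp stretch between them.
ref_gap = gap_between(336379, 338466)

# Query side (assumed interpretation): the second alignment starts before
# the first one ends, so the "gap" is negative, i.e. a 673 bp overlap.
query_gap = gap_between(339740, 339067)

print(ref_gap, query_gap)
```

With these inputs the sketch reproduces the ref_gap_size (2087) and query_gap_size (-673) values from the example row.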

The size is `ref_gap_size - query_gap_size` = 2087 - (-673) = 2760. This is the total number of base pairs that are affected or differ, as best we can tell without knowing the actual base-pair sequence, so in this case it's the gap in the ref plus the overlap in the query.
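As a quick check of that arithmetic, the sketch below parses the example row from the question and recomputes `size`. Splitting on whitespace is an assumption made for this pasted row (the real output file's delimiter may differ), and `contraction_size` is a hypothetical name; the formula is only the one given in this answer for this tandem contraction, not a claim about how expansions are sized.

```python
# Example row from the question, as it was pasted (whitespace-separated).
HEADER = ("ref_start ref_stop ID size strand type "
          "ref_gap_size query_gap_size query_coordinates method").split()
ROW = ("336379 338466 Assemblytics_b_7 2760 + Tandem_contraction "
       "2087 -673 1:339067-339740:- between_alignments").split()

record = dict(zip(HEADER, ROW))


def contraction_size(ref_gap_size: int, query_gap_size: int) -> int:
    """Hypothetical helper: affected bases for this contraction, i.e. the
    gap in the reference plus the overlap in the query."""
    return ref_gap_size - query_gap_size


size = contraction_size(int(record["ref_gap_size"]), int(record["query_gap_size"]))
print(record["type"], size, record["size"])
```

For this row the recomputed value, 2087 - (-673) = 2760, matches the `size` column reported by Assemblytics.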