Open dlobba opened 7 years ago
I have analyzed this situation more carefully.
E l12 1+ 2+ 0 3 0 3 *
It's not a dovetail overlap but a partial (general) overlap, which a generic edge line describes correctly.
The GFA1 specification couldn't describe this situation.
So I have reread the specification and changed my assumption about dovetail overlaps in GFA2 edges. The statement: "For example a GFA edge which encodes what is called a dovetail overlap (because two ends overlap) is a GFA2 edge where either beg1 = 0 or end1 = x$ and either beg2 = 0 or end2 = y$."
seems incorrect to me, since it could lead to the situation described in this issue.
The interpretation I have used is: "An Edge encodes a dovetail overlap if beg1 = 0 and end2 = y$ (so the end of the second segment overlaps with the beginning of the first one) or end1 = x$ and beg2 = 0 (this is the normal Link case)." So, after this preamble, I have to check which of the two segments is the "from_segment" (with respect to the dovetail overlap) and apply the sign rules as described for a Link.
The code for this is the following (after checking that the edge is a dovetail overlap):
def _set_segments_end_link(self):
    if self.from_orn == "+":
        self._from_segment_end = "R"
    else:
        self._from_segment_end = "L"
    if self.to_orn == "+":
        self._to_segment_end = "L"
    else:
        self._to_segment_end = "R"

def _set_segments_end_edge(self):
    beg1, end1 = self.from_positions
    beg2, end2 = self.to_positions
    # the dovetail is from the end of from_node to
    # the beginning of to_node, just like a GFA1 Link
    if beg2 == "0":
        self._set_segments_end_link()
    else:  # dovetail between end of to_node and begin of from_node
        if self.from_orn == "+":
            self._from_segment_end = "L"
        else:
            self._from_segment_end = "R"
        if self.to_orn == "+":
            self._to_segment_end = "R"
        else:
            self._to_segment_end = "L"
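
The methods above assume the edge has already been recognized as a dovetail overlap. That check is not shown in the thread; a minimal sketch of it, following the interpretation quoted above, might look like this. The function names `at_start`, `at_end` and `is_dovetail` are hypothetical, and positions are assumed to be strings in which a trailing `$` marks the end of the segment, as in the comparison `beg2 == "0"` used above.

```python
def at_start(pos):
    """True if a GFA2 position string denotes the start of the segment."""
    return pos == "0"


def at_end(pos):
    """True if a GFA2 position string carries the '$' end-of-segment marker."""
    return pos.endswith("$")


def is_dovetail(beg1, end1, beg2, end2):
    """Stricter dovetail test (hypothetical helper, not the library's API).

    An edge is treated as a dovetail overlap only when the overlap runs
    from the end of one segment into the beginning of the other.
    """
    # end of segment 1 overlaps the beginning of segment 2 (the GFA1 Link case)
    link_like = at_end(end1) and at_start(beg2)
    # end of segment 2 overlaps the beginning of segment 1
    reversed_link = at_start(beg1) and at_end(end2)
    return link_like or reversed_link
```

With this test the edge `E l12 1+ 2+ 0 3 0 3 *` is rejected, because neither `end1` nor `end2` reaches the end of its segment.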
Dovetail overlaps in GFA2 edges have been recognized as follows. Given the positions on the two sequences (beg1, end1, beg2 and end2 respectively), the edge represents a dovetail overlap if (beg1=0 or end1=x$) and (beg2=0 or end2=y$). If this holds, the segment ends involved are computed as described for GFA1: if segment1 is + then from_segment_end is R, otherwise L; if segment2 is + then to_segment_end is L, otherwise R.
This is not correct in GFA2, since an edge such as this one could erroneously be recognized as a dovetail overlap:
E l12 1+ 2+ 0 3 0 3 *
So this situation must be handled in a different way than the one used for GFA1.
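
To make the problem concrete, here is a small sketch that applies the rule stated above to the example edge. The helpers `edge_positions` and `loose_dovetail_rule` are hypothetical names used only for illustration. The line is split on whitespace because that is how it appears in this post, whereas real GFA files are tab-separated.

```python
def edge_positions(line):
    """Return (beg1, end1, beg2, end2) from a GFA2 E-line given as text."""
    fields = line.split()
    if fields[0] != "E" or len(fields) < 8:
        raise ValueError("not a GFA2 edge line: %r" % line)
    # fields: E, eid, sid1, sid2, beg1, end1, beg2, end2, alignment
    return tuple(fields[4:8])


def loose_dovetail_rule(beg1, end1, beg2, end2):
    """The rule described in this post:
    (beg1 = 0 or end1 = x$) and (beg2 = 0 or end2 = y$)."""
    first = beg1 == "0" or end1.endswith("$")
    second = beg2 == "0" or end2.endswith("$")
    return first and second


positions = edge_positions("E l12 1+ 2+ 0 3 0 3 *")
# Both overlaps start at position 0 and neither reaches a segment end,
# yet the rule still accepts the edge as a dovetail overlap.
accepted = loose_dovetail_rule(*positions)
```

Here `accepted` is `True` even though both intervals begin at 0, which is the misclassification reported in this issue.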