AlgoLab / pygfa

A library to manage GFA files
http://pygfa.readthedocs.io
MIT License

Invalid dovetail overlaps orientation in GFA2 Edges #2

Open dlobba opened 7 years ago

dlobba commented 7 years ago

Dovetail overlaps in GFA2 edges are currently recognized as follows. Given the positions on the two sequences (beg1, end1, beg2 and end2 respectively), the edge represents a dovetail overlap if (beg1 = 0 or end1 = x$) and (beg2 = 0 or end2 = y$). If this holds, the segment ends involved are computed as described in GFA1: if segment1 is + then from_segment_end is R, otherwise it is L; if segment2 is + then to_segment_end is L, otherwise it is R.

This is not correct in GFA2, since an edge such as the following would erroneously be recognized as a dovetail overlap: E l12 1+ 2+ 0 3 0 3 * So this situation must be handled differently from the way it is handled in GFA1.
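To make the problem concrete, here is a minimal sketch of the condition as stated above, applied to that edge line. The function names and the choice to treat positions as strings (with a trailing `$` marking the end of a segment) are assumptions for illustration, not pygfa's actual API.

```python
def reaches_end(pos: str) -> bool:
    # In GFA2 a position with a trailing "$" is the end of the segment.
    return pos.endswith("$")


def is_dovetail_as_stated(beg1: str, end1: str, beg2: str, end2: str) -> bool:
    # Hypothetical helper: (beg1 = 0 or end1 = x$) and (beg2 = 0 or end2 = y$)
    return (beg1 == "0" or reaches_end(end1)) and (beg2 == "0" or reaches_end(end2))


fields = "E l12 1+ 2+ 0 3 0 3 *".split()
beg1, end1, beg2, end2 = fields[4:8]

# Both intervals start at 0, so the condition holds even though
# neither interval reaches the end of its segment.
print(is_dovetail_as_stated(beg1, end1, beg2, end2))  # True
```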

dlobba commented 7 years ago

I have analyzed this situation more carefully. E l12 1+ 2+ 0 3 0 3 * is not a dovetail overlap; it is a partial (general) overlap, correctly described by a generic edge line. The GFA1 specification could not describe this situation.

So I have read the specification again and changed the assumption for a dovetail overlap in GFA2 edges. The statement: "For example a GFA edge which encodes what is called a dovetail overlap (because two ends overlap) is a GFA2 edge where either beg1 = 0 or end1 = x$ and either beg2 = 0 or end2 = y$."

seems incorrect to me, since it could lead to the situation described in this issue.

The interpretation I have used is: "An Edge encodes a dovetail overlap if beg1 = 0 and end2 = y$ (so the end of the second segment overlaps the beginning of the first one) or end1 = x$ and beg2 = 0 (this is the normal Link case)." So, after this preamble, I have to check which of the two segments is the "from_segment" (with respect to the dovetail overlap) and apply the sign rules as described for Link.
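As a rough sketch, the revised condition could be written as a standalone predicate like the one below. The function name and the string representation of positions (trailing `$` for the end of a segment) are assumptions made for illustration; the real check in pygfa may look different.

```python
def is_dovetail_revised(beg1: str, end1: str, beg2: str, end2: str) -> bool:
    # Hypothetical helper implementing the interpretation quoted above.
    # Case 1: the end of segment 2 overlaps the beginning of segment 1.
    end2_onto_beg1 = beg1 == "0" and end2.endswith("$")
    # Case 2: the end of segment 1 overlaps the beginning of segment 2
    # (the normal GFA1 Link case).
    end1_onto_beg2 = end1.endswith("$") and beg2 == "0"
    return end2_onto_beg1 or end1_onto_beg2


# The edge from this issue (E l12 1+ 2+ 0 3 0 3 *) is no longer a dovetail:
print(is_dovetail_revised("0", "3", "0", "3"))   # False
# A Link-like overlap: end of segment 1 onto the beginning of segment 2.
print(is_dovetail_revised("5", "8$", "0", "3"))  # True
```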

The code for this is the following (after checking that the edge is a dovetail overlap):

    def _set_segments_end_link(self):
        if self.from_orn == "+":
            self._from_segment_end = "R"
        else:
            self._from_segment_end = "L"
        if self.to_orn == "+":
            self._to_segment_end = "L"
        else:
            self._to_segment_end = "R"

    def _set_segments_end_edge(self):
        # Assumes the edge has already been recognized as a dovetail overlap.
        beg1, end1 = self.from_positions
        beg2, end2 = self.to_positions

        # The dovetail goes from the end of from_node to the
        # beginning of to_node, just like a GFA1 Link.
        if beg2 == "0":
            self._set_segments_end_link()
        else:
            # Dovetail between the end of to_node and the beginning of from_node.
            if self.from_orn == "+":
                self._from_segment_end = "L"
            else:
                self._from_segment_end = "R"
            if self.to_orn == "+":
                self._to_segment_end = "R"
            else:
                self._to_segment_end = "L"
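For quick experimentation outside the class, the same decision can be sketched as a pure function. `segment_ends` is a hypothetical name, and, like the methods above, it assumes the edge has already been accepted as a dovetail overlap and that positions are strings.

```python
def segment_ends(from_orn: str, to_orn: str, beg2: str) -> tuple:
    # Hypothetical standalone mirror of the two methods above.
    if beg2 == "0":
        # Link-like case: end of the from segment onto the beginning of the to segment.
        from_end = "R" if from_orn == "+" else "L"
        to_end = "L" if to_orn == "+" else "R"
    else:
        # Mirrored case: end of the to segment onto the beginning of the from segment.
        from_end = "L" if from_orn == "+" else "R"
        to_end = "R" if to_orn == "+" else "L"
    return from_end, to_end


print(segment_ends("+", "+", "0"))  # ('R', 'L')
print(segment_ends("+", "+", "5"))  # ('L', 'R')
```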